Abstract: Topic models, stemming from Latent Dirichlet Allocation (LDA), have
typically been formulated as generative models of discrete words.
Various attempts have been made in recent years to extend this type of
model to benefit from the information in unsupervised representations of
words (embeddings), most often learned by a neural network with a
language-modelling objective. These include Gaussian LDA, the Embedded
Topic Model and the Latent Concept Topic Model. I present some of these
methods, along with ideas and ongoing work on how to draw them together
and improve them. A particular goal is to produce topic models qualitatively
similar to LDA, which has been widely used for applications in the
social sciences and Digital Humanities.
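
To make the contrast concrete, here is a minimal sketch of the generative
stories involved, in standard notation (the symbols $\alpha$, $\theta_d$,
$z_{dn}$, $\beta_k$, $\mu_k$, $\Sigma_k$ are conventional choices, not taken
from the talk): classic LDA emits discrete word indices, while Gaussian LDA
replaces that emission with a Gaussian over pre-trained embedding vectors.

\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic proportions for document } d \\
z_{dn} &\sim \mathrm{Categorical}(\theta_d)
  && \text{topic of the } n\text{-th token} \\
w_{dn} &\sim \mathrm{Categorical}(\beta_{z_{dn}})
  && \text{LDA: discrete word drawn from topic } z_{dn} \\
v_{dn} &\sim \mathcal{N}(\mu_{z_{dn}}, \Sigma_{z_{dn}})
  && \text{Gaussian LDA: embedding vector drawn from topic } z_{dn}
\end{align*}

The embedding-based variants mentioned in the abstract differ chiefly in how
this last emission step is parameterised over the embedding space.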
Speaker: Mark Granroth-Wilding
Affiliation: Department of Computer Science, University of Helsinki
Place of Seminar: Zoom (available now on YouTube)