Abstract: The talk will be divided into two parts:
Steering Distortions to Preserve Classes and Neighbors in Supervised Dimensionality Reduction
Nonlinear dimensionality reduction of high-dimensional data is challenging because the low-dimensional embedding will necessarily contain distortions, and it can be hard to determine which distortions are the most important to avoid. When the data are annotated with known relevant classes, the annotation can be used to steer the embedding away from distortions that worsen class separation. We introduce a supervised mapping method called ClassNeRV, with an original stress function that takes class annotation into account and evaluates embedding quality in terms of both false neighbors and missed neighbors. ClassNeRV shares the theoretical framework of a family of methods descended from Stochastic Neighbor Embedding (SNE). Our approach has a key advantage over previous ones: in the literature, supervised methods often emphasize class separation at the price of distorting the neighborhood structure of the data; conversely, unsupervised methods preserve that structure better at the price of often mixing classes. Experiments show that ClassNeRV can preserve both neighborhood structure and class separation, outperforming nine state-of-the-art alternatives.
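To make "false neighbors and missed neighbors" concrete, here is a minimal sketch of the unsupervised NeRV stress that ClassNeRV's family builds on, not ClassNeRV itself (its class-aware weighting is not reproduced here). The names `neighbor_probs`, `sq_dists` and `nerv_stress`, the single fixed Gaussian bandwidth `sigma` (SNE-style methods usually tune one bandwidth per point via perplexity) and the trade-off weight `lam` are all illustrative assumptions. KL(P||Q) penalizes missed neighbors and KL(Q||P) penalizes false neighbors.

```python
import numpy as np

def sq_dists(X):
    """Pairwise squared Euclidean distances between the rows of X."""
    sq = np.sum(X ** 2, axis=1)
    return np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)

def neighbor_probs(D2, sigma):
    """Row-stochastic neighbor probabilities p_{j|i} from a Gaussian kernel.

    Assumption: one shared bandwidth sigma instead of per-point perplexity tuning.
    """
    W = np.exp(-D2 / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)  # a point is not its own neighbor
    return W / W.sum(axis=1, keepdims=True)

def nerv_stress(X_high, Y_low, lam=0.5, sigma=1.0, eps=1e-12):
    """Hypothetical helper: lam * KL(P||Q) + (1 - lam) * KL(Q||P), summed over points."""
    P = neighbor_probs(sq_dists(X_high), sigma)  # neighborhoods in input space
    Q = neighbor_probs(sq_dists(Y_low), sigma)   # neighborhoods in the embedding
    off = ~np.eye(len(P), dtype=bool)            # ignore the zero diagonal
    p, q = P[off] + eps, Q[off] + eps            # eps avoids log(0)
    missed = np.sum(p * np.log(p / q))           # input neighbors lost in the embedding
    false = np.sum(q * np.log(q / p))            # embedding neighbors absent in the input
    return lam * missed + (1.0 - lam) * false
```

Setting `lam` close to 1 favors avoiding missed neighbors (recall), close to 0 favors avoiding false neighbors (precision); a class-aware method would additionally treat neighbor pairs differently depending on whether their classes agree.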
Probabilistic Dynamic Non-negative Group Factor Model for Multi-source Text Mining
Nonnegative matrix factorization (NMF) is a popular approach to modeling data; however, most models cannot flexibly take into account multiple matrices across sources and time, or they apply only to integer-valued data. We introduce a probabilistic, Gaussian process-based NMF model that is more general than these alternatives and jointly analyzes nonnegative data, such as the word content of text from multiple sources, in a temporally dynamic manner. The model jointly describes the observed matrix data, the source-wise latent variables, and their dependencies and temporal evolution in a full-fledged hierarchical approach that includes flexible nonparametric temporal dynamics. Experiments on simulated and real data show that the model outperforms comparable models. A case study on social media and news demonstrates that the model discovers semantically meaningful topical factors and their evolution.
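For readers new to NMF, the sketch below shows the basic single-matrix factorization V ≈ WH with nonnegative factors, using the classic Lee and Seung multiplicative updates for squared error. It is only background: it is not the probabilistic, Gaussian process-based multi-source model described in the talk, and the function name `nmf` and its parameters are illustrative assumptions. In a text setting, rows of V could be documents, columns words, and rows of H topic-like factors.

```python
import numpy as np

def nmf(V, k, n_iter=200, seed=0, eps=1e-10):
    """Hypothetical helper: factor nonnegative V (n x m) into W (n x k) @ H (k x m).

    Multiplicative updates for the Frobenius loss; eps guards against
    division by zero and keeps the factors strictly positive.
    """
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, k)) + eps
    H = rng.random((k, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W, H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

The multiplicative form is chosen because it preserves nonnegativity without any projection step; probabilistic variants replace this point estimate with priors over W and H.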
Speaker: Jaakko Peltonen
Jaakko Peltonen is a professor of statistics and data analysis at Tampere University, Faculty of Information Technology and Communication Sciences, where he leads a research group on Statistical Machine Learning and Exploratory Data Analysis (SMiLE). His research interests include probabilistic generative and information-theoretic methods especially in dimensionality reduction, visualization, exploratory information retrieval, and modeling of text data.
Affiliation: Tampere University
Place of Seminar: Zoom (available afterwards on YouTube)
Meeting ID: 694 7795 7496
Passcode: 047676