Back to All Events

Zhirong Yang: Learning Data Representation by Large-Scale Neighbor Embedding

  • Aalto University Konemiehentie 2 CS building, lh T6 Finland (map)

Abstract: Machine learning, the state-of-the-art data science, has been increasingly influencing our life. Encoding data in a suitable vector space is the fundamental starting point for machine learning. A good vector coding should respect the relations among the data items. However, conventional methods that preserve pairwise or higher order relationship are very slow and consequently they can handle only small-scale data sets. We have been developing a family of unsupervised methods called large-scale Neighbor Embedding (NE) which substantially accelerate the vector coding. Our method can thus learn low-dimensional vector representation for mega-scale data according to their neighborhoods in the original space. With our efficient algorithms and a wealth of neighborhood information, Neighbor Embedding significantly outperforms small-scale NE and many other existing approaches for learning data representation. Besides generic feature extraction, our work also delivers two important tools as special cases of Neighbor Embedding for data visualization and cluster analysis, which scales up these applications by an order of magnitude and enables the current-sized visualization and clustering for interactive use. Because neighborhood information is naturally and massively available in many areas, our method has wide applications as a critical component in scientific research, next-generation DNA sequence analysis, natural language processing, educational cloud, financial data analysis, market studies, etc.

Speaker: Zhirong Yang

Affiliation: Department of Computer Science, Aalto University

Place of Seminar: Aalto University