Abstract: Approximate nearest neighbor (ANN) search is a classic algorithmic problem. Unsupervised space partitioning trees, such as k-d, random projection (RP), and principal component (PCA) trees, have been traditionally used as index structures for ANN search. However, the nearest neighbors candidates are still retrieved by algorithmic techniques, such as backtracking search. In this talk, we show how the candidate set selection for ANN search can be directly formulated as a multilabel classification problem. Under this formulation, space partitioning trees are interpreted as multilabel classifiers. In addition to improving performance of the earlier unsupervised trees, our formulation enables fitting any classifier learned in a supervised fashion, such as a random forest, directly to this multilabel classification task, and thus using it as an index structure for ANN search. This opens a promising research direction into an established algorithmic problem.
The presentation is based on a joint work with Elias Jääsaari (Carnegie Mellon University) and professor Teemu Roos (University of Helsinki) that will appear in Proceedings of Neural Information Processing Systems (NeurIPS) 2022.
Speakers: Ville Hyvönen received his M.Sc. decree in Statistics from University of Helsinki. He is currently a PhD candidate at the CS department of University of Helsinki under supervision of professor Teemu Roos. He is working on the foundations of tree-based supervised learning methods and their applications (the current application being ANN search).
Affiliation: University of Helsinki
Place of Seminar: Kumpula exactum C323 (in person) & Otaniemi, T5 (streaming)