Abstract: In many real-world settings with large datasets, the data are often incompletely labelled. In cases where positive instances receive much more attention than negatives, one can assume that the label noise is dominated by missing labels. This is typical of extreme multilabel classification problems, where the number of labels is very large. These problems are usually solved by reducing them to a series of multiclass or multilabel problems that are easier to solve. In this talk, I will present how these reductions interact with the missing-labels problem. As a first step, an unbiased estimate of the loss function (given known noise rates) will be derived. Because this estimator may suffer from large variance, an alternative based on convex upper bounds will also be presented.
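To illustrate what an unbiased loss estimate under missing labels can look like, here is a minimal Python sketch of the standard propensity-weighted correction for a single binary label. It assumes that true negatives are never observed as positives and that each true positive is observed with a known probability `propensity` (the noise rate). This is a generic textbook construction, not necessarily the estimator derived in the talk, and the function names are hypothetical.

```python
import math


def unbiased_binary_loss(observed: int, propensity: float,
                         loss_if_pos: float, loss_if_neg: float) -> float:
    """Unbiased estimate of the clean-label loss from an observed label.

    Assumption: a true positive is observed as 1 with probability
    `propensity`; a true negative is always observed as 0.
    """
    if not 0.0 < propensity <= 1.0:
        raise ValueError("propensity must be in (0, 1]")
    # Observed positives are up-weighted by 1/propensity; the remaining
    # weight (possibly negative) goes to the negative-label loss.
    w = observed / propensity
    return w * loss_if_pos + (1.0 - w) * loss_if_neg


def expected_estimate(true_label: int, propensity: float,
                      loss_if_pos: float, loss_if_neg: float) -> float:
    """Expectation of the estimator over the missing-label process."""
    if true_label == 1:
        # Observed as 1 with prob. propensity, as 0 otherwise.
        return (propensity * unbiased_binary_loss(1, propensity, loss_if_pos, loss_if_neg)
                + (1.0 - propensity) * unbiased_binary_loss(0, propensity, loss_if_pos, loss_if_neg))
    # True negatives are always observed as 0.
    return unbiased_binary_loss(0, propensity, loss_if_pos, loss_if_neg)


if __name__ == "__main__":
    score = 2.0
    # Logistic losses for a positive and a negative label at this score.
    l_pos = math.log1p(math.exp(-score))
    l_neg = math.log1p(math.exp(score))
    # With a low propensity, a single observed positive yields a negative "loss".
    print(unbiased_binary_loss(1, 0.2, l_pos, l_neg))
```

Averaged over the missing-label process, the estimate recovers the clean loss for either true label. However, the factor `1 / propensity` means that individual estimates can be large in magnitude and even negative, as in the usage example above. This is one way to see the variance issue that motivates replacing the estimator with convex upper bounds.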
Speaker: Erik Schultheis
Affiliation: Aalto University
Place of Seminar: Zoom