Abstract: We study the combinatorics of cross-validation based AUC estimation under the null hypothesis that the binary class labels are exchangeable, that is, the data are randomly assigned into two classes. In particular, we study how the estimators based on leave-pair-out cross-validation (LPOCV), in which every possible pair of data with different class labels is held out from the training set at a time, behave under the null without any prior assumptions of the learning algorithm or the data. It is observed that the maximal number of different assignments of w nonzero and n-w zero class labels on the data, for which any fixed learning algorithm can achieve zero LPOCV error, is equivalent with the maximal size of a constant weight error-correcting code of length n, weight w and Hamming distance four between code words. We then introduce the concept of a light constant weight code and show similar results for bounded LPOCV errors. These results enable the design of new LPOCV based statistical tests for the learning algorithms ability to distinguish two classes from each other that are analogous to the classical Wilcoxon-Mann-Whitney U test for fixed functions.
Speaker: Professor Tapio Pahikkala
Affiliation: Department of Future Technologies, University of Turku
Place of Seminar: Zoom