Abstract: Asking whether two populations can be distinguished from each other is one of the most fundamental questions in data analysis, and the area under the ROC curve (AUC) is one of the simplest and most practical tools for answering it. Also known as the Wilcoxon–Mann–Whitney U statistic, it can be associated with a p-value indicating how likely one would be to obtain an AUC value at least as good if the two populations were not stochastically different. Estimating the AUC of a predictive model and its statistical significance is of great practical importance in fields like medicine, where one often has access to only small amounts of labeled data but a large number of features. Leave-pair-out cross-validation (LPOCV) is an almost unbiased estimator of the AUC of machine learning methods that has also been empirically shown to be the most reliable of the cross-validation (CV) based estimators. We further study the properties of LPOCV and show some serious pitfalls one can encounter when estimating AUC with CV, and how to avoid them. In particular, we show how one can produce very promising results with high AUC values even if there is no signal in the data. Finally, we show how to counter these risks with new Wilcoxon–Mann–Whitney U type permutation tests adjusted for LPOCV, thus upgrading one of the classical statistical tools for CV estimates.
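As background for the talk, the sketch below illustrates the basic LPOCV idea, assuming a scikit-learn style classifier: each positive-negative pair is held out in turn, the model is trained on the remaining data, and the AUC estimate is the fraction of held-out pairs in which the positive example receives the higher score (ties count as 0.5). The function name lpocv_auc and the toy data are illustrative, not taken from the talk.

```python
import numpy as np
from itertools import product
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression

def lpocv_auc(estimator, X, y):
    """Leave-pair-out CV estimate of AUC for a binary problem (y in {0, 1})."""
    pos = np.where(y == 1)[0]
    neg = np.where(y == 0)[0]
    wins = 0.0
    for i, j in product(pos, neg):
        # Hold out one positive and one negative example, train on the rest.
        train = np.setdiff1d(np.arange(len(y)), [i, j])
        model = clone(estimator).fit(X[train], y[train])
        s_pos, s_neg = model.decision_function(X[[i, j]])
        # Count a win if the positive example is ranked above the negative one.
        wins += 1.0 if s_pos > s_neg else (0.5 if s_pos == s_neg else 0.0)
    return wins / (len(pos) * len(neg))

# Hypothetical usage on random, signal-free data (labels independent of features).
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 10))
y = rng.integers(0, 2, size=30)
print(lpocv_auc(LogisticRegression(max_iter=1000), X, y))
```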
Speaker: Tapio Pahikkala
Affiliation: Assistant Professor, University of Turku
Place of Seminar: University of Helsinki