Abstract: Despite impressive advances, contemporary neural networks still provide spotty and pointillistic uncertainty estimates that are hard to trust. This talk introduces a new family of non-linear neural network activation functions that mimic the properties induced by the widely used Matérn family of kernels in Gaussian process (GP) models. This class spans a range of locally stationary models with varying degrees of mean-square differentiability. The work shows an explicit link to the corresponding GP models when the network consists of a single infinitely wide hidden layer. Matérn activation functions inherit the appealing properties of their GP counterparts, and we demonstrate that local stationarity together with limited mean-square differentiability yields both good predictive performance and well-calibrated uncertainty in Bayesian deep learning tasks. In particular, local stationarity helps calibrate out-of-distribution (OOD) uncertainty. The examples demonstrate these properties on classification and regression benchmarks and a radar emitter classification task.
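For background, the Matérn kernels the talk refers to form a standard family in GP modelling, indexed by a smoothness parameter ν that controls mean-square differentiability of the sample paths. The sketch below evaluates the well-known half-integer cases of the kernel itself (not the paper's activation functions); the function and parameter names are illustrative, not from the talk.

```python
import numpy as np

def matern_kernel(r, lengthscale=1.0, sigma2=1.0, nu=1.5):
    """Standard Matérn covariance for half-integer smoothness nu.

    Smaller nu gives rougher (less mean-square differentiable) samples;
    nu -> infinity recovers the smooth squared-exponential (RBF) kernel.
    """
    r = np.abs(r) / lengthscale
    if nu == 0.5:   # exponential kernel: continuous but non-differentiable paths
        return sigma2 * np.exp(-r)
    if nu == 1.5:   # once mean-square differentiable
        return sigma2 * (1.0 + np.sqrt(3) * r) * np.exp(-np.sqrt(3) * r)
    if nu == 2.5:   # twice mean-square differentiable
        return sigma2 * (1.0 + np.sqrt(5) * r + 5.0 * r**2 / 3.0) * np.exp(-np.sqrt(5) * r)
    raise ValueError("only nu in {0.5, 1.5, 2.5} implemented in this sketch")

# At zero distance every variant returns the marginal variance sigma2,
# and the covariance decays with distance -- the stationarity the talk exploits.
print(matern_kernel(0.0, nu=0.5), matern_kernel(1.0, nu=0.5))
```

The paper's activation functions are constructed so that an infinitely wide single-hidden-layer network reproduces a GP with such a kernel; see the arXiv pre-print for the exact construction.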
This is joint work with my students Lassi Meronen and Christabella Irwanto, and the work is to be presented at NeurIPS later this year. An arXiv pre-print is available at https://arxiv.org/abs/2010.09494.
Speaker: Arno Solin
Affiliation: Computer Science Department, Aalto University
Place of Seminar: Zoom (now available on YouTube)