Scientific Presentations

AI Day 2021

November 4, 2021

TALKS

Scientific track 1: Onsite talks in Dipoli
Session 1: Natural Language Processing (9:45–11:00)
Session 2: Machine Learning (11:15–12:15)
Session 3: Constraint Optimization and Search (13:00–14:00)
Session 4: Multidisciplinary Applications (14:15–15:15)
Session 5: Machine Learning (15:30–16:30)
Session 6: Human Aspects, Interactions and Applications (16:45–18:00)

Scientific track 2: Online talks via Zoom
Session 7 (15:15–16:30)
Session 8 (16:45–18:00)

POSTERS

Onsite in Dipoli (18:00–19:00)
Online via Zoom / Teams (18:00–19:00)

You can find the list of talks and posters below. Please click the talk/poster titles to see the abstract.

Links to online talks and posters are posted on the event platform. Onsite talks are also streamed online. Register for the event to get access to the platform.

Onsite talks

Session 1: Natural Language Processing (9:45-11:00)

Session chair: Anssi Yli-Jyrä

9:45 - Self-Supervised End-to-End ASR for Low Resource L2 Swedish - Ragheb Al-Ghezi (Aalto University), Yaroslav Getman (Aalto University), Mikko Kurimo (Aalto University) [click for abstract]

Unlike traditional (hybrid) Automatic Speech Recognition (ASR), end-to-end ASR systems simplify the training procedure by directly mapping acoustic features to sequences of graphemes or characters, thereby eliminating the need for specialized acoustic, language, or pronunciation models. However, one drawback of end-to-end ASR systems is that they require more training data than conventional ASR systems to achieve a similar word error rate (WER). This makes it difficult to develop ASR systems for tasks where transcribed target data is limited, such as developing ASR for Second Language (L2) speakers of Swedish. Nonetheless, recent advancements in self-supervised acoustic learning, manifested in wav2vec models, leverage the available untranscribed speech data to provide compact acoustic representations that can achieve low WER when incorporated in end-to-end systems. To this end, we experiment with several monolingual and cross-lingual self-supervised acoustic models to develop an end-to-end ASR system for L2 Swedish. Even though our test set is very small, it indicates that these systems are competitive in performance with a traditional ASR pipeline. Our best model seems to reduce the WER by 7% relative to our traditional ASR baseline trained on the same target data.
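For readers unfamiliar with the metric, the following is a minimal sketch of how WER and a relative WER reduction are commonly computed. It is not the authors' evaluation code; the function names and the numbers in the usage lines are illustrative assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    return (baseline_wer - new_wer) / baseline_wer


# One substitution (b -> x) and one deletion (d) over four reference words.
print(word_error_rate("a b c d", "a x c"))  # 0.5
# Made-up WERs: going from 20.0% to 18.6% is a 7% relative reduction.
print(round(relative_reduction(0.200, 0.186), 3))  # 0.07
```

A relative reduction is measured against the baseline WER, so it differs from the absolute difference in percentage points.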

10:00 - Speaker Verification Experiments for Adults and Children using a shared embedding space - Tuomas Kaseva (Aalto University), Hemant Kathania (Aalto University), Aku Rouhe (Aalto University), Mikko Kurimo (Aalto University) [click for abstract]

In this work, we present our efforts towards developing a robust speaker verification system for children when the data is limited. We propose a novel deep learning-based speaker verification system that combines long short-term memory cells with NetVLAD and additive margin softmax loss. First we investigated these methods on a large corpus of adult data and then applied the best configuration to child speaker verification. For children, the system trained on a large corpus of adult speakers performed worse than a system trained on a much smaller corpus of children’s speech. This is due to the acoustic mismatch between training and testing data. To capture more acoustic variability, we trained a shared system with mixed data from adults and children. The shared system yields the best equal error rate (EER) for children with no degradation for adults. Thus, the single system trained with mixed data is applicable to speaker verification for both adults and children.
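The EER mentioned above is the operating point where the false acceptance rate equals the false rejection rate. The sketch below approximates it by sweeping a threshold over the observed scores. It is a simplified illustration with made-up scores, not the evaluation pipeline used in the talk.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate EER: sweep thresholds and pick the one where FAR and FRR are closest."""
    thresholds = sorted(set(target_scores) | set(nontarget_scores))
    best_gap, eer = float("inf"), None
    for threshold in thresholds:
        # A trial is accepted when its score is at least the threshold.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer


# Hypothetical similarity scores for same-speaker and different-speaker trials.
same_speaker = [0.9, 0.8, 0.7, 0.3]
different_speaker = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(same_speaker, different_speaker))  # 0.25
```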

10:15 - Convex Surrogates for Unbiased Loss Functions in Extreme Classification With Missing Labels – Mohammadreza Qaraei (Aalto University), Erik Schultheis (Aalto University), Priyanshu Gupta (IIT, Kanpur), Rohit Babbar (Aalto University) [click for abstract]

Extreme Classification (XC) refers to supervised learning where each training/test instance is labeled with a small subset of relevant labels that are chosen from a large set of possible target labels. The framework of XC has been widely employed in web applications such as automatic labeling of web-encyclopedia, prediction of related searches, and recommendation systems. While most state-of-the-art models in XC achieve high overall accuracy by performing well on the frequently occurring labels, they perform poorly on a large number of infrequent (tail) labels. This arises from two statistical challenges: (i) missing labels, as it is virtually impossible to manually assign every relevant label to an instance, and (ii) highly imbalanced data distribution where a large fraction of labels are tail labels. In this work, we consider common loss functions that decompose over labels, and calculate unbiased estimates that compensate for missing labels according to (Natarajan et al., 2017). This turns out to be disadvantageous from an optimization perspective, as important properties such as convexity and lower-boundedness are lost. To circumvent this problem, we use the fact that typical loss functions in XC are convex surrogates of the 0-1 loss, and thus propose to switch to convex surrogates of its unbiased version. These surrogates are further adapted to the label imbalance by combining with label-frequency-based rebalancing. We show that the proposed loss functions can be easily incorporated into various frameworks for extreme classification. This includes (i) linear classifiers, such as DiSMEC, on sparse input data representation, (ii) an attention-based deep architecture, AttentionXML, learnt on dense GloVe embeddings, and (iii) an XLNet-based transformer model for extreme classification, APLC-XLNet. Our results demonstrate consistent improvements over the respective vanilla baseline models on the propensity-scored metrics for precision and nDCG.
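To make the loss of lower-boundedness concrete, here is a minimal sketch of one common form of the unbiased estimate for a single label. It assumes that a relevant label is observed with a known propensity p and that irrelevant labels are never falsely observed. The logistic loss and all function names are illustrative choices and may differ in detail from the construction used in the talk.

```python
import math


def logistic_loss(label: int, score: float) -> float:
    """Convex surrogate of the 0-1 loss; label is 1 (relevant) or 0 (irrelevant)."""
    margin = score if label == 1 else -score
    return math.log1p(math.exp(-margin))


def unbiased_loss(observed: int, score: float, propensity: float) -> float:
    """Loss on the observed label whose expectation equals the loss on the true label."""
    if observed == 1:
        weight = 1.0 / propensity
        # The second coefficient (1 - 1/p) is negative whenever p < 1.
        return weight * logistic_loss(1, score) + (1.0 - weight) * logistic_loss(0, score)
    return logistic_loss(0, score)


def expected_unbiased_loss(true_label: int, score: float, propensity: float) -> float:
    """Exact expectation over the label-missing process described in the lead-in."""
    if true_label == 0:
        return unbiased_loss(0, score, propensity)
    return (propensity * unbiased_loss(1, score, propensity)
            + (1.0 - propensity) * unbiased_loss(0, score, propensity))


# In expectation the estimate matches the clean loss ...
print(abs(expected_unbiased_loss(1, 0.3, 0.2) - logistic_loss(1, 0.3)) < 1e-9)  # True
# ... but for an observed positive it is unbounded below as the score grows.
print(unbiased_loss(1, 50.0, 0.2) < 0)  # True
```

The negative coefficient on the second term is what breaks convexity and lower-boundedness, which is the motivation given in the abstract for switching to convex surrogates.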

10:30 - Dialog Modelling Experiments with Finnish One-to-One Chat Data – Lili Aunimo (Haaga-Helia University of Applied Sciences), Janne Kauttonen (Haaga-Helia University of Applied Sciences) [click for abstract]

We analyzed two conversational corpora in Finnish: a public library question-answering (QA) dataset and a private medical chat dataset. We developed response retrieval (ranking) models using TF-IDF, StarSpace, ESIM and BERT methods. These four represent techniques ranging from the simple and classical ones to recent pretrained transformer neural networks. We evaluated the effect of different preprocessing strategies, including raw, casing, lemmatization and spell-checking for the different methods. Using our medical chat data, we also developed a novel three-stage preprocessing pipeline with speaker role classification. We found the BERT model pretrained with Finnish (FinBERT) an unambiguous winner in ranking accuracy, reaching 92.2% for the medical chat and 98.7% for the library QA in the 1-out-of-10 response ranking task where the chance level was 10%. The best accuracies were reached using uncased text with spell-checking (BERT models) or lemmatization (non-BERT models). Preprocessing had less impact on BERT models than on the classical and other neural network models. Furthermore, we found the TF-IDF method still a strong baseline for the vocabulary-rich library QA task, even surpassing the more advanced StarSpace method. Our results highlight the complex interplay between preprocessing strategies and model type when choosing the optimal approach in chat-data modelling. Our study is the first work on dialogue modelling using neural networks for the Finnish language. It is also the first of its kind to use real medical chat data. Our work contributes towards the development of automated chatbots in the professional domain.
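As a rough illustration of the response ranking task with the TF-IDF baseline, the sketch below scores candidate responses by cosine similarity to the dialogue context. The whitespace tokenization, the smoothed IDF formula, and the English toy sentences are assumptions made for this example and do not reproduce the preprocessing or data described above.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict of term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed inverse document frequency, always positive.
    idf = {term: math.log((1 + n_docs) / (1 + df)) + 1.0 for term, df in doc_freq.items()}
    return [{term: tf * idf[term] for term, tf in Counter(tokens).items()}
            for tokens in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_responses(context, candidates):
    """Return candidate indices sorted from most to least similar to the context."""
    vectors = tfidf_vectors([context] + candidates)
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    return sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)


candidates = [
    "your prescription can be renewed online",
    "the library is open on sunday from noon",
    "please describe your symptoms",
]
ranking = rank_responses("when does the library open on sunday", candidates)
print(candidates[ranking[0]])
```

In a 1-out-of-10 ranking task, accuracy would be the fraction of contexts for which the true response is ranked first among ten candidates.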

10:45 - Inferring Case-Based Reasoners’ Knowledge to Enhance Interactivity – Pierre-Alexandre Murena (Aalto University), Marie Al-Ghossein (University of Helsinki) [click for abstract]

When interacting with a human user, an artificial intelligence needs to have a clear model of the human’s behaviour to make the correct decisions, be it recommending items, helping the user in a task or teaching a language. This is in particular the case for intelligent tutoring systems, which must maintain a good understanding of what the user knows. In practice, this raises two questions: what did the user memorize and how does the user reuse this knowledge? These two questions are at the core of the domain of Case-Based Reasoning (CBR). In this paper, we explore the feasibility of modelling the human as a case-based reasoning agent through the question of how to infer the state of a CBR agent from interaction data. We identify the main parameters to be inferred, and propose a Bayesian belief update as a possible way to infer both the parameters of the agent and the content of their case base. We illustrate our ideas with the simple application of an agent learning Finnish grammar rules throughout a sequence of observations and show that the teacher can indeed predict the user's knowledge and reasoning parameters.
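The Bayesian belief update mentioned above can be illustrated on a deliberately tiny example. The two hypotheses and the likelihood values below are invented for this sketch; the model in the paper infers richer quantities (agent parameters and case-base content).

```python
def bayes_update(prior, likelihood):
    """Posterior over hypotheses given the likelihood of one observation."""
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    evidence = sum(unnormalized.values())
    return {h: value / evidence for h, value in unnormalized.items()}


# Hypothetical teacher belief about whether the learner has memorized a rule.
belief = {"knows_rule": 0.5, "does_not_know": 0.5}
# Assumed probability of a correct answer under each hypothesis.
p_correct = {"knows_rule": 0.9, "does_not_know": 0.3}

for _ in range(2):  # the learner answers correctly twice
    belief = bayes_update(belief, p_correct)

print(round(belief["knows_rule"], 3))  # 0.9
```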

Session 2: Machine Learning (11:15-12:15)

Session chair: Arto Klami

11:15 - Differentially Private Hamiltonian Monte Carlo – Ossi Räisä (University of Helsinki), Antti Koskela (University of Helsinki), Antti Honkela (University of Helsinki) [click for abstract]

Markov chain Monte Carlo (MCMC) algorithms have long been the main workhorses of Bayesian inference. Among them, Hamiltonian Monte Carlo (HMC) has recently become very popular due to its efficiency resulting from effective use of the gradients of the target distribution. In privacy-preserving machine learning, differential privacy (DP) has become the gold standard in ensuring that the privacy of data subjects is not violated. Existing DP MCMC algorithms either use random-walk proposals, or do not use the Metropolis-Hastings (MH) acceptance test to ensure convergence without decreasing their step size to zero. We present a DP variant of HMC using the MH acceptance test that builds on a recently proposed DP MCMC algorithm called the penalty algorithm, and adds noise to the gradient evaluations of HMC. We prove that the resulting algorithm converges to the correct distribution, and is ergodic. We compare DP-HMC with the existing penalty, DP-SGLD and DP-SGNHT algorithms, and find that DP-HMC performs as well as or better than the penalty algorithm, and performs more consistently than DP-SGLD or DP-SGNHT.

11:30 - d3p - A Python Package for Differentially-Private Probabilistic Programming – Lukas Prediger (Aalto University), Niki Loppi (NVIDIA), Samuel Kaski (Aalto University and University of Manchester), Antti Honkela (University of Helsinki) [click for abstract]

We present d3p, a software package designed to help field runtime-efficient, widely applicable Bayesian inference under differential privacy guarantees. d3p achieves general applicability to a wide range of probabilistic modelling problems by implementing the differentially private variational inference algorithm, allowing users to fit any parametric probabilistic model with a differentiable density function. d3p adopts the probabilistic programming paradigm as a powerful way for the user to flexibly define such models. We demonstrate the use of our software on a hierarchical logistic regression example, showing the expressiveness of the modelling approach as well as the ease of running the parameter inference. We also perform an empirical evaluation of the runtime of the private inference on a complex model and find a roughly 10-fold speed-up compared to an implementation using TensorFlow Privacy.

11:45 - Behaviour conditioned Policies for Reinforcement Learning Tasks – Antti Keurulainen (Aalto University and Bitville Oy), Isak Westerlund (Bitville Oy), Ariel Kwiatkowski (Bitville Oy), Samuel Kaski (Aalto University and University of Manchester), Alexander Ilin (Aalto University) [click for abstract]

Cooperation among AI systems, and between AI systems and humans, is becoming increasingly important. In various real-world tasks, an agent needs to cooperate with unknown partner agent types. This requires the agent to assess the behaviour of the partner agent during a cooperative task and to adjust its own policy to support the cooperation. Deep reinforcement learning models can be trained to deliver the required functionality but are known to suffer from sample inefficiency and slow learning. However, adapting to a partner agent's behaviour during the ongoing task requires the ability to assess the partner agent type quickly. We suggest a method in which we synthetically produce populations of agents with different behavioural patterns together with ground truth data of their behaviour, and use this data for training a meta-learner. We additionally suggest an agent architecture which can efficiently use the generated data and gain the meta-learning capability. When an agent is equipped with such a meta-learner, it is capable of quickly adapting to cooperation with unknown partner agent types in new situations. This method can be used to automatically form a task distribution for meta-training from emerging behaviours that arise, for example, through self-play.

12:00 - Automating Privilege Escalation with Deep Reinforcement Learning – Kalle Kujanpää (Aalto University), Willie Victor (F-Secure), Alexander Ilin (Aalto University) [click for abstract]

AI-based defensive solutions are necessary to defend against intelligent automated attacks, but gathering enough realistic data for training machine learning-based defenses is a significant practical challenge. In this work, we present a reinforcement learning agent that can perform local privilege escalation in a Windows 7 environment using a wide variety of techniques, depending on the environment configuration it encounters. Hence, our agent is usable for generating realistic attack sensor data for training and evaluating defense systems.

Session 3: Constraint Optimization and Search (13:00-14:00)

Session chair: Jussi Rintanen

13:00 - Responsive and Personalized Web Layouts with Integer Programming – Markku Laine (Aalto University), Yu Zhang (Aalto University), Simo Santala (Aalto University), Jussi P. P. Jokinen (University of Helsinki), Antti Oulasvirta (Aalto University) [click for abstract]

Over the past decade, responsive web design (RWD) has become the de facto standard for adapting web pages to a wide range of devices used for browsing. While RWD has improved the usability of web pages, it is not without drawbacks and limitations: designers and developers must manually design the web layouts for multiple screen sizes and implement associated adaptation rules, and its “one responsive design fits all” approach lacks support for personalization. This paper presents a novel approach for automated generation of responsive and personalized web layouts. Given an existing web page design and preferences related to design objectives, our integer programming-based optimizer generates a consistent set of web designs. Where relevant data is available, these can be further automatically personalized for the user and browsing device. The paper includes a presentation of techniques for runtime adaptation of the generated designs into a fully responsive grid layout for web browsing. Results from our ratings-based online studies with end users (N = 86) and designers (N = 64) show that the proposed approach can automatically create high-quality responsive web layouts for a variety of real-world websites.

13:15 - Enabling Incrementality in the Implicit Hitting Set Approach to MaxSAT under Changing Weights – Andreas Niskanen (University of Helsinki), Jeremias Berg (University of Helsinki), Matti Järvisalo (University of Helsinki) [click for abstract]

Recent advances in solvers for the Boolean satisfiability (SAT) based optimization paradigm of maximum satisfiability (MaxSAT) have turned MaxSAT into a viable approach to finding provably optimal solutions for various types of hard optimization problems. In various types of real-world problem settings, a sequence of related optimization problems needs to be solved. This calls for studying ways of enabling incremental computations in MaxSAT, with the hope of speeding up the overall computation times. However, current state-of-the-art MaxSAT solvers offer no or limited forms of incrementality. In this work, we study ways of enabling incremental computations in the context of the implicit hitting set (IHS) approach to MaxSAT solving, as both one of the key MaxSAT solving approaches today and a relatively well-suited candidate for extending to incremental computations. In particular, motivated by several recent applications of MaxSAT in the context of interpretability in machine learning calling for this type of incrementality, we focus on enabling incrementality in IHS under changes to the objective function coefficients (i.e., to the weights of soft clauses). To this end, we explain to what extent different search techniques applied in IHS-based MaxSAT solving can and cannot be adapted to this incremental setting. As a practical result, we develop an incremental version of an IHS MaxSAT solver, and show it provides significant runtime improvements in recent application settings which can benefit from incrementality but in which MaxSAT solvers have so far been applied only non-incrementally, i.e., by calling a MaxSAT solver from scratch after each change to the problem instance at hand.

13:30 - Diversity-aware k-median: clustering with fair center representation – Suhas Thejaswi (Aalto University), Bruno Ordozgoiti (Aalto University), Aristides Gionis (KTH Royal Institute of Technology) [click for abstract]

We introduce a novel problem for diversity-aware clustering. We assume that the potential cluster centers belong to a set of groups defined by protected attributes, such as ethnicity, gender, etc. We then ask to find a minimum-cost clustering of the data into $k$ clusters so that a specified minimum number of cluster centers are chosen from each group. We thus require that all groups are represented in the clustering solution as cluster centers, according to specified requirements.

We show that in the general case where the facility groups may overlap, the diversity-aware $k$-median problem is NP-hard, fixed-parameter intractable, and inapproximable to any multiplicative factor. On the other hand, when the facility groups are disjoint, approximation algorithms can be obtained by reduction to the matroid median and red-blue median problems. Experimentally, we evaluate our approximation methods for the tractable cases, and present a relaxation-based heuristic for the theoretically intractable case, which can provide high-quality and efficient solutions for real-world datasets.
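The problem statement can be made concrete with a brute-force sketch on a toy instance. The code enumerates every set of $k$ centers, so it runs in exponential time and only illustrates the objective and the group constraints. It is not one of the approximation algorithms or heuristics from the talk. Points on a line, the group names, and the requirement format are assumptions for this example.

```python
from itertools import combinations


def diversity_aware_k_median(clients, facilities, groups, requirements, k):
    """Exhaustively find k centers of minimum cost that satisfy per-group minimums.

    clients, facilities: positions on a line
    groups: group name -> set of facility indices (groups may overlap)
    requirements: group name -> minimum number of chosen centers from that group
    """
    best_cost, best_centers = float("inf"), None
    for centers in combinations(range(len(facilities)), k):
        chosen = set(centers)
        if any(len(chosen & groups[g]) < needed for g, needed in requirements.items()):
            continue  # some group is under-represented among the centers
        # k-median cost: each client pays the distance to its nearest center.
        cost = sum(min(abs(c - facilities[i]) for i in centers) for c in clients)
        if cost < best_cost:
            best_cost, best_centers = cost, centers
    return best_centers, best_cost


clients = [0, 1, 1, 2, 10]
facilities = [1, 10, 2, 50]
groups = {"A": {0, 1, 2}, "B": {3}}

print(diversity_aware_k_median(clients, facilities, groups, {}, 2))        # ((0, 1), 2)
print(diversity_aware_k_median(clients, facilities, groups, {"B": 1}, 2))  # ((0, 3), 11)
```

Requiring one center from group B forces the distant facility into the solution and raises the cost from 2 to 11, which shows the trade-off between representation and clustering cost.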

13:45 - Approximating the Permanent with Deep Rejection Sampling – Juha Harviainen (University of Helsinki), Antti Röyskö (ETH Zürich), Mikko Koivisto (University of Helsinki) [click for abstract]

We present a randomized approximation scheme for the permanent of a matrix with nonnegative entries. Our scheme extends a recursive rejection sampling method of Huber and Law (SODA 2008) by replacing the permanent upper bound with a linear combination of the subproblem bounds at a moderately large depth of the recursion tree. This method, which we call deep rejection sampling, is empirically shown to outperform the basic, depth-zero variant, as well as a related method by Kuck et al. (NeurIPS 2019). We analyze the expected running time of the scheme on random (0, 1)-matrices where each entry is independently 1 with probability p. Our bound is superior to a previous one for p less than 1/5, matching another bound that was only known to hold when every row and column has density exactly p.

Session 4: Multidisciplinary Applications (14:15-15:15)

Session chair: Tapio Pahikkala

14:15 - EYES-project case study: Selecting Feature Sets and Comparing Classification Methods for Cognitive State Estimation – Kati Pettersson (VTT), Jaakko Tervonen (VTT), Johanna Närväinen (VTT), Pentti Henttonen (UH), Ilmari Määttänen (UH) and Jani Mäntyjärvi (VTT) [click for abstract]

Acute stress and high workload are part of everyday work in safety-critical fields. Adaptive human-computer interaction (HCI) systems could support and guide professionals in hectic situations. Seamless HCI requires accurate cognitive state estimation of the person. The Academy project EYES aims to explore and develop novel and seamless cognitive state estimation methods for real-time and real-life settings. The cognitive state estimation focuses on biosensor data combined with information from the eyes.

This study demonstrates a classification of different types of cognitive states by using feature combinations from the eyes (measured with electro-oculography, EOG) and heart (measured with electrocardiography, ECG) in general and personalized approaches, comparing three different classifiers. The classification is evaluated for features extracted from both signals separately and together, and the most important features are selected and reported. Results indicate that the best performance is achieved when features from both EOG and ECG signals are used, and approximately twenty features from EOG and ECG signals are enough to distinguish the two/three states. A personalized approach together with feature selection and a support vector machine classifier achieves accuracies of 96.9% and 86.3% in classifying between two states (relaxation and stress) and three states (relaxation, psycho-social stress, and physiological stress), respectively, which exceed state-of-the-art performance. Thus cognitive state estimation benefits from combining selected eye and heart parameters, which suggests a promising basis for real-time estimation in the future.

K. Pettersson, J. Tervonen, J. Närväinen, P. Henttonen, I. Määttänen and J. Mäntyjärvi, "Selecting Feature Sets and Comparing Classification Methods for Cognitive State Estimation," 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE), 2020, pp. 683-690, doi: 10.1109/BIBE50027.2020.00115.

14:30 - Self-Swarming for Multi-Robot Systems Deployed for Situational Awareness – Fabrice Saffre (VTT), Hanno Hildmann (TNO), Hannu Karvonen (VTT), Timo Lind (VTT) [click for abstract]

Machine-based situational awareness is a key element to conscious and intelligent interaction with the complex world we live in, be it for the individual unit, a complex dynamical system, or even complex systems of systems. To create this awareness, the frequent gathering of accurate and real-time intelligence data is required to ensure timely, accurate, and actionable information. Unmanned aerial vehicles (UAVs) and other semi-autonomous cyber-physical systems are increasingly among the mechanisms and systems employed to assess the state of the world around us and collect intelligence through surveillance and reconnaissance missions. The current state of the art for humanitarian and military operations is still relying on human-controlled flight/asset operations, but with increasingly autonomous systems comes an opportunity to offload this to the devices themselves. In this paper, we present a principled and expandable methodology for evaluating the relative performance of a collective of autonomous devices in various scenarios. The proposed approach, which is illustrated with drone swarms as an example use case, is expected to develop into a generic tool to inform the deployment of such collectives, providing the means to infer key parameter values from problem specifications, known constraints, and objective functions.

14:45 - Flexible Motion Optimization with Modulated Assistive Forces – Nam Hee Kim (Aalto University), Hung Yu Ling (University of British Columbia), Zhaoming Xie (University of British Columbia), Michiel van de Panne (University of British Columbia) [click for abstract]

Animated motions should be simple to direct while also being plausible. We present a flexible keyframe-based character animation system that generates plausible simulated motions for both physically-feasible and physically-infeasible motion specifications. We introduce a novel control parameterization, optimizing over internal actions, external assistive-force modulation, and keyframe timing. Our method allows for emergent behaviors between keyframes, does not require advance knowledge of contacts or exact motion timing, supports the creation of physically impossible motions, and allows for near-interactive motion creation. The use of a shooting method allows for the use of any black-box simulator. We present results for a variety of 2D and 3D characters and motions, using sparse and dense keyframes. We compare our control parameterization scheme against other possible approaches for incorporating external assistive forces.

15:00 - GANSpaceSynth: Organising the Latent Space for Alternative Autonomous Features and Intelligent Behaviours on New Musical Instruments – Koray Tahiroglu (Aalto University), Miranda Kastemaa (Aalto University) and Oskar Koli (Aalto University) [click for abstract]

Generative models in the audio domain, such as Generative Adversarial Networks (GANs), make it possible to represent timbre as vectors in a high-dimensional latent space. In common GAN models, the musician’s control over timbre is mostly limited to sampling random points from the space and interpolating between them. In this talk, I present our novel hybrid GAN architecture, GANSpaceSynth, which allows musicians to explore the GAN latent space in a more controlled manner, identifying the audio features in the trained checkpoints and giving an opportunity to specify particular audio features to be present or absent in the generated audio samples. The applications of GANSpaceSynth, the Hallu composition tool and the AI-terity musical instrument, contribute to work on generative systems in the audio domain by making it possible to explore the GAN latent space with more awareness of what is happening musically, and to control the development of musical creativity in cooperation between a human musician and AI.

Session 5: Machine Learning (15:30-16:30)

Session chair: Rohit Babbar

15:30 - L1-constrained hierarchical non-stationary Gaussian processes – Zheng Zhao (Aalto University), Rui Gao (Aalto University), Simo Särkkä (Aalto University) [click for abstract]

This work is concerned with regularized extensions of hierarchical non-stationary temporal Gaussian processes (NSGPs) in which the parameters (e.g., length-scale) are modeled as GPs. In particular, we consider two commonly used NSGP constructions which are based on explicitly constructed non-stationary covariance functions and stochastic differential equations, respectively. We extend these NSGPs by including L1-regularization on the processes in order to induce sparseness. To solve the resulting regularized NSGP (R-NSGP) regression problem we develop a method based on the alternating direction method of multipliers (ADMM) and we also analyze its convergence properties theoretically.
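For readers who have not seen ADMM applied to an L1 penalty, the sketch below runs it on a much simpler toy problem than the regularized NSGP regression: minimizing 0.5 * ||x - y||^2 + lam * ||z||_1 subject to x = z, whose exact solution is elementwise soft-thresholding of y. The problem choice, the function names, and the fixed iteration count are assumptions for illustration only.

```python
def soft_threshold(value, kappa):
    """Proximal operator of kappa * |.|, which is the source of exact zeros (sparsity)."""
    if value > kappa:
        return value - kappa
    if value < -kappa:
        return value + kappa
    return 0.0


def admm_l1_denoise(y, lam, rho=1.0, iterations=200):
    """Scaled-form ADMM for min 0.5*||x - y||^2 + lam*||z||_1 subject to x = z."""
    z = [0.0] * len(y)
    u = [0.0] * len(y)  # scaled dual variable
    for _ in range(iterations):
        # x-update: minimize the quadratic data term plus the augmented penalty.
        x = [(yi + rho * (zi - ui)) / (1.0 + rho) for yi, zi, ui in zip(y, z, u)]
        # z-update: soft-thresholding handles the L1 term.
        z = [soft_threshold(xi + ui, lam / rho) for xi, ui in zip(x, u)]
        # Dual update: accumulate the constraint residual x - z.
        u = [ui + xi - zi for ui, xi, zi in zip(u, x, z)]
    return z


solution = admm_l1_denoise([3.0, 0.5, -2.0], lam=1.0)
print([round(v, 6) for v in solution])
```

On this toy problem the iterates approach the soft-thresholded input, so small entries are set exactly to zero while larger ones are shrunk towards zero.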

15:45 - System identification using Bayesian neural networks with nonparametric noise models – Christos Merkatas (Aalto University), Simo Särkkä (Aalto University) [click for abstract]

System identification is of special interest in science and engineering. This article is concerned with a system identification problem arising in stochastic dynamic systems, where the aim is to estimate the parameters of a system along with its unknown noise processes. In particular, we propose a Bayesian nonparametric approach for system identification in discrete-time nonlinear random dynamical systems, assuming only that the order of the Markov process is known. The proposed method replaces the assumption of Gaussian distributed error components with a highly flexible family of probability density functions based on Bayesian nonparametric priors. Additionally, the functional form of the system is estimated by leveraging Bayesian neural networks, which also leads to flexible uncertainty quantification. Asymptotically in the number of hidden neurons, the proposed model converges to a full nonparametric Bayesian regression model. A Gibbs sampler for posterior inference is proposed and its effectiveness is illustrated on simulated and real time series.

16:00 - Likelihood-Free Inference in State-Space Models with Unknown Dynamics – Alexander Aushev (Aalto University), Thong Tran (Aalto University), Henri Pesonen (University of Oslo), Andrew Howes (Birmingham University and Aalto University), Samuel Kaski (Aalto University and University of Manchester) [click for abstract]

We introduce a method for inferring and predicting latent states in the important and difficult case of state-space models where observations can only be simulated, and transition dynamics are unknown. In this setting, the likelihood of observations is not available and only synthetic observations can be generated from a black-box simulator. We propose a way of doing likelihood-free inference (LFI) of states and state prediction with a limited number of simulations. Our approach uses a multi-output Gaussian process for state inference, and a Bayesian Neural Network as a model of the transition dynamics for state prediction. We improve upon existing LFI methods for the inference task, while also accurately learning transition dynamics. The proposed method is necessary for modelling inverse problems in dynamical systems with computationally expensive simulations, as demonstrated in experiments with non-stationary user models.

16:15 - Unbiased Loss Functions for Evaluation and Training with Missing Labels – Erik Schultheis (Aalto University), Rohit Babbar (Aalto University) [click for abstract]

This talk considers extreme multilabel classification (XMC) problems in a setting where labels are missing independently and with a known rate. The goal in XMC typically is to maximize either precision or recall at the top-ranked predictions, which can be achieved by reducing the multilabel problem into a series of binary (One-vs-All) or multiclass (Pick-all-Labels) problems. Missing labels are a ubiquitous phenomenon in XMC tasks, yet the interaction of missing labels and reductions has hitherto only been investigated for the case of One-vs-All reduction. In this paper, we close this gap by providing unbiased estimates for the Pick-all-Labels reduction, as well as the normalized reductions which are required for consistency with the recall metric. These estimators suffer from increased variance and may lead to ill-posed optimization problems, which we address by switching to convex upper-bounds. The theoretical considerations are supplemented by experiments showing that the unbiased estimators significantly alter the bias-variance trade-off.

Session 6: Human Aspects, Interactions and Applications (16:45-18:00)

Session chair: Simo Särkkä

16:45 - Practices and Infrastructures for ML Systems -- An Interview Study – Dennis Muiruri (University of Helsinki), Lucy Ellen Lwakatare (University of Helsinki), Jukka K. Nurminen (University of Helsinki), Tomi Mikkonen (University of Helsinki) [click for abstract]

The best practices and infrastructures for developing and maintaining machine learning (ML) enabled software systems are often reported by large and experienced data-driven organizations. However, little is known about the state of practice across other organizations. Using interviews, we investigated practices and toolchains for ML-enabled systems from sixteen organizations in various domains. Our study makes three broad observations related to data management practices, monitoring practices, and automation practices in ML model training and serving workflows. To a large extent, there is a limited number of generic practices and tools applicable across organizations in different domains. We further use this work to inform the choice of practices and infrastructure decisions within the VesselAI project. The VesselAI project aims to apply ML techniques within the maritime domain, which is characterized by extreme-scale challenges. Nonetheless, there is great potential for ML-enabled applications in diverse use cases such as forecasting trajectories and predicting potential collisions of vessels.

17:00 - Sociotechnical Envelopment of Artificial Intelligence: An Approach to Organizational Deployment of Inscrutable Artificial Intelligence Systems – Aleksandre Asatiani (University of Gothenburg), Pekka Malo (Aalto University), Per Rådberg Nagbøl (IT University of Copenhagen), Esko Penttinen (Aalto University), Tapani Rinta-Kahila (University of Queensland), Antti Salovaara (Aalto University) [click for abstract]

The paper presents an approach for implementing inscrutable (i.e., nonexplainable) artificial intelligence (AI) in an accountable and safe manner in organizational settings. Drawing on an exploratory case study and the recently proposed concept of envelopment, it describes how an organization successfully “enveloped” its AI solutions to balance the AI's flexible performance with the risks that inscrutable models can entail. The paper presents several envelopment methods (establishing clear boundaries within which the AI is to interact with its surroundings, choosing and curating the training data well, and appropriately managing input and output sources) alongside their influence on the choice of AI models within the organization. This work illustrates how sociotechnical envelopment enables an organization to manage the trade-off between low explainability and high performance presented by inscrutable models. These contributions pave the way for more responsible, accountable AI implementations in organizations.

17:15 - mmWave Radar based Gesture Recognition: From Research to Practice – Dariush Salami (Aalto University), Ramin Hasibi (University of Bergen), Sameera Palipana (Aalto University), Luis Leiva (University of Luxembourg), Tom Michoel (University of Bergen), Stephan Sigg (Aalto University) [click for abstract]

Gesture recognition provides a natural and device-free way of non-verbal communication in a wide range of applications, from vehicular scenarios to smart-home applications. RGB-Depth-based gesture recognition systems suffer from privacy issues since they make it possible to recognize people in the environment and every detail about them. Moreover, they fail to generalize across different lighting and weather conditions. To tackle these problems, we introduce mmWave FMCW radar-based gesture recognition systems that not only recognize gestures with high accuracy (up to 100% when using multiple radars) but are also robust to lighting and weather conditions while preserving privacy. In the first work, entitled "Pantomime: Mid-Air Gesture Recognition with Sparse Millimeter-Wave Radar Point Clouds" and published in IMWUT 2021, we proposed a neural network-based pipeline to sense the environment using the radar and process the data to recognize gestures. Although recognition accuracy was 95%, the model was computationally expensive to implement on embedded devices like the Raspberry Pi. In the second work, entitled "Tesla-Rapture: A Lightweight Gesture Recognition System from mmWave Radar Point Clouds" and submitted to TMC, we introduced a graph-based neural network model that captures the spatio-temporal dependencies in a single forward pass, resulting in 98% recognition accuracy and a 40-fold improvement in computational efficiency compared to Pantomime. Finally, in the last paper, submitted to JSAC and entitled "Integrating Sensing and Communication in Cellular Networks via NR Sidelink", we extended the radar-based gesture recognition idea to the NR Sidelink concept to address the problem of congestion in the Radio Frequency (RF) spectrum. To do so, we used eight different NR Sidelink radars to demonstrate the concept, addressing the shadowing problem and achieving 100% recognition accuracy.

17:30 - Directing and Combining Multiple Queries for Exploratory Search by Visual Interactive Intent Modeling – Jonathan Strahl (Aalto University), Jaakko Peltonen (Tampere University), Patrik Floreen (University of Helsinki) [click for abstract]

In interactive information-seeking, a user often performs many interrelated queries and interactions covering multiple aspects of a broad topic of interest. Especially in difficult information-seeking tasks the user may need to find what is in common among such multiple aspects. Therefore, the user may need to compare and combine results across queries. While methods to combine queries or rankings have been proposed, little attention has been paid to interactive support for combining multiple queries in exploratory search. We introduce an interactive information retrieval system for exploratory search with multiple simultaneous search queries that can be combined. The user is able to direct search in the multiple queries, and combine queries by two operations: intersection and difference, which reveal what is relevant to the user intent of two queries, and what is relevant to one but not the other. Search is directed by relevance feedback on visualized user intent models of each query. Operations on queries act directly on the intent models inferring a combined user intent model. Each combination yields a new result (ranking) and acts as a new search that can be interactively directed and further combined. User experiments on difficult information-seeking tasks show that our novel system with query operations yields more relevant top-ranked documents in a shorter time than a baseline multiple-query system.

17:45 - Entitybot: Supporting Everyday Digital Tasks with Entity Recommendations – Tung Vuong (University of Helsinki), Salvatore Andolina (Università degli Studi di Palermo), Giulio Jacucci (University of Helsinki), Pedram Daee (Aalto University), Khalil Klouche (University of Helsinki), Mats Sjöberg (CSC), Tuukka Ruotsalo (University of Helsinki), Samuel Kaski (Aalto University) [click for abstract]

Everyday digital tasks can benefit greatly from systems that recommend the right information to use at the right time. However, existing solutions typically support only specific applications and tasks. We demonstrate the EntityBot, a system that captures context across application boundaries and recommends information entities related to the current task. The user's digital activity is continuously monitored by capturing all content on the computer screen using optical character recognition. This covers all applications and services in use, which are specific to each individual's computer usage, such as instant messaging, emailing, web browsing, and word processing. A linear model is then applied to detect the user's task context and to retrieve entities such as applications, documents, contact information, and several keywords determining the task. The system has been evaluated with real-world tasks, demonstrating that the recommendations had an impact on the tasks and led to high user satisfaction.

Online talks

Session 7 (15:15-16:30)

Session chair: Tomi Janhunen

15:15 - Privacy protection in Federated learning – Jing Ma (Aalto University), Stephan Sigg (Aalto University) [click for abstract]

With the advance of machine learning and the Internet of Things (IoT), security and privacy have become key concerns in mobile services and networks. Transferring data to a central unit violates privacy as well as protection of sensitive data while increasing bandwidth demands. Federated learning mitigates this need to transfer local data by sharing model updates only. However, privacy leakage remains an issue. This paper proposes xMK-CKKS, a multi-key homomorphic encryption protocol, to design a novel privacy-preserving federated learning scheme. In this scheme, model updates are encrypted via an aggregated public key before sharing with a server for aggregation. For decryption, a collaboration among all participating devices is required. Our scheme prevents privacy leakage from publicly shared model updates in federated learning and is resistant to collusion between $k<N-1$ participating devices and the server. The evaluation demonstrates that the scheme outperforms other innovations in communication cost and computational cost while preserving model accuracy. It is therefore more practical for IoT scenarios.

15:30 - Synthetic minority oversampling of vital statistics data with generative adversarial networks – Aki Koivu (University of Turku), Mikko Sairanen (University of Turku), Antti Airola (University of Turku), Tapio Pahikkala (University of Turku) [click for abstract]

Minority oversampling is a standard approach used for adjusting the ratio between the classes in imbalanced data. However, established methods often provide only modest improvements in classification performance when applied to data with an extremely imbalanced class distribution and to mixed-type data. This is typical of vital statistics data, in which the outcome incidence dictates the number of positive observations. In this presentation, we showcase our novel neural network-based oversampling method called actGAN (activation-specific generative adversarial network, Koivu et al. 2020), which can derive synthetic observations that increase prediction performance in this context.

From vital statistics data, the outcome of early stillbirth was chosen to be predicted based on demographics, pregnancy history, and infections. The data contained 363 560 live births and 139 early stillbirths, resulting in class imbalance of 99.96% and 0.04%. The hyperparameters of actGAN and a baseline method SMOTE-NC (Synthetic Minority Over-sampling Technique-Nominal Continuous) were tuned with Bayesian optimization, and both were compared against a cost-sensitive learning-only approach.
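The class proportions quoted above follow directly from the two counts. A minimal check, using only the figures given in the abstract (the variable names are illustrative):

```python
live_births = 363_560
early_stillbirths = 139

total = live_births + early_stillbirths

# Share of each class in percent of all observations.
minority_pct = 100 * early_stillbirths / total
majority_pct = 100 * live_births / total

print(f"{majority_pct:.2f}% vs {minority_pct:.2f}%")
```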

While SMOTE-NC provided mixed results, actGAN consistently improved both the true positive rate at a clinically significant false positive rate and the area under the receiver operating characteristic curve. Including an activation-specific output layer in the generator network of actGAN enabled the addition of information about the underlying data structure, which outperforms the nominal mechanism of SMOTE-NC.

actGAN provides an improvement to the prediction performance for our learning task. Our developed method could be applied to other mixed-type data prediction tasks that are known to be afflicted by class imbalance and limited data availability.

In medical applications, deep learning methods are built to automate diagnostic tasks, often formulated as single-target classification problems. However, a clinically relevant question that practitioners usually face is how to predict the future trajectory of a disease (prognosis). Current methods for such a problem often require domain knowledge and are complicated to apply. In this paper, we formulate the prognosis prediction problem as a one-to-many forecasting problem. Inspired by a clinical decision-making process with two agents -- a radiologist and a general practitioner -- we model a prognosis prediction problem with two transformer-based components that share information with each other. The first transformer in this model aims to analyze the imaging data, and the second one leverages its internal states as inputs, also fusing them with auxiliary patient data. We show the effectiveness of our method in predicting the development of structural knee osteoarthritis changes and in forecasting Alzheimer's disease clinical status. Our results show that the proposed method outperforms the state-of-the-art baselines in terms of various performance metrics, including calibration, which is desired from a medical decision support system.

16:00 - Object detection for the analysis of creep voids in high-temperature metallic structures – Akhtar Zeb (VTT), Mikko Tahkola (VTT), Rami Pohja (VTT), Janne Pakarinen (VTT) [click for abstract]

Metallic high-temperature structures are subject to creep. Creep means significant viscous time-dependent and liquid-like material flow in the direction of the principal stress leading eventually to component failure. One important manifestation of creep is the voids at the grain boundaries of the material. The reliable and accurate detection of creep void density and the size of voids is an important step to improve the high-temperature component lifecycle management.

Creep void analysis in service conditions is usually performed by replica inspection. The interpretation of the results can often be difficult and time-consuming. A large number of voids makes density and size analysis challenging and increases the possibility of human misinterpretation. We show that AI-based creep void detection removes these obstacles by quickly processing a large number of sample images with high detection accuracy.

Computer vision (CV), a field of artificial intelligence (AI), has been utilized in industries ranging from energy to manufacturing, for example to detect issues in products and processes based on image and video data. The object detection aspect of CV focuses on the detection and identification of objects in an image or video, as in face detection and damage identification in machines.

We applied the YOLOv5s model with its default configuration and pretrained weights to scanning electron microscope (SEM) images of oxygen-free phosphorus-doped copper sample surfaces containing creep voids. Each of the 40 high-resolution SEM images was split into four smaller images and annotated with the online tool CVAT. The model was trained using 160 images, and a satisfactory mean average precision of 0.82 was achieved. By adjusting the confidence threshold, the model predicts all the creep voids in test images correctly. The trained model outputs coordinates of the voids in the images and cropped images that contain only the voids, which can be used to calculate the void density and sizes.

16:15 - Micro-expression action unit detection with spatial and channel attention – Yante Li (University of Oulu), Guoying Zhao (University of Oulu) [click for abstract]

Action Unit (AU) detection plays an important role in facial behaviour analysis. However, there is limited research on AU analysis for micro-expressions. Due to the small size of micro-expression databases and the low intensity of micro-expressions, micro-expression AU detection is challenging. To alleviate these problems, we propose a novel micro-expression AU detection method that utilizes self high-order statistics of spatial-wise and channel-wise features, which can be considered as spatial and channel attention, respectively. Through such a spatial attention module, we expect to utilize rich relationship information of facial regions to increase the robustness of AU detection on limited micro-expression samples. In addition, considering the low intensity of micro-expression AUs, we further propose to explore high-order statistics to better capture subtle regional changes on the face and obtain more discriminative AU features for robust micro-expression AU detection.

Session 8 (16:45-18:00)

Session chair: Guoying Zhao

16:45 - Predicting Headline Effectiveness in Online News Media using Transfer Learning with BERT – Jaakko Tervonen (VTT Technical Research Centre of Finland), Tuomas Sormunen (VTT Technical Research Centre of Finland), Arttu Lämsä (VTT Technical Research Centre of Finland), Johannes Peltola (VTT Technical Research Centre of Finland), Heidi Kananen (Kaleva Media), Sari Järvinen (VTT Technical Research Centre of Finland) [click for abstract]

The decision to read an article in online news media or social networks is often based on the headline, and thus writing effective headlines is an important but difficult task for journalists and content creators. Even defining an effective headline is a challenge, since the objective is to avoid click-bait headlines and to be sure that the article contents fulfill the expectations set by the headline. Once defined and measured, headline effectiveness can be used for content filtering or recommending articles with effective headlines. In this paper, a metric based on received clicks and reading time is proposed to classify news media content into four classes describing headline effectiveness. A deep neural network model using the Bidirectional Encoder Representations from Transformers (BERT) is employed to classify the headlines into the four classes, and its performance is compared to that of journalists. The proposed model achieves an accuracy of 59% on the four-class classification, and 72-78% on corresponding binary classification tasks. The model outperforms the journalists, being almost twice as accurate on a random sample of headlines.

17:00 - An Empirical Investigation of Word Alignment Supervision for Zero-shot Multilingual Neural Machine Translation – Alessandro Raganato (University of Helsinki), Raúl Vázquez (University of Helsinki), Mathias Creutz (University of Helsinki), Jörg Tiedemann (University of Helsinki) [click for abstract]

Multilingual Neural Machine Translation (MNMT) systems are usually trained with a language label prepended to the input indicating the target language. However, these models have several flaws in zero-shot scenarios where language labels are ignored and the wrong language is generated, showing unstable results. In this talk, I will present the benefits of an explicit alignment to language labels in Transformer-based MNMT models in the zero-shot context, by jointly training one cross attention head with word alignment supervision to stress the focus on the target language label. We show that simply supervising one cross attention head to focus both on word alignments and language labels reduces the bias towards translating into the wrong language, improving the zero-shot performance overall. Moreover, as an additional advantage, we find that the alignment supervision leads to more stable results across different training runs.

17:15 - Probabilistic Early Warning Signals – Ville Laitinen (Turku University), Vasilis Dakos (ISEM, University of Montpellier), Leo Lahti (Turku University) [click for abstract]

Ecological communities and other complex systems can undergo abrupt and long-lasting reorganization, a regime shift, when deterministic or stochastic factors bring them to the vicinity of a tipping point between alternative states. Such changes can be large and often arise unexpectedly. However, theoretical and experimental analyses have shown that changes in correlation structure, variance, and other standard indicators of biomass, abundance, or other descriptive variables are often observed prior to a state shift, providing early warnings of an anticipated transition. Natural systems manifest unknown mixtures of ecological and environmental processes, hampered by noise and limited observations. As data quality often cannot be improved, it is important to choose the best modeling tools available for the analysis.

We investigate three autoregressive models and analyze their theoretical differences and practical performance. We formulate a novel probabilistic method for early warning signal detection and demonstrate performance improvements compared to nonprobabilistic alternatives based on simulation and publicly available experimental time series.

The probabilistic formulation provides a novel approach to early warning signal detection and analysis, with enhanced robustness and treatment of uncertainties. In real experimental time series, the new probabilistic method produces results that are consistent with previously reported findings.

Robustness to uncertainties is instrumental in the common scenario where mechanistic understanding of the complex system dynamics is not available. The probabilistic approach provides a new family of robust methods for early warning signal detection that can be naturally extended to incorporate variable modeling assumptions and prior knowledge.

17:30 - Solution Enumeration by Optimality in Answer Set Programming – Jukka Pajunen (Aalto University), Tomi Janhunen (Tampere University) [click for abstract]

Given a combinatorial search problem, it may be highly useful to enumerate its (all) solutions besides finding one solution, or showing their non-existence. The same holds for optimal solutions given an objective function. This work goes beyond the enumeration of optimal solutions and addresses the computational task of solution enumeration by optimality (SEO) in the context of Answer Set Programming (ASP) where problem solutions are captured with the answer sets of logic programs encoding problems. Existing answer-set solvers already support the enumeration of all (optimal) answer sets. However, in this work, we generalize enumeration beyond strictly optimal answer sets, giving rise to the idea of answer set enumeration in the order of optimality (ASEO). This approach is applicable up to the best k answer sets or in an unlimited setting, amounting to a process of sorting answer sets based on the objective function. As our main contributions, we present the first general algorithms for answer set enumeration and illustrate potential use cases of ASEO. First, we study how efficiently the next-best solutions can be generated in a number of optimization problems that have been solved in ASP. Second, we show that ASEO provides an effective sampling technique for Bayesian networks.

17:45 - Collaborative Filtering with Preferences Inferred from Brain Signals – Keith M. Davis III (University of Helsinki), Michiel Spape (University of Helsinki), Tuukka Ruotsalo (University of Helsinki) [click for abstract]

Collaborative filtering is a common technique in which interaction data from a large number of users are used to recommend items to an individual that the individual may prefer but has not interacted with. Previous approaches have achieved this using a variety of behavioral signals, from dwell time and clickthrough rates to self-reported ratings. However, such signals are mere estimations of the real underlying preferences of the users. Here, we use brain-computer interfacing to infer preferences directly from the human brain. We then utilize these preferences in a collaborative filtering setting and report results from an experiment where brain inferred preferences are used in a neural collaborative filtering framework. Our results demonstrate, for the first time, that brain-computer interfacing can provide a viable alternative for behavioral and self-reported preferences in realistic recommendation scenarios. We also discuss the broader implications of our findings for personalization systems and user privacy.

Onsite posters

Hosted poster session 18:00–19:00

Investigation of different ML approaches in classification of emotions induced by acute stress – Heba Sourkatti (VTT), Kati Pettersson (VTT), Bart van der Sanden (Eindhoven University of Technology), Mikko Lindholm (VTT), Johan Plomp (VTT), Ilmari Määttänen (University of Helsinki), Pentti Henttonen (University of Helsinki), Johanna Närväinen (VTT) [click for abstract]

The performance of five commonly used machine learning (ML) models was compared in binary classification of subjectively assessed emotions, induced by everyday-relevant stimuli. The tasks were cognitive and physical challenges, and the emotional dimensions (arousal, valence, and dominance) were reported after the task (scale 1 to 9) and split into high and low classes. The features for the ML modeling were psychophysiological parameters (heart rate and variability, skin conductivity, EEG-derived brainbeat, and eye blinks), subjective, and objective task-related parameters, and personality features from 26 healthy adult volunteers.

The psychophysiological responses showed that the tasks were successful in changing the mental state from baseline, and that the cognitive and physical tasks were different. The model performance was evaluated using leave-one-out and nested cross-validation in terms of accuracy, precision, recall and F1-scores. After running the models with standard settings, methods to account for imbalanced classes were applied and shown to improve the classification performance. Logistic Regression (LR) and Gaussian Naive Bayes (GNB) were effective in classifying the balanced data sets (arousal and valence), as was Support Vector Machine (valence). However, these failed for imbalanced data sets, for which balanced versions of LR and GNB, as well as RusBoost, performed reasonably well. In cases with severe imbalance (only two samples in the low class), none of the models performed satisfactorily. Our data represent a typical setup in affective computing utilizing psychophysiological monitoring: the number of participants is low compared to the number of features, inter-individual variability is very high, and class imbalance cannot be avoided. The practical observations can be listed as follows: a) if possible, include features representing physiology, behavior and personality, b) use simple models and a limited number of features to improve interpretability, c) address the possible class imbalance, and d) if the data size allows, use nested cross-validation. Additionally, the use of SHAP values provides an intuitive way to investigate feature importance in the classification.

Embedding Convolutions for Short Text Extreme Classification with Millions of Labels – Siddhant Kharbanda (Aalto University), Atmadeep Banerjee (Aalto University), Akash Palrecha (Aalto University), Rohit Babbar (Aalto University) [click for abstract]

Automatic annotation of short-text data to a large number of target labels, referred to as Short Text Extreme Classification, has recently found numerous applications in prediction of related searches and product recommendation tasks. The conventional usage of Convolutional Neural Network (CNN) to capture n-grams in text-classification relies heavily on uniformity in word-ordering and the presence of long input sequences to convolve over. However, this is missing in short and unstructured text sequences encountered in search and recommendation. In order to tackle this, we propose an orthogonal approach by recasting the convolution operation to capture coupled semantics along the embedding dimensions, and develop a word-order agnostic embedding enhancement module to deal with the lack of structure in such queries. Benefitting from the computational efficiency of the convolution operation, Embedding Convolutions, when applied on the enriched word embeddings, result in a light-weight and yet powerful encoder (INCEPTIONXML) that is robust to the inherent lack of structure in short-text extreme classification. Towards scaling our model to problems with millions of labels, we also propose INCEPTIONXML+, which addresses the shortcomings of the dynamic hard-negative mining framework in the recently proposed LIGHTXML by improving the alignment between the label-shortlister and extreme classifier. On popular benchmark datasets, we empirically demonstrate that the proposed method outperforms all previous state-of-the-art deep extreme classifiers such as ASTEC by an average of 5% and 8% on the P@k and propensity-scored PSP@k metrics respectively.

De-randomizing MCMC dynamics with the diffusion Stein operator – Zheyang Shen (Aalto University), Markus Heinonen (Aalto University), Samuel Kaski (Aalto University and University of Manchester) [click for abstract]

Approximate Bayesian inference determines proper descriptors of an intractable target distribution – in essence, an optimization problem within a family of distributions. For example, Langevin dynamics (LD) extracts asymptotically exact samples from a diffusion process because the time evolution of its marginal distributions constitutes a curve that minimizes the KL-divergence via steepest descent in the Wasserstein space. Parallel to LD, Stein variational gradient descent (SVGD) similarly minimizes the KL, albeit endowed with a novel Stein-Wasserstein distance, by deterministically transporting a set of particle samples, thus de-randomizing the stochastic diffusion process. We propose de-randomized kernel-based particle samplers for all diffusion-based samplers known as MCMC dynamics. Following previous work in interpreting MCMC dynamics, we equip the Stein-Wasserstein space with a fiber-Riemannian Poisson structure, with the capacity of characterizing a fiber-gradient Hamiltonian flow that simulates MCMC dynamics. Such dynamics discretize into generalized SVGD (GSVGD), a Stein-type deterministic particle sampler, with particle updates coinciding with applying the diffusion Stein operator to a kernel function. We demonstrate empirically that GSVGD can de-randomize complex MCMC dynamics, which combine the advantages of auxiliary momentum variables and Riemannian structure, while maintaining the high sample quality from an interacting particle system.

Prediction of dynamic behavior of a Single-Shaft Gas Turbine Using NARX Models – Hamid Asgari (VTT), Emmanuel Ory (VTT) [click for abstract]

Gas turbines are internal combustion engines widely used in industry as the main source of power for aircraft, turbo-generators, turbo-pumps and turbo-compressors. Modelling these engines can help to improve their design and manufacturing processes, as well as to facilitate the operability and maintenance of this machinery. This eventually leads to the manufacturing of gas turbines with lower costs and higher efficiency at the same time. The models may also be employed to unfold the nonlinear dynamics of these systems. The aim of this study is to predict the dynamic behavior of a single-shaft gas turbine by using open-loop and closed-loop NARX models, which are subsets of artificial neural networks. To set up these models, datasets of important variables of the gas turbine are used for training, validation and test processes. For this purpose, comprehensive program code is developed in the MATLAB programming environment. In addition to the open-loop model, a closed-loop model was also set up for multi-step prediction. The results of this study demonstrate the capability of the NARX models to reliably predict the dynamic behavior of gas turbines over different operational ranges.

On the differences between BERT and MT encoder spaces and how to address them in translation tasks – Raúl Vázquez (University of Helsinki), Hande Celikkanat (University of Helsinki), Mathias Creutz (University of Helsinki), Jörg Tiedemann (University of Helsinki) [click for abstract]

Various studies show that pre-trained language models such as BERT cannot straightforwardly replace encoders in neural machine translation despite their enormous success in other tasks. This is even more astonishing considering the similarities between the architectures. This paper sheds some light on the embedding spaces they create, using average cosine similarity, contextuality metrics and measures for representational similarity for comparison, revealing that BERT and NMT encoder representations look significantly different from one another. In order to address this issue, we propose a supervised transformation from one into the other using explicit alignment and fine-tuning. Our results demonstrate the need for such a transformation to improve the applicability of BERT in MT.

Speeding-up One-vs-All Training for Extreme Classification via Smart Initialization – Erik Schultheis (Aalto University), Rohit Babbar (Aalto University) [click for abstract]

In this paper we show that a simple, data dependent way of setting the initial vector can be used to substantially speed up the training of linear one-versus-all (OVA) classifiers in extreme multi-label classification (XMC). We discuss the problem of choosing the initial weights from the perspective of three goals. We want to start in a region of weight space a) with low loss value, b) that is favourable for second-order optimization, and c) where the conjugate-gradient calculations can be performed quickly. For margin losses, such an initialization is achieved by selecting the initial vector such that it separates the mean of all positive (relevant for a label) instances from the mean of all negatives – two quantities that can be calculated quickly for the highly imbalanced binary problems occurring in XMC. We demonstrate a speedup of ≈ 3× for training with squared hinge loss on a variety of XMC datasets. Due to the convex nature of the optimization problem, the speedup is achieved without any degradation in classification accuracy.
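The mean-separating idea can be sketched in a few lines. The version below is only an illustration under assumptions not stated in the abstract: the initial vector is taken as a linear combination of the two class means, scaled so that the positive mean receives score +1 and the negative mean score -1, and the two means are assumed not to be collinear. The function names are hypothetical.

```python
from typing import List, Sequence


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equally long vectors."""
    return sum(x * y for x, y in zip(a, b))


def mean_separating_init(rows: Sequence[Sequence[float]],
                         positives: Sequence[int]) -> List[float]:
    """Return w with w . mean_pos = +1 and w . mean_neg = -1.

    rows: feature vectors of all instances.
    positives: indices of the instances relevant for the current label
    (assumed non-empty and not covering every instance).
    """
    dim = len(rows[0])
    # The sum over all instances does not depend on the label, so in a
    # many-label setting it could be computed once and reused.
    total = [sum(r[j] for r in rows) for j in range(dim)]
    pos_sum = [sum(rows[i][j] for i in positives) for j in range(dim)]
    n_pos = len(positives)
    n_neg = len(rows) - n_pos
    mean_pos = [s / n_pos for s in pos_sum]
    # Negative mean obtained by subtraction, touching only the few positives.
    mean_neg = [(t - s) / n_neg for t, s in zip(total, pos_sum)]

    # Solve the 2x2 system for w = a * mean_pos + b * mean_neg.
    pp, nn, pn = dot(mean_pos, mean_pos), dot(mean_neg, mean_neg), dot(mean_pos, mean_neg)
    det = pp * nn - pn * pn  # zero only if the two means are collinear
    a = (nn + pn) / det
    b = -(pp + pn) / det
    return [a * p + b * n for p, n in zip(mean_pos, mean_neg)]
```

Because positives are rare in such problems, computing the negative mean by subtraction keeps the per-label cost proportional to the number of positive instances, which is one plausible reading of why the two quantities are described as quick to calculate.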

Who dares AI - AI MOOCs at the University of Helsinki – Petri Ihantola (University of Helsinki), Anton Salovuori (HAUS kehittämiskeskus Oy), Kukka-Maaria Polso (University of Helsinki), Joonas Pesonen (University of Helsinki), Teemu Roos (University of Helsinki), Anna-Mari Rusanen (Ministry of Finance) [click for abstract]

Artificial Intelligence (AI) is all around us in society. Thus, it is important for everyone to understand its basics: what AI is capable of doing, what its main limitations are, how it affects our lives, and what ethical questions should be asked about its use. To teach these skills, the University of Helsinki MOOC Center helps provide multiple Massive Open Online Courses (MOOCs) free for everyone. These courses include Elements of AI, Building AI, and Ethics of AI. Here, we have investigated who the students of these courses are and how industry could help employees put their new skills to better use.

Participants of the courses come from different backgrounds, including the unemployed, the employed, students of other institutions, and pensioners, although the majority of participants are employed. Different industries and company sizes are equally represented, underlining the widespread interest in the topic. When looking at the employed participants more closely, there are clear differences in workplace support, which are not explained by the company size or industry. Moreover, participants can be divided into four distinct profiles. First, Widely-Supported were allowed to complete the course during their working hours, and their work community was interested in the topic. The other three profiles were Supported by Work Community, Work Time Utilizers and Unsupported. Widely-Supported and Supported by Work Community considered the courses more useful (for their work) than the other two profiles. These results are similar to previous studies highlighting the importance of work community support. We recommend that all companies be supportive of, and interested in, what their employees study.

Human-AI Collaboration for Experimental Design – Nishtha Vaidya (Aalto University and Indian Institute of Technology Madras), Pierre-Alexandre Murena (Aalto University), Samuel Kaski (Aalto University and University of Manchester) [click for abstract]

Designing experiments to run for scientific inquiry is a widely needed and difficult task in which we need assistance, and where AI’s skills in handling data and inference can complement us. In this paper we address the essential problem of human-AI collaboration for experimental design, in particular how the AI can efficiently communicate assistance to the human: the AI needs to teach the scientist what the data tell about the causal structure of the object under study. This is difficult because the scientist may have constraints and incorrect earlier knowledge, and because the teaching needs to be done on a budget. We show that an AI assistant, equipped with a model of how the scientist understands the object and the global experiment, is able to teach the scientist the correct model of the object by suggesting experiments to run, or direct modifications to the model, which the scientist can choose to accept or ignore.

TAIGA: a novel dataset for multitask learning of continuous and categorical forest variables from hyperspectral imagery – Matti Mõttus (VTT Technical Research Centre of Finland), Phu Pham (Purdue University), Eelis Halme (VTT Technical Research Centre of Finland), Matthieu Molinier (VTT Technical Research Centre of Finland), Hai Cu (Aalto University), Jorma Laaksonen (Aalto University) [click for abstract]

The spectral and spatial resolutions of modern optical Earth observation data are continuously increasing. To fully utilize the data, integrate them with other information sources and create applications relevant to real-world problems, extensive training data are required. We present TAIGA, an open dataset including continuous and categorical forestry data, accompanied by airborne hyperspectral imagery with a pixel size of 0.7 m. The dataset contains over 70 million labeled pixels belonging to more than 600 forest stands. To establish a baseline on the TAIGA dataset for multitask learning, we train and validate a convolutional neural network to simultaneously retrieve 13 forest variables. Due to the size of the imagery, the training and testing sets were independent, with strictly no overlap for patches up to 45 × 45 pixels. Our retrieval results show that including both spectral and textural information improves the accuracy of mapping key boreal forest structural characteristics, compared with an earlier study including only spectral information from the same image. TAIGA responds to the increased availability of hyperspectral and very high resolution imagery, and includes the forestry variables relevant for forestry and environmental applications. We propose the dataset as a new benchmark for spatial-spectral methods that overcomes limitations of widely used small-scale hyperspectral datasets.

Fourier-Hermite dynamic programming for optimal control – Syeda Sakira Hassan (Aalto University), Simo Särkkä (Aalto University) [click for abstract]

In this paper, we propose a novel method to approximate the value function of dynamic programming. The method is based on the use of the Fourier-Hermite series for approximating the action-value function of dynamic programming instead of the second-order Taylor-series expansion used in differential dynamic programming. The Fourier-Hermite series can be approximated by sigma-point methods, which lead to novel sigma-point based dynamic programming methods. We apply the proposed methods to optimal control problems in order to show the practical performance of the methods.

Redefining creativity in the age of artificial intelligence? Finnish artists co-creating with AI – Riina Lundman (University of Turku), Paulina Nordström (University of Agder), Roosa Wingström (University of Turku), Johanna Hautala (University of Vaasa) [click for abstract]

Creative artificial intelligence that could paint, compose, write, or perform art is one of the intriguing current topics in the field of AI development and research. In our research project Co-creativity in the Era of Artificial Intelligence (LuotAI), we approach the themes of creativity, art, and AI from the perspectives of social sciences and human geography. In our view, AI is a novel actor in the creative process, which has historically relied on a human-centred understanding of creativity. On the other hand, human aspects can bring new insight to the development of creative AI, hence making the process more interactive. All in all, the changes in society, technology, and creative endeavour are so substantial that we are interested in studying whether the concept of creativity needs to be redefined in the age of AI.

We have interviewed 53 Finland-based artists and computer scientists who create with AI in their daily work. In this presentation, we focus on the empirical material and answers collected from the artists (N=27), thus giving voice to the creative users of AI themselves. We employ the concept of co-creativity to better grasp the creative process evolving around the art-making practices between humans and AI-based technology (mainly GANs). The presentation is a compilation of our main results in the project so far, including aspects regarding the artists’ views on the roles of AI, whether as a tool, medium or partner in their work. We argue that – at least in the domain of arts – the question of an independently creative AI is not interesting as such, but it is more fruitful to see humans and AI as being co-creative together. Moreover, in addition to simply asking what AI can give to art, we have inquired about what art can give to AI. We believe that our rich interview material together with our (post)human perspective on creative AI is of wider interest to the Finnish AI research community.

Tight Accounting in the Shuffle Model of Differential Privacy – Antti Koskela (University of Helsinki), Mikko A. Heikkilä (University of Helsinki), Antti Honkela (University of Helsinki) [click for abstract]

The shuffle model of differential privacy is a novel distributed privacy model based on a combination of local privacy mechanisms and a trusted shuffler. It has been shown that the additional randomisation provided by the shuffler improves privacy compared to the purely local mechanisms. Obtaining tight privacy bounds, especially for multi-message protocols, is complicated by the complexity brought by the shuffler. The recently proposed Fourier Accountant for evaluating (ε,δ)-differential privacy guarantees has been shown to give tighter bounds than commonly used methods for non-adaptive compositions of various complex mechanisms. We show how to compute tight privacy bounds using the Fourier Accountant for multi-message versions of several ubiquitous mechanisms in the shuffle model.

Value of DESS MRI in prediction of knee osteoarthritis progression through the lens of deep learning – Egor Panfilov (University of Oulu), Aleksei Tiulpin (Aalto University), Miika T. Nieminen (University of Oulu, Oulu University Hospital), Simo Saarakkala (University of Oulu, Oulu University Hospital) [click for abstract]

Accurate prediction of knee osteoarthritis (KOA) progression may enable early disease intervention, support subject selection in clinical trials, and advance disease understanding. Compared to demographic and radiographic data, MRI protocols visualize additional risk factors, primarily related to soft tissue status. Manually designed MRI-based biomarkers are useful in studying KOA progression; however, their generalization to larger cohorts remains uncertain. In this work, we studied the performance of deep learning (DL) in predicting KOA progression from structural MRI data on a large patient cohort. We compared the model to the ones based on widely available clinical and radiographic data, thus providing insight into the relative value of MRI for the problem.

We used the data from the Osteoarthritis Initiative baseline. Four progression criteria were defined as an increase in radiographic KOA severity, measured by Kellgren-Lawrence grade (KLG), within 12, 24, 36, and 48 months, respectively. The sample sizes were 8009, 7548, 7221, and 6919 knees. The reference models were based on logistic regression for age, sex, BMI, history of knee injury and surgery, WOMAC score, and KLG. The MRI-based DL models were trained to predict the targets directly from the DESS MRI data. Here, we used a CNN to extract the features from MRI slices, a Transformer to incorporate cross-slice attention, followed by a classification layer. The models were compared with average precision (AP) and ROC AUC metrics on hold-out data, with “progressor” set as the positive class.

The highest APs were 0.15(0.03) (clinical+KLG), 0.18(0.02), 0.25(0.03), and 0.33(0.03) (MRI) for 12-, 24-, 36-, and 48-month targets, respectively. The highest ROC AUCs were 0.72(0.02) (clinical+KLG), 0.73(0.02) (MRI), 0.71(0.02) (MRI/clinical+KLG), and 0.76(0.02) (MRI) for 12-, 24-, 36-, and 48-month targets, respectively. Our results suggest that DESS MRI may have an added value in prediction of KOA progression after 2 years.

Temporal Gaussian Process Regression in Logarithmic Time – Adrien Corenflos (Aalto University), Zheng Zhao (Aalto University), Simo Särkkä (Aalto University) [click for abstract]

We present a novel parallelization method for temporal Gaussian process (GP) regression problems. The method allows for solving GP regression problems in logarithmic O(log N) time, where N is the number of time steps. Our approach uses the state-space representation of GPs which in its original form allows for linear O(N) time GP regression by leveraging the Kalman filtering and smoothing methods. By using a recently proposed parallelization method for Bayesian filters and smoothers, we are able to reduce the linear computational complexity of the temporal GP regression problems into logarithmic span complexity. This ensures logarithmic time complexity when run on parallel hardware such as a graphics processing unit (GPU). We experimentally demonstrate the computational benefits on simulated and real datasets via our open-source implementation leveraging the GPflow framework.
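To make the state-space view mentioned in this abstract concrete, the following sketch runs a sequential Kalman filter for the simplest temporal GP, one with an exponential (Ornstein-Uhlenbeck) kernel. It only illustrates the linear O(N) baseline, not the parallel O(log N) method, and it is not taken from the authors' implementation; the function name, the kernel choice and all parameter names are assumptions.

```python
import math


def ou_kalman_filter(times, ys, lengthscale=1.0, variance=1.0, noise=0.1):
    """Filtering means/variances for a zero-mean GP with kernel
    variance * exp(-|t - t'| / lengthscale), observed with Gaussian
    noise of variance `noise`. `times` must be increasing."""
    means, variances = [], []
    m, P = 0.0, variance  # stationary prior of the OU process
    prev_t = None
    for t, y in zip(times, ys):
        if prev_t is not None:
            # Prediction step: exact discretisation of the OU process.
            a = math.exp(-(t - prev_t) / lengthscale)
            m = a * m
            P = a * a * P + variance * (1.0 - a * a)
        # Update step with the scalar Kalman gain.
        k = P / (P + noise)
        m = m + k * (y - m)
        P = (1.0 - k) * P
        means.append(m)
        variances.append(P)
        prev_t = t
    return means, variances


means, variances = ou_kalman_filter([0.0, 1.0], [2.0, -1.0], noise=1.0)
```

Each step costs constant time, so one pass over N observations is O(N); the filtered mean at the last time point agrees with the batch GP posterior mean there. A smoother pass would be needed to obtain the full posterior at earlier time points.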

EYES-project case study: Ultra-Short Window Length and Feature Importance Analysis for Cognitive Load Detection from Wearable Sensors – Jaakko Tervonen (VTT), Kati Pettersson (VTT), Jani Mäntyjärvi (VTT) [click for abstract]

Human cognitive capabilities are under constant pressure in the modern information society. Cognitive load detection would be beneficial in several applications of human–computer interaction, including attention management and user interface adaptation. The Academy-project EYES aims to explore and develop novel & seamless cognitive state estimation methods for real-time & real-life settings.

Current research on cognitive load detection lacks understanding of the optimal and minimal window length in data segmentation, which would allow for more timely, continuous state detection. This study presents a comparative analysis of ultra-short (30 s or less) window lengths in cognitive load detection with a wearable device. Heart rate, heart rate variability, galvanic skin response, and skin temperature features are extracted at six different window lengths and used to train an Extreme Gradient Boosting classifier to distinguish between cognitive load and rest. A 25 s window showed the highest accuracy (67.6%), which is similar to earlier studies using the same dataset. Overall, model accuracy tended to decrease as the window length decreased, and the lowest performance (60.0%) was observed with a 5 s window. The contribution of different physiological features to the classification performance and the most useful features that react in short windows are also discussed. The analysis provides a promising basis for future real-time applications with wearable sensors.

Biophysical network models of phase-synchronization in MEG resting-state – Nitin Williams (Aalto University), Benedetta Toselli (University of Genoa, Italy), Felix Siebenhühner (University of Helsinki), Satu Palva (University of Helsinki), Gabriele Arnulfo (University of Genoa, Italy), Samuel Kaski (Aalto University), J. Matias Palva (Aalto University) [click for abstract]

Magnetoencephalography (MEG) is used extensively to study functional connectivity (FC) networks of phase-synchronization, but the relationship of these networks to their biophysical substrates is poorly understood. FC networks of phase-synchronization are the set of correlations between phases of neuronal oscillations from different brain regions. Biophysical Network Models (BNMs) are used to produce networks corresponding to MEG-derived networks of phase-synchronization, but the roles of inter-regional conduction delays, the structural connectome and the model of regional dynamics in obtaining this correspondence remain unknown. In this study, we investigated these three roles in obtaining a correspondence between model-generated and MEG-derived networks. To do this, we compared three BNMs, each comprising Wilson-Cowan oscillators interacting via diffusion Magnetic Resonance Imaging (MRI)-based patterns of structural connections, with zero delays, constant delays and distance-dependent delays, respectively. The Wilson-Cowan zero delays model produced networks with a closer correspondence to the MEG-derived network than those produced by the constant delays model, and equal to those produced by the distance-dependent delays model. Further investigating the Wilson-Cowan zero delays model by comparing it to null models revealed that both the pattern of structural connections and Wilson-Cowan oscillatory dynamics contribute to the correspondence between model-generated and MEG-derived networks. Hence, we find no evidence that including conduction delays improves the correspondence between model-generated and MEG-derived networks, but we do find that the structural connectome and the model of regional dynamics contribute to the observed correspondence. These findings result in a parsimonious BNM that produces networks corresponding closely to MEG-derived networks of phase-synchronization.

Computing Differential Privacy Guarantees for Heterogeneous Compositions Using FFT – Antti Koskela (University of Helsinki), Antti Honkela (University of Helsinki) [click for abstract]

The recently proposed Fast Fourier Transform (FFT)-based accountant for evaluating (ε,δ)-differential privacy guarantees using the privacy loss distribution formalism has been shown to give tighter bounds than commonly used methods such as Rényi accountants when applied to homogeneous compositions of identical mechanisms. In this work, we extend this approach to heterogeneous compositions. We carry out a full error analysis that allows choosing the parameters of the algorithm such that a desired accuracy is obtained. The analysis also extends previous results by analysing all the error sources of the (ε,δ)-approximation. We also show how to speed up the evaluation of tight privacy guarantees using the Plancherel theorem at the cost of increased pre-computation and memory usage.
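The privacy loss distribution formalism named in this abstract can be illustrated on a toy case. The sketch below composes two different randomized-response mechanisms and evaluates δ(ε) as the expectation of (1 − e^(ε − s)) over privacy losses s greater than ε. It deliberately uses direct convolution of exact discrete distributions rather than the FFT on a discretisation grid, so it is not the accountant described here; all function names are hypothetical.

```python
import math
from collections import defaultdict


def randomized_response_pld(eps0):
    """Privacy loss distribution of binary randomized response:
    loss +eps0 with probability e^eps0 / (1 + e^eps0), else -eps0."""
    p = math.exp(eps0) / (1.0 + math.exp(eps0))
    return {eps0: p, -eps0: 1.0 - p}


def compose(pld_a, pld_b):
    """Non-adaptive composition: privacy losses add, so the
    distributions convolve (done here by brute force, not FFT)."""
    out = defaultdict(float)
    for sa, pa in pld_a.items():
        for sb, pb in pld_b.items():
            out[sa + sb] += pa * pb
    return dict(out)


def delta(pld, eps):
    """delta(eps) = sum over losses s > eps of p(s) * (1 - e^(eps - s))."""
    return sum(p * (1.0 - math.exp(eps - s)) for s, p in pld.items() if s > eps)


# Heterogeneous composition of two mechanisms with different parameters.
hetero = compose(randomized_response_pld(math.log(3)),
                 randomized_response_pld(math.log(2)))
delta_at_zero = delta(hetero, 0.0)
```

Brute-force convolution grows quickly with the number of compositions and does not handle continuous mechanisms, which is the kind of cost that a grid-based FFT evaluation is meant to avoid.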

Online posters

Hosted poster session 18:00–19:00

A Grid-Structured Model of Tubular Reactors – Katsiaryna Haitsiukevich (Aalto University), Samuli Bergman (Neste), Cesar de Araujo Filho (Neste), Francesco Corona (Aalto University), Alexander Ilin (Aalto University) [click for abstract]

We propose a grid-like computational model of tubular reactors. The architecture is inspired by the computations performed by solvers of partial differential equations which describe the dynamics of the chemical process inside a tubular reactor. The proposed model may be entirely based on the known form of the partial differential equations or it may contain generic machine learning components such as multi-layer perceptrons. We show that the proposed model can be trained using limited amounts of data to describe the state of a fixed-bed catalytic reactor. The trained model can reconstruct unmeasured states such as the catalyst activity using the measurements of inlet concentrations and temperatures along the reactor.

Link to online poster meeting: https://aalto.zoom.us/j/67433664409

Combining Rule-based System and Machine Learning to Classify Unsupervised Semi-natural Language Data – Zafar Hussain (University of Helsinki), Jukka K. Nurminen (University of Helsinki), Tommi Mikkonen (University of Helsinki), Marcin Kowiel (F-Secure Corporation) [click for abstract]

Shell commands form a special kind of semi-natural language. Analyzing shell commands’ structure and classifying them is a useful approach in the field of cyber security to detect anomalous commands used by malicious actors. Without any contextual knowledge, analysing commands is a difficult task, as similar-looking commands might be performing different tasks, and commands with different aliases might be performing the same tasks. To understand shell commands’ structure and their syntactic and semantic meanings, we created a rule-based system based on expert opinions. Using this system, we classified the shell commands into similar and not-similar classes. This rule-based system transformed shell commands’ unsupervised data into a supervised form. On this supervised data, we trained three machine learning models (a logistic regression document classifier, a deep learning document classifier, and a deep learning sentence-pair classifier) to learn the set of rules created in the rule-based system. We used the Matthews Correlation Coefficient (MCC) score for the models’ performance comparison. The logistic regression model shows an MCC score of 0.85, whereas both the Deep Learning (DL) models scored above 0.98. The DL document classifier and DL sentence-pair classifier achieved an accuracy of 94.6% and 97.8% respectively on unseen data. Our proposed hybrid approach solves the complex problem of classifying semi-natural language unsupervised data. This approach can be used to create a domain-specific set of rules, and classify any unsupervised data into multiple classes.
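For readers unfamiliar with the MCC score used in this abstract, the sketch below computes it from the four cells of a binary confusion matrix using the standard formula. The function name and the example counts are illustrative only and are unrelated to the results reported above.

```python
import math


def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Made-up counts for a similar / not-similar classifier.
score = mcc(tp=6, tn=3, fp=1, fn=2)
```

Unlike plain accuracy, the score uses all four cells, so it stays informative when the two classes are imbalanced: it is 1 for perfect agreement, 0 for chance-level predictions and -1 for total disagreement.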

Link to online poster meeting: https://zoom.us/j/9974873100?pwd=WUNwK0w1VWtqbG04VGxmZFNmRnRyUT09

Training quantum Boltzmann machines using extreme rates of unit dropout – Ilmo Salmenperä (University of Helsinki), Jukka Nurminen (University of Helsinki) [click for abstract]

Quantum annealing is a form of quantum computing that has wide applicability in many realms, like quantum chemistry, logistics or machine learning. One of these applications is to use the quantum annealing device as a quantum sampler for sampling from the model distribution of a common machine learning model called Boltzmann Machines. This has been shown to be a quite promising way of applying quantum computing to machine learning in practice, outperforming the current classical algorithms for performing these sampling tasks.

While these devices tend to be large in comparison to universal quantum computers, they are still too small to be used in practical machine learning tasks. This calls for clever strategies to mitigate these issues, as most actual machine learning tasks require large layer sizes to perform well. The unit dropout method is one candidate for alleviating these issues. This model-agnostic technique was originally developed for regularizing machine learning models, but it can also be used to reduce the effective overall size of the layers during training.

We tested the effects of extreme rates of unit dropout in the process of pretraining multiple restricted Boltzmann machines to form a deep belief network, and determined what constraints the results imply for the quantum hardware they would be computed on. While the optimal dropout rate seems to be around 50%, which is supported by existing research, more extreme rates of dropout can give further benefits for quantum machine learning, as they allow for larger layer sizes to be used during training. Even the model with a dropout rate of 92% managed to learn some representation of the underlying model distribution, which is important, as this is the model that could feasibly be computed using existing quantum annealing devices.

Link to online poster meeting: https://helsinki.zoom.us/j/66430544878

Channel Charting Based Beam SNR Prediction – Parham Kazemi (Aalto University), Tushara Ponnada (Aalto University), Hanan Al-Tous (Aalto University), Ying-Chang Liang (University of Electronic Science and Technology of China, Chengdu, P. R. China), Olav Tirkkonen (Aalto University) [click for abstract]

We consider machine learning for intra-cell beam handovers in mmWave 5G NR systems by leveraging Channel Charting (CC). We develop a base station (BS) centric approach for predicting the Signal-to-Noise-Ratio (SNR) of beams. Beam SNRs are predicted based on the measured signal at the BS without the need to exchange information with UEs. In an offline training phase, we construct a beam-specific dimensionality reduction of Channel State Information (CSI) to a low-dimensional CC, annotate the CC with beam-wise SNRs and then train SNR predictors for different target beams. In the online phase, we predict target beam SNRs. K-nearest neighbors, Gaussian Process Regression and Neural Network based prediction are considered. A handover can then be decided based on the SNR difference between the serving and target beams. To evaluate the efficiency of the proposed framework, we perform simulations for a street segment with synthetically generated CSI. An SNR prediction accuracy with an average root mean square error of less than 0.3 dB is achieved.

Link to online poster meeting: https://aalto.zoom.us/j/61639310559

A Relational Model for One-Shot Classification – Arturs Polis (Aalto University), Alexander Ilin (Aalto University) [click for abstract]

We show that a deep learning model with built-in relational inductive bias can bring benefits to sample-efficient learning, without relying on extensive data augmentation. The proposed one-shot classification model performs relational matching of a pair of inputs in the form of local and pairwise attention. Our approach solves perfectly the one-shot image classification Omniglot challenge. Our model exceeds human level accuracy, as well as the previous state of the art, with no data augmentation.

Link to online poster meeting: https://aalto.zoom.us/j/64246031975

EYES-project case study: In Search of Harmful Stress – Jaakko Tervonen (VTT), Johanna Närväinen (VTT), Jani Mäntyjärvi (VTT), Kati Pettersson (VTT) [click for abstract]

The human body produces a different physiological stress reaction when you stub a toe on a doorstep than when you panic at a job interview. The impact on the body’s homeostasis varies depending on the reaction type, and some reactions are harmful to our health. Currently, stress estimation is focused on binary identification between stress and non-stress stages. More detailed separation of stress reaction types is needed for detecting harmful stress. In this study, the Extreme Gradient Boosting algorithm was used to classify a baseline condition and physiological and psychosocial stress, based on psychophysiological signals monitored using a wrist sensor device. Classification was robust in separating the two stress states from baseline and from each other. The results provide support for novel approaches utilizing fine-grained estimation of stress type from wearable sensor data.

Link to online poster meeting: https://teams.microsoft.com/l/meetup-join/19%3ameeting_NjIzNTFkOGYtNzU3YS00MDdiLTkzOTQtMGJhM2MwNzAzN2Nl%40thread.v2/0?context=%7b%22Tid%22%3a%2268d6b592-5008-43b5-9b04-23bec4e86cf7%22%2c%22Oid%22%3a%2230067a4b-b6a0-43a1-b0ab-a4585c981061%22%7d

Joint use of thermal camera and optical camera for respiration measurement – Zaeed Khan (Aalto University), Salla Aario (Aalto University), Ajinkya Gorad (Aalto University), Miika Arvonen (Kuopio University Hospital), Simo Särkkä (Aalto University) [click for abstract]

Thermal imaging has been a promising respiration measurement method in healthcare and sports technology. Nostril detection is one of the important steps in thermal-imaging-based respiration measurement because temperature changes in the nostrils correlate with exhalation and inhalation. However, finding the nostrils in thermal images is an intricate process because thermal images do not provide detailed information about facial features. Fortunately, existing computer vision tools do facilitate nostril detection in visible images. Hence, this work proposes a method that combines the use of a thermal camera and an optical camera. The optical camera detects the nostrils and the thermal camera extracts the respiration signal. The proposed method consists of face detection, nostril detection, and mapping of the nostrils from the visible image to the thermal image. Four pre-trained face detectors are studied in this work. The best detector was the pre-trained DNN-based face detector from the OpenCV library. The proposed nostril detection is based on facial landmark detection and thresholding.

Link to online poster meeting: https://aalto.zoom.us/j/66916400784

Are 3D Convolutional Networks Inherently Biased towards Appearance? – Petr Byvshev (Aalto University), Pascal Mettes (University of Amsterdam), Yu Xiao (Aalto University) [click for abstract]

3D convolutional networks, as direct inheritors of 2D convolutional networks for images, have placed their mark on action recognition in videos. Combined with pretraining on large-scale video data, high classification accuracies have been obtained on numerous video benchmarks. In an effort to better understand why 3D convolutional networks are so effective, several works have highlighted their bias towards static appearance and towards the scenes in which actions occur. In this work, we seek to find the source of this bias and question whether the observed biases towards static appearances are inherent to 3D convolutional networks or represent limited significance of motion in the training data. We resolve this by presenting temporality measures that estimate the data-to-model motion dependency at both the layer-level and the kernel-level. Moreover, we introduce two synthetic datasets where motion and appearance are decoupled by design, which allows us to directly observe their effects on the networks. Our analysis shows that 3D architectures are not inherently biased towards appearance. When trained on the most prevalent video sets, 3D convolutional networks are indeed biased throughout, especially in the final layers of the network. However, when training on data with motions and appearances explicitly decoupled and balanced, such networks adapt to varying levels of temporality. To this end, we see the proposed measures as a reliable method to estimate motion relevance for activity classification in datasets and use them to uncover the differences between popular pretraining video collections, such as Kinetics, IG-65M and Howto100m.

Link to online poster meeting: https://aalto.zoom.us/j/66558605252

Ground based solar image restoration with unsupervised deep learning – Nigul Olspert (Aalto University), Andrés Asensio Ramos (Instituto de Astrofísica de Canarias) [click for abstract]

A major problem in ground-based solar imaging is that the observations suffer from aberrations caused by Earth's turbulent atmosphere. While the use of adaptive optics can eliminate a significant portion of the aberrations, residual and primarily higher-order aberrations have to be removed post facto. Such methods for ground-based solar image restoration have been developed over several decades. One of the often used techniques, namely multi-frame blind deconvolution (MFBD) with phase diversity, has been shown to yield high-quality restorations. However, the increasing amount of data creates high demand for real-time image restoration procedures, while the computational cost of removing such aberrations with present-day state-of-the-art methods is still huge. The task of finding fast and scalable solutions for the problem is therefore critically important. In this work we introduce a novel method based on unsupervised deep learning. The main advantage of neural networks comes from the exploitation of graphics processing units, which makes them very fast in the prediction phase. Moreover, the benefit of unsupervised learning is that there is no need for restored images or ground truth data from simulations. Collecting such data would itself be very time consuming. We have trained and tested the proposed neural network on data from Microlens-fed Hyperspectral Imager in European Solar Telescope. The results show that the achieved restoration quality is comparable to that of MFBD and the method is indeed orders of magnitude faster. Therefore, there is high potential for this method to be practicable in real time at the sites of instruments.

Link to online poster meeting: https://aalto.zoom.us/j/2485391999

Experiences on generating synthetic medical data with GAN models – Harri Pölönen (VTT), Niki Loppi (NVIDIA), Christian Hundt (NVIDIA) [click for abstract]

Medical data is privacy-sensitive and protected by national legislation and GDPR, making data sharing between hospitals and research organizations difficult. In addition, the amount of data for a specific medical condition and imaging modality can be relatively small. Being able to generate synthetic medical data via AI in large quantities would thus be very valuable. GAN algorithms perform well in 2D, e.g. in generating human facial images, but there are only very few publications on the 3D case. In this presentation, we will discuss characteristic GAN challenges in 3D medical imaging, including large memory footprint, training with limited data and mode collapse due to low variation. Furthermore, we apply the state-of-the-art methods to a publicly available magnetic resonance imaging (MRI) dataset consisting of brain images from 1 112 healthy human subjects.

First, we employed ProgressiveGAN3D, an open-source toolkit that implements NVIDIA's progressive growing of GANs algorithm in 3D. Although the algorithm seemingly worked well and produced visually good results, on closer inspection we noticed that it suffers from mode collapse, i.e. the variance in the generated images is very low. More recently, an updated version of the progressive GAN algorithm was published as part of the Nobrainer framework. Applying this, we found that the synthetic MRIs exhibit larger variation and that the original mode collapse issues were at least partially resolved. We also studied the effect of augmentation on the generated results. A medical expert reviewed the generated synthetic MRIs and found them anatomically correct. However, some details in the quality of the synthetic MRIs allowed the medical expert to distinguish them from real MRIs.

We plan to expand the approach by adding another channel, such as a CT volume or a segmentation of the target of interest, to the synthetic MRIs. The volume quality issues also need to be solved, as does performance with smaller datasets, e.g. with hundreds of subjects.

Link to online poster meeting: https://ut-capitole-fr.zoom.us/j/99087450675

Atomic force microscopy image recognition using a generative graph neural network – Lauri Kurki (Aalto University), Niko Oinonen (Aalto University), Fedor Urtev (Aalto University), Prokop Hapala (Czech Academy of Sciences), Alexander Ilin (Aalto University), Juho Kannala (Aalto University), Adam Foster (Aalto University) [click for abstract]

Atomic force microscopy (AFM) is a widely utilized characterization method capable of capturing atomic-level detail in individual organic molecules. However, because an AFM image contains relatively little information about the deeper atoms in a molecule, interpreting AFM images of non-planar molecules poses significant challenges for human experts. Recently, an approach using a convolutional neural network trained on simulated AFM images showed promising results in predicting image descriptors containing information about the molecular structure of the sample from both simulated and experimental images (Alldritt et al., 2020). An end-to-end solution starting from an AFM imaging system and ending in an automated image interpreter would be a valuable asset for all research utilizing AFM.

Here, we aim to build upon the previous research and improve molecular structure prediction by modeling the molecule as a graph and using a generative graph model (Li et al., 2018) to build the molecular structure atom-by-atom and bond-by-bond. The generative model utilizes a graph neural network to process the atoms and bonds in a molecule as a set of nodes and edges, and a convolutional neural network with an attention gating mechanism to process the AFM images. In the generative process, an AFM simulator is used to provide the model with information about the placement of the atom in the previous iteration. The model is trained on simulated AFM data.

Our model learns to predict simple molecules from simulated AFM images, but further development is required to make it more reliable with complicated sample molecules and experimental AFM images.

Link to online poster meeting: https://aalto.zoom.us/j/67954859574

Algebraic Degree of Optimization – Kaie Kubjas (Aalto University), Olga Kuznetsova (Aalto University), Luca Sodomaco (Aalto University) [click for abstract]

The algebraic degree of optimization is the number of complex restricted critical points for a general data point u; it allows one to study the complexity of a given problem.

We study an optimization problem whose feasible set is a real algebraic variety X and whose parametric objective function f_u is gradient-solvable with respect to the parametric data u. This class of problems includes Euclidean distance optimization as well as maximum likelihood optimization. We use rational parametrization to study the algebraic degree and give some formulas using the polar classes.
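
As a small worked illustration (not taken from the poster itself), consider Euclidean distance optimization on the unit circle. The symbols x_1, x_2, u_1, u_2 and the multiplier \lambda are introduced here only for this example. Counting the complex critical points for general data u gives the algebraic degree in this case:

```latex
\begin{align*}
  &X = \{(x_1, x_2) : x_1^2 + x_2^2 = 1\}, \qquad
   f_u(x) = (x_1 - u_1)^2 + (x_2 - u_2)^2 \\
  &\text{Critical points on } X:\quad
   x - u = \lambda x
   \;\Longrightarrow\; x = t\,u, \quad t = \frac{1}{1 - \lambda} \\
  &\text{Substituting into the equation of } X:\quad
   t^2\,(u_1^2 + u_2^2) = 1
   \;\Longrightarrow\; t = \pm \frac{1}{\sqrt{u_1^2 + u_2^2}}
\end{align*}
```

For general u (that is, u_1^2 + u_2^2 ≠ 0) there are exactly two complex critical points, so the Euclidean distance degree of the circle is 2.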

Link to online poster meeting: https://aalto.zoom.us/j/61312888932

Learning to assist agents by observing them – Antti Keurulainen (Aalto University and Bitville Oy), Isak Westerlund (Bitville Oy), Samuel Kaski (Aalto University and University of Manchester), Alexander Ilin (Aalto University) [click for abstract]

The ability of an AI agent to assist other agents, such as humans, is an important and challenging goal, which requires the assisting agent to reason about the behavior and infer the goals of the assisted agent. Training such an ability by using reinforcement learning usually requires large amounts of online training, which is difficult and costly. On the other hand, offline data about the behavior of the assisted agent might be available, but is non-trivial to take advantage of by methods such as offline reinforcement learning. We introduce methods where the capability to create a representation of the behavior is first pre-trained with offline data, after which only a small amount of interaction data is needed to learn an assisting policy. We test the setting in a gridworld where the helper agent has the capability to manipulate the environment of the assisted artificial agents, and introduce three different scenarios where the assistance considerably improves the performance of the assisted agents.

Link to online poster meeting: https://zoom.us/j/7385468521?pwd=YnU2QWVuYkJTcS9aY3dmN2F3UE1EUT09

LSTM-XL: Attention Enhanced Long-term Memory for LSTM Cells – Tamás Grósz (Aalto University), Mikko Kurimo (Aalto University) [click for abstract]

Long Short-Term Memory (LSTM) cells, frequently used in state-of-the-art language models, struggle with long sequences of inputs. One major problem in their design is that they try to summarize long-term information into a single vector. The attention mechanism aims to alleviate this problem by accumulating the relevant outputs more efficiently. One very successful attention-based model is the Transformer, but it also has issues with long sentences. As a solution, the latest version of the Transformer incorporates recurrence into the model. The success of these recurrent attention-based models inspired us to revise the LSTM cells by incorporating the attention mechanism into them. Our goal is to improve the long-term memory by extending it to store a few previous outputs and using the attention mechanism to accumulate these vectors into a long-term memory vector. This adjustment allows the cell to directly remember a few preceding outputs and focus on the relevant ones. The main advantage of our proposed approach is that it directly accesses the stored preceding vectors, making it more effective for long sentences. Using this method, we can also avoid the undesired resetting of the long-term vector by the forget gate. We evaluated our new cells as part of language models on two speech recognition tasks: a large-scale Finnish one and a low-resource Hungarian dataset. The experimental results show that this modification is beneficial: the proposed LSTM-XL reached the same performance as the composite of LSTM and global attention while requiring considerably less memory and computation.
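
The idea of storing a few previous outputs and accumulating them with attention can be illustrated with a minimal sketch. This is not the authors' LSTM-XL cell: the abstract does not specify the attention formulation, so generic scaled dot-product attention is assumed here, and the buffer size, the query vector, and the names `attend` and `memory` are hypothetical.

```python
import math
from collections import deque


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, memory):
    """Scaled dot-product attention of `query` over the vectors in `memory`.

    Returns the attention-weighted sum of the stored vectors and the weights.
    """
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, vec)) / scale for vec in memory]
    weights = softmax(scores)
    summary = [
        sum(w * vec[i] for w, vec in zip(weights, memory))
        for i in range(len(query))
    ]
    return summary, weights


# Hypothetical buffer keeping only the 3 most recent cell outputs
# (the buffer size is an assumption made for this example).
memory = deque(maxlen=3)
for output in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]):
    memory.append(output)  # the oldest output is dropped once the buffer is full

# Assumed query, e.g. derived from the current hidden state.
long_term, weights = attend([1.0, 0.0], list(memory))
```

Because the stored outputs are read directly, no single vector has to carry all of the long-term information; the weights decide which of the preceding outputs contribute most.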

Link to online poster meeting: https://aalto.zoom.us/j/69923591865

Making sense of how to create ethical AI for young children with autism – Sumita Sharma (University of Oulu), Marianne Kinnula (University of Oulu), Netta Iivari (University of Oulu), Aale Luusua (University of Oulu) [click for abstract]

As Artificial Intelligence (AI) enters our everyday lives and experiences, there are opportunities for it to truly empower and enrich the lives of many different groups of people, including children with autism and their families. However, several questions remain unanswered regarding the design, development, and deployment of AI systems to not only mimic human decision making, but also be ethical in their approach, trustworthy in their output, and in having support mechanisms for human agency and oversight. We developed an interview framework to explore these issues in the context of developing an AI system for children with autism, and piloted the framework with eight interviewees who participate in developing an AI system for the screening and therapies for children with autism in India. We then interviewed eleven parents of young children with autism who use the system. The study was conducted remotely during the pandemic in July-Aug 2020 and Jan-Feb 2021. Our work paves the way for researchers to explore, question, and probe into the nuances of designing ethical AI for Autism. The understanding derived from this study can be applied to other contexts as well, where empowerment, human agency and oversight in AI systems are in focus.

Link to online poster meeting: https://oulu.zoom.us/j/8030505872

Human-Centered AI Design: A Study of Company Practices – Maria Hartikainen (Tampere University), Saara Ala-Luopa (Tampere University), Anu Lehtiö (Tampere University), Thomas Olsson (Tampere University), Kaisa Väänänen (Tampere University) [click for abstract]

Human-centered AI (HCAI) advocates the development of AI applications that are trustworthy, user-friendly, and socially sustainable. While the conceptual foundations and principles of HCAI are extensively discussed in recent literature, as a practical methodology it appears to lag behind in software companies. To advance HCAI method development, a key starting point is to understand the current practices of companies developing AI applications. We conducted an interview study of practitioners from twelve AI development companies in Finland. We aimed to understand how AI applications are currently being designed, as well as how the principles of HCAI manifest in the methodological approaches in the early phases of AI application development. Our thematic analysis paints a harsh picture of present-day AI application design: it appears that established human-centered methods are largely disregarded due to the focus on the quality of training data and on demonstrating AI’s capabilities. The companies in this sample tend to use flexible adaptations of methods and practices familiar from non-AI application design. End-users are rarely involved, and the decisions on end-user needs are made by AI engineers and the client. Consequently, we call for more concrete HCAI methods for putting the HCAI principles into practice at different phases of the AI application development process.

Link to online poster meeting: https://tuni.zoom.us/j/63487494105

Comparing seven methods for state-of-health time series prediction for the lithium-ion battery packs of forklifts – Matti Huotari (Department of Computer Science, Aalto University, Espoo, Finland), Shashank Arora (Department of Mechanical Engineering, Aalto University, Espoo, Finland), Avleen Malhi (Department of Computing and Informatics, Bournemouth University, Bournemouth, UK), Kary Främling (Department of Computer Science, Umeå University, Umeå, Sweden) [click for abstract]

A key aspect of operating forklifts is state-of-health (SoH) assessment, which ensures the safety and reliability of an uninterrupted power source. Forecasting the battery SoH well is imperative to enable preventive maintenance and hence to reduce costs. This paper demonstrates the capabilities of gradient boosting regression for predicting the SoH time series when little prior information is available about the batteries. We compared the gradient boosting method with light gradient boosting, extra trees, extreme gradient boosting, random forests, long short-term memory networks, and combined convolutional neural network and long short-term memory network methods. We used multiple predictors, with lagged target signal decomposition results as additional predictors, and compared the prediction results obtained with different sets of predictors for each method. For this work, we have a unique data set of 45 lithium-ion battery packs with large variation in the data. The best model that we derived was validated by a novel walk-forward algorithm that also calculates point-wise confidence intervals for the predictions; we obtained reasonable predictions and confidence intervals. Furthermore, we verified this model against five other lithium-ion battery packs; the best model generalised to a great extent to this set of battery packs. The results for the final model suggest that we were able to improve on previously developed models. Moreover, we further validated the model for extracting cycle counts presented in our previous work with data from new forklifts; their battery packs completed around 3000 cycles in a 10-year service period, which corresponds to the cycle life of commercial Nickel–Cobalt–Manganese (NMC) cells.
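
Walk-forward validation in general can be sketched as follows. This is a generic expanding-window scheme, not the novel algorithm described in the abstract: it does not compute confidence intervals, the persistence forecaster is a stand-in for the gradient boosting model, and the SoH values and all function names are made up for illustration.

```python
def walk_forward_splits(n_samples, initial_train, horizon):
    """Yield (train_indices, test_indices) with an expanding training window."""
    start = initial_train
    while start + horizon <= n_samples:
        yield range(0, start), range(start, start + horizon)
        start += horizon


def persistence_forecast(train_values, horizon):
    """Placeholder model (an assumption): repeat the last observed value."""
    return [train_values[-1]] * horizon


def walk_forward_mae(series, initial_train, horizon):
    """Mean absolute error over all walk-forward folds."""
    errors = []
    for train_idx, test_idx in walk_forward_splits(len(series), initial_train, horizon):
        train = [series[i] for i in train_idx]
        preds = persistence_forecast(train, horizon)
        errors.extend(abs(series[i] - p) for i, p in zip(test_idx, preds))
    return sum(errors) / len(errors)


# Made-up SoH percentages, for illustration only.
soh = [100.0, 99.5, 99.0, 98.5, 98.0, 97.5, 97.0, 96.5]
splits = [(list(tr), list(te)) for tr, te in walk_forward_splits(len(soh), 4, 2)]
mae = walk_forward_mae(soh, initial_train=4, horizon=2)
```

Each fold trains only on observations that precede its test window, so the evaluation never uses future values to predict the past.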

Link to online poster meeting: https://aalto.zoom.us/j/63552283435

The ELLIS Value Proposition for Your Career or Business – Katarina Sladakovic (ELLIS Unit Helsinki)

ELLIS – European Lab for Learning and Intelligent Systems – brings together the very best machine learning research in Europe, focusing on fundamental science, technical innovation, and societal impact. The ELLIS mission is to create a diverse, European network that promotes research excellence and advances breakthroughs in AI, while working closely with industry.

Did you know Finland is one of 14 countries in Europe to host an ELLIS Unit?

Founded in 2020 by Aalto University and the University of Helsinki, and hosted by the Finnish Center for Artificial Intelligence (FCAI), ELLIS Unit Helsinki works to create a new type of AI, able to operate with humans in the complex world - and to renew industry.

ELLIS and ELLIS Unit Helsinki offer outstanding opportunities for senior researchers, PhD students, postdocs, and companies alike:

→ For senior researchers: Opportunities to carry out excellent research in Europe, collaborate with the top ML and AI peers in Europe, participate in the Mobility program, receive a Mobility grant, and benefit from the resources ELLIS offers to its members

→ For PhD students: Kick off your scientific career in ML and AI through the ELLIS PhD and postdoc program: have two top-level senior researchers from different countries as supervisors, choose between an academic or industry track, participate in an exchange program, receive a mobility grant, and get access to world-class training and networking opportunities!

→ For companies/industry: Average research gets you average results. Excellent research gives you competitive advantage on a European and global scale. Explore funding opportunities for SMEs and start-ups and partnership opportunities to unlock vast potential for your business!

Join our poster session to learn more!

Link to online poster meeting: https://aalto.zoom.us/j/69551288772
