Scientific Presentations

AI Day 2022

Scientific talks – Track 1 (Kaleva hall)
Session 1 (9:30–10:30)
Session 2 (11:00–12:00)
Session 3 (13:00–14:30)

Scientific talks – Track 2 (Lumituuli hall)
Session 4 (9:30–10:30)
Session 5 (11:00–11:30)

Scientific posters (Capitolium and Sief halls)
Poster session (hosted 14:30–15:30)

Scientific talks

Session 1 (Kaleva hall, 9:30-10:30)

Self-supervised 2D face presentation attack detection via temporal sequence sampling - Usman Muhammad, University of Oulu; Zitong Yu, University of Oulu; Jukka Komulainen, University of Oulu

Conventional 2D face biometric systems are vulnerable to presentation attacks performed with different face artefacts, e.g., printouts, video-replays and wearable 3D masks. The research focus in face presentation attack detection (PAD) has been recently shifting towards end-to-end learning of deep representations directly from annotated data rather than designing hand-crafted (low-level) features. However, even the state-of-the-art deep learning based face PAD models have shown unsatisfying generalization performance when facing unknown attacks or acquisition conditions due to lack of representative training and tuning data available in the existing public benchmarks. To alleviate this issue, we propose a video pre-processing technique called Temporal Sequence Sampling (TSS) for 2D face PAD by removing the estimated inter-frame 2D affine motion in the view and encoding the appearance and dynamics of the resulting smoothed video sequence into a single RGB image. Furthermore, we leverage the features of a Convolutional Neural Network (CNN) by introducing a self-supervised representation learning scheme, where the labels are automatically generated by the TSS method as the stabilized frames accumulated over video clips of different temporal lengths provide the supervision. The learnt feature representations are then fine-tuned for the downstream task using labelled face PAD data. Our extensive experiments on four public benchmarks, namely Replay-Attack, MSU-MFSD, CASIA-FASD and OULU-NPU, demonstrate that the proposed framework provides promising generalization capability and encourage further study in this domain.

Brain-Supervised Image Editing - Davis, Keith, University of Helsinki; de la Torre-Ortiz, Carlos, University of Helsinki; Ruotsalo, Tuukka, University of Helsinki and University of Copenhagen

Despite recent advances in deep neural models for semantic image editing, present approaches are dependent on explicit human input. Previous work assumes the availability of manually curated datasets for supervised learning, while for unsupervised approaches the human inspection of discovered components is required to identify those which modify worthwhile semantic features. Here, we present a novel alternative: the utilization of brain responses as a supervision signal for learning semantic feature representations. Participants (N=30) in a neurophysiological experiment were shown artificially generated faces and instructed to look for a particular semantic feature, such as “old” or “smiling”, while their brain responses were recorded via electroencephalography (EEG). Using supervision signals inferred from these responses, semantic features within the latent space of a generative adversarial network (GAN) were learned and then used to edit semantic features of new images. We show that implicit brain supervision achieves comparable semantic image editing performance to explicit manual labeling. This work demonstrates the feasibility of utilizing implicit human reactions recorded via brain-computer interfaces for semantic image editing and interpretation.

A Closer Look at Parameter Contributions When Training Neural Language and Translation Models - Vázquez, Raúl (University of Helsinki); Celikkanat, Hande (University of Helsinki); Ravishankar, Vinit (University of Oslo); Creutz, Mathias (University of Helsinki); Tiedemann, Jörg (University of Helsinki)

We analyze the learning dynamics of neural language and translation models using Loss Change Allocation (LCA), an indicator that enables a fine-grained analysis of parameter updates when optimizing for the loss function. In other words, we can observe the contributions of different network components at training time. In this article, we systematically study masked language modeling, causal language modeling, and machine translation. We show that the choice of training objective leads to distinctive optimization procedures, even when performed on comparable Transformer architectures. We demonstrate how the various Transformer parameters are used during training, supporting that the feed-forward components of each layer are the main contributors to the optimization procedure. Finally, we find that the learning dynamics are not affected by data size and distribution but rather determined by the learning objective.
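As a rough illustration of what a per-parameter loss allocation looks like, the sketch below uses the simplest first-order approximation: the loss change of one update step is allocated to parameter i as (gradient of parameter i) times (change in parameter i). The toy quadratic loss, the step size and all names are illustrative assumptions, not the models or the exact estimator used in the talk.

```python
import numpy as np

# Toy quadratic loss L(theta) = 0.5 * theta^T A theta (illustrative assumption).
A = np.diag([4.0, 1.0, 0.25])

def loss(theta):
    return 0.5 * theta @ A @ theta

def grad(theta):
    return A @ theta

def lca_step(theta_old, theta_new):
    """First-order allocation of one step's loss change to each parameter."""
    return grad(theta_old) * (theta_new - theta_old)

theta = np.array([1.0, 1.0, 1.0])
lr = 0.01  # assumed learning rate
start_loss = loss(theta)
total_lca = np.zeros_like(theta)

for _ in range(100):
    new_theta = theta - lr * grad(theta)       # plain gradient descent
    total_lca += lca_step(theta, new_theta)    # accumulate per-parameter credit
    theta = new_theta

actual_change = loss(theta) - start_loss
print("per-parameter allocation:", total_lca)
print("sum of allocations:", total_lca.sum(), "actual loss change:", actual_change)
```

Negative entries mark parameters whose updates reduced the loss. The allocations sum to approximately the actual loss change, with a gap that comes from the first-order approximation.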

Artificial Intelligence for Acoustic Levitation - Iablonskyi, Denys (University of Helsinki); Kinnunen, Anniina (University of Helsinki); Salmi, Ari (University of Helsinki); Klami, Arto (University of Helsinki)

Sound waves create acoustic pressure that can exert forces on objects and suspend them against gravity in mid-air, enabling acoustic levitation. Contactless manipulation (unrestricted movement in 3D) of samples in mid-air opens new applications in various fields of academic research and industry but remains technically challenging. Practical levitators typically use tens or hundreds of small (~1 cm) ultrasonic transducers arranged densely into different geometries, and the acoustic pressure is controlled by the amplitude and phase of each element. While the settings needed for static levitation of mm-sized objects are easy to find using direct physical models, there are currently no practical solutions for, e.g., picking up an object from a surface or stable transportation of objects in the levitation field. We use artificial intelligence to control the acoustic levitation process and to solve challenging control tasks beyond the reach of direct physical modeling. We have developed a simulated levitator environment that models the dynamics of the levitating object and allows training of control policies for desired tasks, either in the form of direct planning with known forward dynamics or by solving a reinforcement learning task. This allows, e.g., 3D manipulation that only requires the user to specify an initial and target location and a suitable reward function. In addition to acoustic levitation, the pressure field can be optimized to generate 3D acoustic holograms or tactile displays. As with the control problem, the transducer settings required for this cannot be determined from physical principles alone but instead require learning. We also present an efficient hologram optimizer that solves for the transducers' activation function to generate complex pressure fields, which were verified experimentally on the physical device.

Session 2 (Kaleva hall, 11:00-12:00)

Multi-scale 3D Shift Graph Convolution Network for Emotion Recognition from Human Actions - SHI, Henglin (University of Oulu, Center for Machine Vision and Signal Analysis); PENG, Wei (Stanford University, Computational Neuroimage Science Laboratory); CHEN, Haoyu (University of Oulu, Center for Machine Vision and Signal Analysis); LIU, Xin (Lappeenranta-Lahti University of Technology LUT); ZHAO, Guoying (University of Oulu, Center for Machine Vision and Signal Analysis)

Recognizing emotions from body gestures can be more challenging than action and gesture recognition because emotions expressed by body gestures usually do not have fixed spatial configurations of joints. For example, a normal ‘walking’ action is always associated with hand and foot movements. On the contrary, expressing a type of emotion is not limited to moving only a fixed set of joints at specific spatial locations; it can involve interaction among arbitrary joints through space and time. Thus, recognizing emotions from gestures requires modelling spatial-temporal patterns at a more global level. However, the most recent powerful graph convolution networks (GCNs) separate spatial and temporal modelling into isolated processes, where the GCN models spatial interactions using partially fixed adjacency matrices and a 1-D convolution captures temporal dynamics, which is insufficient for emotion recognition. In this work, we propose the 3D-Shift GCN, which enables interactions of joints within a spatial-temporal volume for global feature extraction. In addition, we propose a multi-scale architecture that fuses features captured under different temporal ranges to model richer dynamics. The proposed 3D-Shift GCN and multi-scale architecture are evaluated on two regular action recognition benchmarks and two gesture-based emotion recognition datasets. Experimental results show that the proposed 3D-Shift GCN outperforms several state-of-the-art methods, and the multi-scale architecture can further improve the performance of 3D-Shift GCN.

Node co-activations as a means of error detection—Towards fault-tolerant neural networks - Myllyaho, Lalli (University of Helsinki); Nurminen, Jukka K. (University of Helsinki); Mikkonen, Tommi (University of Jyväskylä)

Context: Machine learning has proved an efficient tool, but the systems need tools to mitigate risks during runtime. One approach is fault tolerance: detecting and handling errors before they cause harm.

Objective: This paper investigates whether rare co-activations – pairs of usually segregated nodes activating together – are indicative of problems in neural networks (NN). These could be used to detect concept drift and flag untrustworthy predictions.

Method: We trained four NNs. For each, we studied how often each pair of nodes activates together. In a separate test set, we counted how many rare co-activations occurred with each input, and grouped the inputs based on whether their classification was correct, incorrect, or whether their class was absent during training.

Results: Rare co-activations are much more common in inputs from a class that was absent during training. Incorrectly classified inputs averaged a larger number of rare co-activations than correctly classified inputs, but the difference was smaller.

Conclusions: As rare co-activations are more common in unprecedented inputs, they show potential for detecting concept drift. There is also some potential in detecting single inputs from untrained classes. The small difference between correctly and incorrectly predicted inputs is less promising and needs further research.
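The counting procedure described in the Method paragraph can be sketched roughly as follows. The binary activation matrix, the synthetic data, the 1% rarity threshold and all function names are assumptions made for illustration; the abstract does not specify how activations were binarised or what counted as "rare".

```python
import numpy as np

def coactivation_rates(acts):
    """Fraction of inputs on which each pair of nodes is active together.

    acts: boolean array of shape (n_inputs, n_nodes).
    """
    a = acts.astype(float)
    return (a.T @ a) / len(a)

def count_rare_coactivations(act, rare_pairs):
    """Number of rare node pairs that are both active for a single input."""
    both_active = act[:, None] & act[None, :]
    # The pair matrix is symmetric, so count each unordered pair once.
    return int(np.triu(both_active & rare_pairs, k=1).sum())

rng = np.random.default_rng(0)
# Hypothetical training activations: nodes 0-2 and nodes 3-5 never fire together.
group = rng.integers(0, 2, size=500)
train = np.zeros((500, 6), dtype=bool)
train[group == 0, :3] = rng.random(((group == 0).sum(), 3)) < 0.8
train[group == 1, 3:] = rng.random(((group == 1).sum(), 3)) < 0.8

RARE_THRESHOLD = 0.01  # assumed cut-off for a "rare" co-activation
rare_pairs = coactivation_rates(train) < RARE_THRESHOLD

typical_input = np.array([1, 1, 0, 0, 0, 0], dtype=bool)
unusual_input = np.array([1, 0, 0, 1, 1, 0], dtype=bool)
typical_count = count_rare_coactivations(typical_input, rare_pairs)
unusual_count = count_rare_coactivations(unusual_input, rare_pairs)
print(typical_count, unusual_count)
```

An input that activates nodes from both normally segregated groups accumulates rare co-activations, which is the signal the study proposes to monitor.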

A Novel Method for Function Smoothness in Neural Networks - Lindqvist, Blerta (Aalto University)

Existing methods for function smoothness in neural networks have limitations. These methods can make training sensitive to their hyperparameters, or their smoothness constraints can limit model capacity. These methods can impose too much smoothness, even in areas without data, or they can impose non-meaningful smoothness constraints. The way these methods measure smoothness can also be computationally hard. One of the main methods for function smoothness, Lipschitz continuity, does not even imply differentiability theoretically, let alone continuous differentiability, that is smoothness. In this paper, we propose a method based on the theoretical definition of the derivative to ensure that the derivative of the parametrized function should tend toward its theoretical value for the given neural network parameters in the vicinity of training samples. The method changes the classifier and its training minimally and has no added hyperparameters. The proposed method is shown to achieve a smoother function in the vicinity of both training and testing samples for all tested datasets, as measured with decreased values of the Frobenius norm of the Jacobian with respect to inputs. Due to the correlation between function smoothness and generalization, the method makes classifiers generalize better and achieve higher accuracy than default classifiers for Restricted ImageNet, CIFAR10 and MNIST. Due to the correlation between function smoothness and adversarial robustness, the proposed method makes classifiers with high-capacity architecture more robust to adversarial samples generated with the PGD attack compared to default classifiers for the Restricted ImageNet, CIFAR10, Fashion-MNIST and MNIST datasets.

Temporal Disentanglement of Representations for Improved Generalisation in Reinforcement Learning - Dunion, Mhairi (University of Edinburgh); McInroe, Trevor (University of Edinburgh); Luck, Kevin Sebastian (Aalto University); Hanna, Josiah (University of Wisconsin-Madison); Albrecht, Stefano V. (University of Edinburgh)

Reinforcement Learning (RL) agents are often unable to generalise well to environment variations in the state space that were not observed during training. This issue is especially problematic for image-based RL, where a change in just one variable, such as the background colour, can change many pixels in the image, which can lead to drastic changes in the agent's latent representation of the image, causing the learned policy to fail. To learn more robust representations, we introduce TEmporal Disentanglement (TED), a self-supervised auxiliary task that leads to disentangled image representations exploiting the sequential nature of RL observations. We find empirically that RL algorithms utilising TED as an auxiliary task adapt more quickly to changes in environment variables with continued training compared to state-of-the-art representation learning methods. Since TED enforces a disentangled structure of the representation, we also find that policies trained with TED generalise better to unseen values of variables irrelevant to the task (e.g. background colour) as well as unseen values of variables that affect the optimal policy (e.g. goal positions).

Session 3 (Kaleva hall, 13:00-14:30)

Generative Modelling with Inverse Heat Dissipation - Rissanen, Severi (Aalto University); Heinonen, Markus (Aalto University); Solin, Arno (Aalto University)

While diffusion models have shown great success in image generation, their noise-inverting generative process does not explicitly consider the structure of images, such as their inherent multi-scale nature. Inspired by diffusion models and the empirical success of coarse-to-fine modelling, we propose a new model that generates images through iteratively inverting the heat equation, a PDE that locally erases fine-scale information when run over the 2D plane of the image. We interpret a noise-relaxed solution of the forward heat equation as a variational approximation in a diffusion-like latent variable model. Our new model shows emergent qualitative properties not seen in standard diffusion models, such as disentanglement of overall colour and shape in images and data efficiency. Spectral analysis on natural images highlights connections to diffusion models and reveals implicit inductive biases in them.
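To illustrate the forward process that the generative model learns to invert, the sketch below runs the 2D heat equation over a small synthetic image. The explicit finite-difference scheme, the zero-flux boundary handling, the step size and the test image are all assumptions chosen for simplicity. It shows only the information-erasing forward direction, not the learned inverse.

```python
import numpy as np

def heat_step(u, dt=0.2):
    """One explicit finite-difference step of the 2D heat equation.

    Edge padding gives zero-flux (Neumann) boundaries; dt <= 0.25 keeps the
    explicit scheme stable on a unit grid.
    """
    p = np.pad(u, 1, mode="edge")
    laplacian = p[:-2, 1:-1] + p[2:, 1:-1] + p[1:-1, :-2] + p[1:-1, 2:] - 4.0 * u
    return u + dt * laplacian

def dissipate(u, n_steps):
    """Run the forward heat equation for n_steps explicit steps."""
    for _ in range(n_steps):
        u = heat_step(u)
    return u

rng = np.random.default_rng(0)
# Hypothetical "image": a coarse left/right brightness split plus fine-scale noise.
image = np.zeros((32, 32))
image[:, 16:] = 1.0
image += 0.3 * rng.standard_normal(image.shape)

blurred = dissipate(image, n_steps=50)

fine_before = np.abs(np.diff(image, axis=0)).mean()
fine_after = np.abs(np.diff(blurred, axis=0)).mean()
print("fine-scale variation before/after:", fine_before, fine_after)
print("mean brightness before/after:", image.mean(), blurred.mean())
```

Fine-scale variation is erased first, while the coarse left/right structure and the overall brightness survive much longer. This ordering is what makes a coarse-to-fine generative reversal plausible.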

A Hardware Perspective to Evaluating Probabilistic Circuits - Jelin Leslin (Aalto University); Antti Hyttinen (University of Helsinki); Karthekeyan Periasamy (Aalto University); Lingyun Yao (Aalto University); Martin Trapp (Aalto University); Martin Andraud (Aalto University)

The ever-increasing development of AI-enhanced Internet-of-Things devices has recently pushed the need for on-device computation of AI models. As these tasks require making robust predictions under uncertainty, probabilistic (graphical) models have recently gained interest also for these applications. However, embedded computation requires high computational efficiency (i.e., high speed and low power) through hardware acceleration. Although the acceleration of deep learning models has shown extensive benefits, this has not translated to probabilistic models as of yet. Probabilistic circuits (PCs), a family of tractable probabilistic models, allow a direct hardware view as they are represented in the form of a computational graph. Over the years, various approaches for structure learning of PCs have been proposed, however, without consideration of their potential hardware cost. In this work, we propose to take a hardware perspective in the evaluation of PC structures. We compare several structure learning strategies, associating each PC with hardware costs (computation power, speed, efficiency), and evaluate which one leads to more hardware-friendly implementations. Our results show that models imposing additional structural constraints on the PC are competitive models in terms of performance while being generally more hardware-efficient, making them suitable candidates for energy-constrained applications.

Clustering with Fair-Center Representation: Parameterized Approximation Algorithms and Heuristics - Thejaswi, Suhas (Aalto University); Gadekar, Ameet (Aalto University); Ordozgoiti, Bruno (Queen Mary University of London); Osadnik, Michał (Aalto University)

We study a variant of classical clustering formulations in the context of algorithmic fairness, known as diversity-aware clustering. In this variant, we are given a collection of facility subsets and a solution must contain at least a specified number of facilities from each subset while simultaneously minimizing the clustering objective (k-median or k-means). We investigate the fixed-parameter tractability of these problems and show several negative hardness and inapproximability results, even when we afford exponential running time with respect to some parameters.

Motivated by these results, we identify natural parameters of the problem, and present fixed-parameter approximation algorithms with approximation ratios (1 + 2/e + ε) and (1 + 8/e + ε) for diversity-aware k-median and diversity-aware k-means respectively, and argue that these ratios are essentially tight assuming the gap-exponential time hypothesis. We also present a simple and more practical bicriteria approximation algorithm with better running time bounds. We finally propose efficient and practical heuristics. We evaluate the scalability and effectiveness of our methods in a wide variety of rigorously conducted experiments, on both real and synthetic data.
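The problem definition can be made concrete with a small sketch: a solution is feasible if it contains at least the required number of facilities from each subset, and among feasible solutions the k-median cost is minimised. The brute-force search, the toy coordinates, the facility subsets and the function names are illustrative assumptions only. The search takes exponential time and is not one of the algorithms presented in the talk.

```python
from itertools import combinations

import numpy as np

def is_feasible(centers, groups, requirements):
    """Check that the chosen centers include enough facilities from each subset."""
    chosen = set(centers)
    return all(len(chosen & set(g)) >= r for g, r in zip(groups, requirements))

def k_median_cost(points, facilities, centers):
    """Sum of distances from each point to its nearest chosen facility."""
    chosen = facilities[list(centers)]
    dists = np.linalg.norm(points[:, None, :] - chosen[None, :, :], axis=2)
    return float(dists.min(axis=1).sum())

points = np.array([[0.0, 0.0], [1.0, 0.0], [10.0, 0.0], [11.0, 0.0]])
facilities = np.array([[0.5, 0.0], [10.5, 0.0], [5.0, 0.0], [20.0, 0.0]])
groups = [{0, 1}, {2, 3}]   # hypothetical facility subsets
requirements = [1, 1]       # at least one center from each subset
k = 2

def best_solution(require_diversity):
    """Exhaustively pick the cheapest set of k centers (illustration only)."""
    candidates = [
        c for c in combinations(range(len(facilities)), k)
        if not require_diversity or is_feasible(c, groups, requirements)
    ]
    return min(candidates, key=lambda c: k_median_cost(points, facilities, c))

unconstrained = best_solution(require_diversity=False)
diverse = best_solution(require_diversity=True)
print(unconstrained, k_median_cost(points, facilities, unconstrained))
print(diverse, k_median_cost(points, facilities, diverse))
```

In this toy instance the cheapest unconstrained solution draws both centers from the first subset, so enforcing the representation constraint changes the chosen centers and raises the clustering cost.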

Robust Multi-fidelity Bayesian Optimization - Mikkola, Petrus (Aalto University); Martinelli, Julien (Aalto University); Filstroff, Louis (Aalto University, ENSAI, CREST); Kaski, Samuel (Aalto University, University of Manchester)

Bayesian optimization (BO) is a powerful framework for optimizing black-box, expensive-to-evaluate functions. Over the past decade, many algorithms have been proposed to integrate cheaper, lower-fidelity approximations of the objective function into the optimization process, with the goal of converging towards the global optimum at a reduced cost. This task is generally referred to as multi-fidelity Bayesian optimization (MFBO). However, MFBO algorithms can lead to higher optimization costs than their vanilla BO counterparts, especially when the low-fidelity sources are poor approximations of the objective function, therefore defeating their purpose. To address this issue, we propose rMFBO (robust MFBO), a methodology to make any GP-based MFBO scheme robust to the addition of unreliable information sources. rMFBO comes with a theoretical guarantee that its performance can be bound to its vanilla BO analog, with high controllable probability. We demonstrate the effectiveness of the proposed methodology on a number of numerical benchmarks, outperforming earlier MFBO methods on the most unreliable sources. We expect rMFBO to be particularly useful for including the varying expertise of human experts in Bayesian optimization processes in a reliable manner.

Generalized vec trick for fast learning of pairwise kernel models - Viljanen, Markus (University of Turku); Airola, Antti (University of Turku); Pahikkala, Tapio (University of Turku)

Pairwise learning corresponds to the supervised learning setting where the goal is to make predictions for pairs of objects. Prominent applications include predicting drug-target or protein-protein interactions, or customer-product preferences. In this work, we present a comprehensive review of pairwise kernels that have been proposed for incorporating prior knowledge about the relationship between the objects. Specifically, we consider the standard, symmetric and anti-symmetric Kronecker product kernels, metric-learning, Cartesian, ranking, as well as linear, polynomial and Gaussian kernels. Recently, an O(nm + nq) time generalized vec trick algorithm, where n, m, and q denote the number of pairs, drugs and targets, respectively, was introduced for training kernel methods with the Kronecker product kernel. This was a significant improvement over previous O(n^2) training methods, since in most real-world applications m, q << n. In this work we show how all the reviewed kernels can be expressed as sums of Kronecker products, allowing the use of the generalized vec trick for speeding up their computation. In the experiments, we demonstrate how the introduced approach allows scaling pairwise kernels to much larger data sets than previously feasible, and provide an extensive comparison of the kernels on a number of biological interaction prediction tasks.
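The idea behind the speed-up can be sketched as follows: the n x n pairwise kernel matrix with entries D[d_i, d_j] * T[t_i, t_j] is never formed, and its product with a vector is instead computed in two passes costing O(nm) and O(nq). This is a minimal reconstruction for the plain Kronecker product kernel under assumed names (D and T for the drug and target kernel matrices, d_idx and t_idx for the pair indices). It is not the authors' implementation.

```python
import numpy as np

def pairwise_kernel_matvec(D, T, d_idx, t_idx, v):
    """Compute u_i = sum_j D[d_i, d_j] * T[t_i, t_j] * v_j without the n x n matrix."""
    m, q = D.shape[0], T.shape[0]
    # Pass 1, O(n*m): W[t, d] = sum over pairs j with t_j == t of v_j * D[d, d_j].
    W = np.zeros((q, m))
    np.add.at(W, t_idx, (D[:, d_idx] * v).T)
    # Pass 2, O(n*q): u_i = sum_t W[t, d_i] * T[t_i, t].
    return np.sum(W[:, d_idx].T * T[t_idx, :], axis=1)

rng = np.random.default_rng(0)
m, q, n = 5, 4, 12                      # toy sizes; in practice m, q << n
D = rng.standard_normal((m, m))         # stand-in for a drug kernel matrix
T = rng.standard_normal((q, q))         # stand-in for a target kernel matrix
d_idx = rng.integers(0, m, size=n)      # drug index of each pair
t_idx = rng.integers(0, q, size=n)      # target index of each pair
v = rng.standard_normal(n)

# Naive reference: build the full n x n pairwise kernel matrix explicitly.
K = D[np.ix_(d_idx, d_idx)] * T[np.ix_(t_idx, t_idx)]
fast = pairwise_kernel_matvec(D, T, d_idx, t_idx, v)
slow = K @ v
print(np.allclose(fast, slow))
```

Iterative kernel solvers need only such matrix-vector products, so replacing the O(n^2) product with this routine is enough to train on many more pairs.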

Compositional Generalization in Grounded Language Learning via Induced Model Sparsity - Spilsbury, Sam (Aalto University); Ilin, Alexander (Aalto University)

We provide a study of how induced model sparsity can help achieve compositional generalization and better sample efficiency in grounded language learning problems. We consider simple language-conditioned navigation problems in a grid world environment with disentangled observations. We show that standard neural architectures do not always yield compositional generalization. To address this, we design an agent that contains a goal identification module that encourages sparse correlations between words in the instruction and attributes of objects, composing them together to find the goal. The output of the goal identification module is the input to a value iteration network planner. Our agent maintains a high level of performance on goals containing novel combinations of properties even when learning from a handful of demonstrations. We examine the internal representations of our agent and find the correct correspondences between words in its dictionary and attributes in the environment.

Session 4 (Lumituuli hall, 9:30-10:30)

Self-Calibrating Anomaly and Change Detection for Autonomous Inspection Robots - Sahar, Salimpour Kasebi (University of Turku); Jorge, Peña Queralta (University of Turku); Tomi, Westerlund (University of Turku)

Automatic detection of visual anomalies and changes in the environment has been a topic of recurrent attention in the fields of machine learning and computer vision over the past decades. A visual anomaly or change detection algorithm identifies regions of an image that differ from a reference image or dataset. The majority of existing approaches focus on anomaly or fault detection in a specific class of images or environments, while general-purpose visual anomaly detection algorithms are scarcer in the literature. In this paper, we propose a comprehensive deep learning framework for detecting anomalies and changes in a priori unknown environments after a reference dataset is gathered, without the need to retrain the model. We use the SuperPoint and SuperGlue feature extraction and matching methods to detect anomalies based on reference images taken from a similar location and with partially overlapping fields of view. We also introduce a self-calibrating method for the proposed model in order to address the problem of sensitivity to feature matching thresholds and environmental conditions. To evaluate the proposed framework, we have used a ground robot system for reference and query data collection. We show that high accuracy can be obtained using the proposed method. We also show that the calibration process enhances change and foreign-object detection performance.

Co-Imitation: Learning Design and Behaviour by Imitation - Rajani, Chang (Helsinki University); Arndt, Karol (Aalto University); Blanco-Mulero, David (Aalto University); Luck, Kevin Sebastian (Aalto University & Finnish Center for AI); Kyrki, Ville (Aalto University)

The co-adaptation of robots has been a long-standing research endeavour with the goal of adapting both body and behaviour of a system for a given task, inspired by the natural evolution of animals. Co-adaptation has the potential to eliminate costly manual hardware engineering as well as improve the performance of systems. The standard approach to co-adaptation is to use a reward function for optimizing behaviour and morphology. However, defining and constructing such reward functions is notoriously difficult and often a significant engineering effort. This paper introduces a new viewpoint on the co-adaptation problem, which we call co-imitation: finding a morphology and a policy that allow an imitator to closely match the behaviour of a demonstrator. To this end we propose a co-imitation methodology for adapting behaviour and morphology by matching state distributions of the demonstrator. Specifically, we focus on the challenging scenario with mismatched state- and action-spaces between both agents. We find that co-imitation increases behaviour similarity across a variety of tasks and settings, and demonstrate co-imitation by transferring human walking, jogging and kicking skills onto a simulated humanoid.

Enhancing old music recordings using deep learning - Moliner, Eloi (Aalto Acoustics Lab); Välimäki, Vesa (Aalto Acoustics Lab)

Enhancing the sound quality of historical music recordings is a long-standing problem, as they are widely affected by several kinds of degradation, such as background noise or a limited bandwidth. We present our two recent publications on automatic restoration of gramophone recordings using deep learning techniques. The proposed approach is designed as two separate steps. First, a convolutional neural network architecture is applied to jointly suppress any additive disturbances, such as clicks and background noise. Then, the bandwidth of the denoised recording is extended using a generative adversarial network. The models from both steps work with the complex spectrogram representation of audio, and are trained with synthetically generated data. Thanks to a carefully built dataset of realistic noises, and a dedicated regularization strategy, the trained models can effectively generalize and enhance the quality of real historical recordings. Both methods are evaluated with objective and subjective metrics, outperforming the compared baselines. This study shows the importance of realistic training data and the power of deep learning in audio restoration.

The Placebo Effect of Artificial Intelligence in Human-Computer Interaction - Kosch, Thomas (HU Berlin); Welsch, Robin (Aalto University); Chuang, Lewis (TU Chemnitz); Schmidt, Albrecht (LMU Munich) *shared first-authorship

In medicine, patients can obtain real benefits from a sham treatment. These benefits are known as the placebo effect. We report two experiments (Experiment I: N=369; Experiment II: N=100) demonstrating a placebo effect in adaptive interfaces. Participants were asked to solve word puzzles while being supported by no system or an adaptive AI interface. All participants experienced the same word puzzle difficulty and had no support from an AI throughout the experiments. Our results showed that the belief of receiving adaptive AI support increases expectations regarding the participant’s own task performance, sustained after interaction. These expectations were positively correlated to performance, as indicated by the number of solved word puzzles. We integrate our findings into technological acceptance theories and discuss implications for the future assessment of AI-based user interfaces and novel technologies. We argue that system descriptions can elicit placebo effects through user expectations biasing the results of user-centered studies.

Session 5 (Lumituuli hall, 11:00-11:30)

Safety by simulation - Viljanen, Mika

Mobility robots will soon be among us, triggering a need to regulate their safety. Robot safety regulation, however, remains underexplored, with only a few papers analyzing what regulatory approaches could be feasible. This article offers an account of the available regulatory strategies. It first discusses the distinctive features of mobility robots as regulatory targets and argues that robot system complexity is the key regulatory concern as it renders robots practically non-deterministic. Second, the article reviews rules-based and performance-based regulation and argues that both will fail to govern robot complexity. Controlling complex robots will require deploying a simulation-based regulatory approach. Simulation-based regulation is a novelty with significant theoretical and practical implications. The article argues that simulation-based regulation signifies a radical break in regulatory forms of knowledge and temporalities. It enacts futures to create a new regulatory knowledge type. On the practical level, the new safety knowledge type may destabilize the existing conceptual space of safety politics and liability allocation patterns.

ECCOLA — A method for implementing ethically aligned AI systems - Vakkuri, Ville (University of Vaasa); Kemell, Kai-Kristian (University of Helsinki)

The growing impact of Artificial Intelligence (AI) systems has highlighted potential issues that may arise from their utilization, such as data privacy issues, resulting in calls for ethical AI systems. Yet, how to develop ethical AI systems remains an important question in the area. How should the various AI ethics principles be converted into requirements for these systems, and what should developers and the organizations developing these systems do to implement them into practice? To further bridge this gap in the area, we present the newest, updated (2022) version of our method for implementing AI Ethics: ECCOLA. ECCOLA was originally published in Euromicro DSD/SEAA 2020 as a conference paper and then extended into a journal publication in the Journal of Systems & Software in 2021.

Scientific posters

Poster session (Capitolium and Sief halls)

Hosted at 14:30–15:30

Governance in Ethical and Trustworthy AI Systems - Agbese, Mamia (University of Jyväskylä); Jani, Antikainen (University of Jyväskylä); Halme, Erika (University of Jyväskylä); Hannakaisa, Isomäki (University of Jyväskylä); Marianna, Jantunen (University of Jyväskylä); Kai-Kristian, Kemell (University of Jyväskylä); Rebekah, Rousi (University of Jyväskylä); Heidi, Vainio-Pekka (University of Jyväskylä); Ville, Vakkuri (University of Jyväskylä)

The continuous development of artificial intelligence (AI) and its increasing rate of adoption by software startups call for governance measures to be implemented at the design and development stages to help mitigate AI governance concerns. Most AI ethical design and development tools rely on AI ethics principles as the primary governance and regulatory instrument for developing ethical AI that informs AI governance. However, AI ethics principles have been identified as insufficient for AI governance due to a lack of information robustness, creating the need for additional governance measures. Our study explores adaptive governance, which combines established governance practices with AI ethics principles in AI ethical frameworks for improved information and subsequent AI governance.

SLISEMAP: Supervised dimensionality reduction through local explanations - Björklund, Anton (University of Helsinki); Mäkelä, Jarmo (University of Helsinki); Puolamäki, Kai (University of Helsinki) [click for abstract]

Existing methods for explaining black box learning models often focus on building local explanations of model behaviour for a particular data item. It is possible to create global explanations for all data items, but these explanations generally have low fidelity for complex black box models. We propose a new supervised manifold visualisation method, SLISEMAP, that simultaneously finds local explanations for all data items and builds a (typically) two-dimensional global visualisation of the black box model such that data items with similar local explanations are projected nearby. We provide a mathematical derivation of our problem and an open-source implementation built on the GPU-optimised PyTorch library. We compare SLISEMAP to multiple popular dimensionality reduction methods and find that SLISEMAP is able to utilise labelled data to create embeddings with consistent local white box models. We also compare SLISEMAP to other model-agnostic local explanation methods and show that SLISEMAP provides comparable explanations and that the visualisations can give a broader understanding of black box regression and classification models.

Clinical microbiome data science - Borman, Tuomas (University of Turku, Department of Computing); Lahti, Leo (University of Turku, Department of Computing) [click for abstract]

Because of the complex and high-dimensional nature of microbiome profiling data, machine learning and other computational approaches have become an instrumental part of the researcher’s toolkit in this field. There is an increasing need to develop robust and reproducible methods to integrate and analyse taxonomic, functional, and clinical data across multiple sources, such as microbial abundances in the gut with biomolecular profiling data from blood samples. Such integrative multi-omic approaches can support the analysis of microbiome dysbiosis and facilitate the discovery of novel biomarkers for health and disease. The currently available solutions for multi-assay microbiome data have severe limitations in terms of scalability and the ability to incorporate different types of complementary data sources in a single reproducible workflow. An emerging analysis ecosystem called miaverse (Microbiome analysis universe) utilizes a common, standardized data container, which enables highly optimized integration of multi-assay microbiome profiling data from clinical studies. The miaverse is a collaborative open-source project. We have developed open data science methods for data analysis and visualization, with dedicated support for multi-assay data integration and analysis. Stable versions are available via the peer-reviewed Bioconductor network for open research software, together with comprehensive online documentation. The emerging framework fulfills the need for open and reproducible analysis of multi-assay data in microbiome studies. We anticipate that the framework has the potential to be widely adopted by microbiome researchers. This is further facilitated by the tight links to related application domains, such as single cell sequencing, where closely related open-source techniques are now being developed.

Why join the Cloud AI navy when you can be a TinyML pirate? - Doyu, Hiroshi (Ninjalabo); Morabito, Roberto (University of Helsinki) [click for abstract]

The pervasive use of Machine Learning (ML) algorithms, as well as of enhanced AI processing-enabled software and hardware components, has led us to think that the traditional cloud-based AI lifecycle is nearly always the right recipe for ensuring seamless and efficient AI processing. But is this really the case? The strong dependency of modern ML toolchains on cloud and specialized processing technologies (e.g., GPUs) naturally limits the reachability and usability of ML. Edge AI has tried to reduce the dependency on these technologies by enabling more viable, user-friendly, and out-of-the-box AI processing deployments. However, the dependency on connectivity and the lack of 'plug and play' interoperability among different AI-enabled edge devices negatively counterbalance the benefits generated. In this work, we walk through the practical implications of the limitations of both cloud- and edge-based AI, showcasing how TinyML can contribute to tackling such challenges. TinyML is an emerging paradigm that concerns running ML inference tasks on ultra-low-power microcontrollers. Although TinyML bears all the attributes for being considered one of the most promising ML technologies for ensuring reliable and energy-efficient AI capabilities at hyperscale, various challenges still limit the effective development of TinyML in the IoT world and the development of its ecosystem. Our TinyML-as-a-Service (TinyMLaaS) paradigm aims to fill the gap in this respect by enabling a sort of 'democratization' of TinyML, which is reflected in easier embedded AI systems development. The goal of TinyMLaaS is to make the execution of ML inference tasks straightforward on a wide class of devices, possibly characterized by constrained hardware and software resources. We present how we intend to bind the “as-a-service” model to TinyML and provide a technical overview of the requirements and building blocks characterizing this emerging paradigm.

On Feature Importance Versus Feature Influence and Why It Matters for Explainable AI - Främling, Kary (Umeå University; Aalto University) [click for abstract]

Papers on explainable AI (XAI) tend to use concepts such as importance, influence, significance, relevance, etc. interchangeably, as if they all meant the same thing. This paper suggests that there is a difference between these concepts and notably between importance and influence. Mathematical definitions are provided for importance and influence based on existing theory, which show that these two concepts have different meanings when used in explanations. Making this distinction is crucial for understanding the produced explanations correctly. We further study how this difference can be seen in the Contextual Importance and Utility (CIU) method and how it relates to the family of additive feature attribution methods. Empirical results from two simple use cases with a known model and the Titanic data set are used for illustrating the difference and for assessing how the methods perform according to different assessment criteria.

Developing an Automatic Speaking Assessment System for L2 Speech - Getman, Yaroslav (Aalto University); Al-Ghezi, Ragheb (Aalto University); Voskoboinik, Ekaterina (Aalto University); Akiki, Clara (Aalto University); von Zansen, Anna (University of Helsinki); Hildén, Raili (University of Helsinki); Huhta, Ari (University of Jyväskylä); Kallio, Heini (University of Jyväskylä); Kuronen, Mikko (University of Jyväskylä); Kurimo, Mikko (Aalto University) [click for abstract]

Our presentation describes the main steps we took to implement an automatic speaking assessment system for second language learners in the DigiTala project. So far, we have developed systems for Finnish and Finland Swedish. We have also implemented an online system using a Moodle plugin that will be shown at the conference. The poster describes the steps taken to develop the system:
- Designing suitable speaking tasks for data collection and speaking assessment for the target groups. Several test versions were designed to match various skill levels of the students.
- Collecting training data for automatic speech recognition (ASR) and Evaluators. A Moodle system was developed and utilized in the collection of the Finnish data from high school and Aalto University students.
- Designing the transcription guidelines and transcribing the speech data. The transcribers were also instructed about special tags for foreign and almost correct words, non-speech and disfluencies.
- Designing the rating scales and manually rating the student responses with the teachers’ Moodle system developed for the project. The recordings were distributed among the raters with a partial overlap for monitoring the inter-rater reliability and computing fair average scores.
- Training and testing the ASR and Evaluators. The ASR system was implemented using pre-trained self-supervised models. Each criterion involved a separate machine learning model trained with the fair average scores.
- Implementing an ASR-based automatic speaking assessment server back-end that is able to return the ASR transcript and the analytic and holistic scores for the test-takers in real time.
- Implementing a demonstration system front-end with a Moodle plugin interface for the ASR and Evaluator API. This serves as the front-end for the test-takers to use the back-end speaking assessment server.
- Collecting feedback from the test-takers and teachers. To collect the feedback, we use online questionnaires and face-to-face interviews.

Artificial Intelligence Lifecycle: Allocating Accountability and Compliance Responsibilities - Gonzalez Torres, Ana Paula (Aalto University) [click for abstract]

This study examines the allocation of compliance responsibilities in light of the opaque organisational structures involved in the lifecycle of an AI project and the currently unclear legal framework. Under the proposed ‘AI Act’, the distribution of requirements and obligations depends on the type of operator, which encompasses the provider, the user, the authorised representative, the importer, and the distributor (European Commission, 2021). Meanwhile, in practice, the lifecycle of an AI system involves different parties with different roles, which tend to vary depending on the organisation and the use case. Drawing on an amalgamation of different business practices related to the various stages of an AI project, we establish a general framework and determine the relevant roles at different project stages. The general framework is then used as the basis for allocating compliance responsibilities, requirements, and obligations of the ‘AI Act’ to the individual roles. The understanding is that a combination of centralised and distributed accountability models is the more feasible model of AI governance.

Maritime awareness using sound, visible and infrared imaging - Gorad, Ajinkya (Aalto University); Särkkä, Simo (Aalto University); Hammarberg, Toni (Finnish Geospatial Institute); Saajasto, Mika (Finnish Geospatial Institute); Ramm-Schmidt, Henrik (Fleetrange) [click for abstract]

Maritime awareness can be enabled using sensors and Artificial Intelligence (AI). Commonly used sensors include visible and infrared cameras and microphones. The AI component can typically use deep convolutional networks for image-based ship and ice detection, and spiking neural networks (SNNs) for delay-based sound processing. Ship detection is possible in visible and infrared images using recently developed networks such as YOLOv5, and information from both imaging methods can be fused to provide a more accurate estimate of the position. Ice detection is possible using available semantic segmentation methods, e.g., Detectron2. SNNs, which are the third generation of neural networks, can use their synaptic delay properties to capture delays between a pair of microphones in a microphone array and can provide the direction of the incoming sound. Awareness of the surroundings can help ships navigate at sea. We have already submitted an article on ship bearing estimation using visible and infrared cameras.

Improved Training of Physics-Informed Neural Networks with Model Ensembles - Haitsiukevich, Katsiaryna (Aalto University); Ilin, Alexander (Aalto University) [click for abstract]

Learning the solution of partial differential equations (PDEs) with a neural network is an attractive alternative to traditional solvers due to its elegance, greater flexibility and the ease of incorporating observed data. However, training such physics-informed neural networks (PINNs) is notoriously difficult in practice since PINNs often converge to wrong solutions. In this paper, we address this problem by training an ensemble of PINNs. Our approach is motivated by the observation that individual PINN models find similar solutions in the vicinity of points with targets (e.g., observed data or initial conditions) while their solutions may substantially differ farther away from such points. Therefore, we propose to use the ensemble agreement as the criterion for gradual expansion of the solution interval, that is including new points for computing the loss derived from differential equations. Due to the flexibility of the domain expansion, our algorithm can easily incorporate measurements in arbitrary locations. In contrast to the existing PINN algorithms with time-adaptive strategies, the proposed algorithm does not need a pre-defined schedule of interval expansion and it treats time and space equally. We experimentally show that the proposed algorithm can stabilize PINN training and yield performance competitive to the recent variants of PINNs trained with time adaptation.

Human-centered Chatbots - Hartikainen, Maria, Tampere Universities - Computer Science; Väänänen, Kaisa, Tampere Universities - Computer Science; Thankachan, Biju, Tampere Universities - Computer Science [click for abstract]

A chatbot is conversational software with a wide range of applications. Adding Artificial Intelligence (AI) to chatbots enhances their ability to offer natural and intuitive interaction. AI can provide more possibilities for the user to communicate with the chatbot in the way that is preferable to the user. In addition, AI chatbots can learn from use, and their behaviour can evolve to better respond to a variety of user needs. Hence, AI improves the interaction between human and chatbot and increases chatbots’ affordances. AI chatbots have sparked substantial interest due to recent technological advancements. For chatbots’ successful utilisation in different domains, it is essential that they are developed for humans in a way that meets their needs. Human-centred technology design puts humans at the centre of the development. However, there is little research-based information about what makes chatbots human-centred. We reviewed recent related literature in order to determine what human-centredness means in AI chatbots. Based on this data, our study reveals five themes describing what human-centred AI chatbots are: (i) AI chatbots are here to serve humans, (ii) development decisions are based on real requirements, (iii) explainability, transparency, and technical robustness are the keys to building trust, (iv) ethically aligned design ensures safe and fair use, and (v) AI chatbot design combines user interface design, dialogue design, and bot persona design. Derived from these themes, we present 10 principles describing the requirements of human-centred AI chatbots. The findings of our study can add to the understanding of the phenomenon and can help companies develop chatbots that successfully support their users in accomplishing their tasks.

Joint Non-parametric Point Process model for Treatments and Outcomes: Counterfactual Time-series Prediction Under Policy Interventions - Hızlı, Çağlar, Aalto University; John, ST, Aalto University; Juuti, Anne, Helsinki University Hospital and University of Helsinki; Saarinen, Tuure, Helsinki University Hospital and University of Helsinki; Pietiläinen, Kirsi, Helsinki University Hospital and University of Helsinki; Marttinen, Pekka, Aalto University [click for abstract]

Policy makers need to predict the progression of an outcome before adopting a new treatment policy, which defines when and how a sequence of treatments affecting the outcome occurs in continuous time. Commonly, algorithms that predict interventional future outcome trajectories take a fixed sequence of future treatments as input. This either neglects the dependence of future treatments on outcomes preceding them or implicitly assumes the treatment policy is known, and hence excludes scenarios where the policy is unknown or a counterfactual analysis is needed. To handle these limitations, we develop a joint model for treatments and outcomes, which allows for the estimation of treatment policies and effects from sequential treatment–outcome data. It can answer interventional and counterfactual queries about interventions on treatment policies, as we show with real-world data on blood glucose progression and a simulation study building on top of this.

Modelling student ability in language learning - Hue, Jue (University of Helsinki); Katinskaia, Anisia (University of Helsinki); Vu, Anh-Duc (University of Helsinki); Yangarber, Roman (University of Helsinki) [click for abstract]

Automatic assessment of proficiency levels of the learner is an essential part of Intelligent Tutoring Systems. Exhaustive testing across a wide range of skills is undesirable for a number of reasons. Thus, we aim for efficient but accurate adaptive testing. We present the approach we took in the context of language learning. We use learner data, collected under imperfect conditions, to train a model for an adaptive testing procedure based on Item Response Theory (IRT). We present simulations with artificially generated as well as real student answers, to confirm that this approach is accurate and efficient. We introduce randomness into the simulations for a more faithful reflection of real-life situations, and explore various termination criteria.
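The abstract does not say which IRT model or item-selection rule is used. As a rough illustration of how IRT-based adaptive testing can work, the sketch below assumes a two-parameter logistic (2PL) model and picks the unasked item with the highest Fisher information at the current ability estimate. The function names and the toy item parameters are hypothetical and are not taken from the authors' system.

```python
import math

def p_correct(theta, a, b):
    """2PL model (assumed): probability that a learner with ability theta
    answers an item with discrimination a and difficulty b correctly."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def pick_next_item(theta, items, asked):
    """Return the index of the most informative item not yet asked.
    items is a list of (a, b) pairs; asked is a set of indices."""
    candidates = [i for i in range(len(items)) if i not in asked]
    return max(candidates, key=lambda i: item_information(theta, *items[i]))

# Toy item bank: equal discrimination, difficulties -2, 0 and 2.
items = [(1.0, -2.0), (1.0, 0.0), (1.0, 2.0)]
next_item = pick_next_item(0.0, items, asked=set())
```

With equal discriminations, the most informative item is the one whose difficulty is closest to the current ability estimate, which is why adaptive testing can reach an accurate estimate with fewer questions than exhaustive testing.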

DPVIm: Differentially Private Variational Inference Improved - Jälkö, Joonas, University of Helsinki; Prediger, Lukas, Aalto University; Honkela, Antti, University of Helsinki; Kaski, Samuel, Aalto University, University of Manchester [click for abstract]

Differentially private (DP) release of multidimensional statistics typically considers an aggregate sensitivity, e.g. the vector norm of a high-dimensional vector. However, different dimensions of that vector might have widely different magnitudes and therefore DP perturbation disproportionately affects the signal across dimensions. We observe this problem in the gradient release of the DP-SGD algorithm when using it for variational inference (VI), where it manifests in poor convergence as well as high variance in outputs for certain variational parameters, and make the following contributions: (i) We mathematically isolate the cause for the difference in magnitude between gradient parts corresponding to different variational parameters. Using this as prior knowledge, we establish a link between the gradients of the variational parameters and propose a simple yet efficient fix for the problem to obtain a less noisy gradient estimator, which we call aligned gradients. We compare this to alternative approaches for scaling the gradients using analytically derived preconditioning, e.g. natural gradients. (ii) We suggest using iterate averaging over the DP parameter traces recovered during training to reduce the DP-induced noise in parameter estimates at no additional cost in privacy. Finally, (iii) to accurately capture the additional uncertainty DP introduces to the model parameters, we infer the DP-induced noise from the parameter traces and include that in the learned posteriors to make them noise-aware. We demonstrate the efficacy of our proposed improvements through various experiments on real data.
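To make the setting concrete, the following is a minimal sketch of a generic DP-SGD gradient release (per-example clipping of the whole gradient vector followed by Gaussian noise of the same scale in every dimension) and of iterate averaging over a parameter trace. It does not implement the aligned gradients or the noise-aware posteriors proposed in the abstract; all function names are hypothetical, and the noise scale convention (standard deviation equal to `noise_multiplier * clip_norm` on the summed gradient) is an assumption of this sketch.

```python
import math
import random

def clip_by_norm(grad, clip_norm):
    """Scale grad so that its Euclidean norm is at most clip_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, clip_norm / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]

def dp_gradient(per_example_grads, clip_norm, noise_multiplier, rng):
    """Generic DP-SGD release: clip each example's gradient, sum, add
    Gaussian noise with the same scale in every dimension, then average."""
    clipped = [clip_by_norm(g, clip_norm) for g in per_example_grads]
    dim = len(clipped[0])
    summed = [sum(g[d] for g in clipped) for d in range(dim)]
    noisy = [s + rng.gauss(0.0, noise_multiplier * clip_norm) for s in summed]
    return [v / len(clipped) for v in noisy]

def iterate_average(trace, last_k=None):
    """Average the last_k parameter vectors of a training trace (all if None).
    This is post-processing of already released values."""
    window = trace if last_k is None else trace[-last_k:]
    dim = len(window[0])
    return [sum(p[d] for p in window) / len(window) for d in range(dim)]

rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]
released = dp_gradient(grads, clip_norm=1.0, noise_multiplier=1.0, rng=rng)
averaged = iterate_average([[1.0, 2.0], [3.0, 4.0]])
```

Because the noise scale is identical across dimensions, a gradient component with a small magnitude is perturbed relatively more than a large one, which is the imbalance the abstract describes.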

Designer-in-the-Loop Layout Autocompletion with Graph Neural Networks - Jiang, Yue (Aalto University); Garg, Vikas (Aalto University); Oulasvirta, Antti (Aalto University) [click for abstract]

In this project, we propose a novel Graph Neural Network (GNN)-based approach to perform user interface (UI) autocompletion. We introduce a graph-based layout representation that embeds both the UI element properties and layout constraints, such as alignment, equal size, and equal gap. Instead of only considering the positions and sizes of the existing UI elements, our approach encodes their properties and semantic content, such as the UI element type, text shown on the UI element, and its appearance. Taking the graph as input, we introduce a novel designer-in-the-loop UI layout autocompletion approach with GNNs to better capture the structure and properties of the UI layout for interactive and iterative design.

Optimization-based approaches can give designers more control over their design, while data-driven learning approaches are better at generating different results and suggestions. Given a partial graphical user interface layout with some UI elements already placed on the canvas, placing the remaining UI elements and determining their sizes suffers from a combinatorial explosion, as exploring all the possible combinations is infeasible. Our approach is the first attempt to integrate constraint optimization into a data-driven approach to generate high-quality UI designs while giving designers iterative control over their designs. It also explores the design space by encoding semantic content and inferring the designer's task.

Feature engineering-based machine learning models for operational state recognition of rotating machines - Junttila, Jukka (VTT Technical Research Centre of Finland); Lämsä, Ville (VTT Technical Research Centre of Finland); Espinosa Leal, Leonardo (Department of Business Management and Analytics, Arcada University of Applied Sciences); Sillanpää, Anssi (Wärtsilä Finland Oy) [click for abstract]

Considering rotating machines, and especially the in-service phase of their lifecycle, valuable information about the current state of the machine should be produced for the owner and operator. The information should be accurate, up-to-date, and available anywhere. These requirements promote the need for efficient data transfer, data acquisition and especially data processing methods at the source of information. Thus, our aim was to provide a data-based model for operational state recognition and detection of abnormal operation of a gas engine generating set (genset) in near real-time. First, computationally light features that are sensitive to changes in the operational state were identified and extracted from measured mechanical vibration data. Thereafter, two different types of machine learning models for state recognition were built on the extracted features. The first, a classification model, can identify the current power output level very accurately. The second, a novelty detection model, can detect abnormal operation, in fault situations, at a specific power output level. The accuracy and computational efficiency of models built using various types of classification and novelty detection algorithms were compared. A fast and accurate two-step state recognition model can be built by combining the classification and novelty detection models. However, there is typically a significant disparity between the amount of data measured during normal and abnormal operation. Sufficient measured data to build a model for detecting and recognizing abnormal operation is rarely available. The lack of abnormal operation data can be compensated for, e.g., by creating more data through simulations. We studied the effect of using varying input data on the simulated output, i.e., simulated mechanical vibration responses. A comparative validation showed that the simulated responses resemble the measured ones statistically but revealed significant absolute differences.

Comparison of New Curriculum Criteria for End-to-End Automatic Speech Recognition - Karakasidis, Georgios (Aalto University); Grósz, Tamás (Aalto University); Kurimo, Mikko (Aalto University) [click for abstract]

Organized and structured learning can enable faster and better learning. For example, when humans learn to speak, they first learn how to utter basic phones and then slowly move towards more complex structures such as words and sentences. Motivated by this observation, researchers started to adapt this approach for training Artificial Intelligence models. Since the main concept, the gradual increase in difficulty, resembles the notion of the curriculum in education, the methodology became known as Curriculum Learning (CL). In this work, we apply CL to train Automatic Speech Recognition systems, specifically focusing on the so-called end-to-end models. These models contain a single, large-scale neural network to perform the task, in contrast to the traditional way of having several specialized components focusing on different subtasks (e.g. acoustic and language modeling). We hypothesize that end-to-end models can achieve better performance if they are provided with an organized training set consisting of examples that exhibit an increasing level of difficulty. To impose structure on the training set and to define the notion of an easy example, we explored multiple solutions that either use external/static scoring methods or incorporate feedback from the model itself. In addition, we examined the effect of pacing functions that control the pace at which the data are presented to the network. Our proposed curriculum learning strategies were tested on the task of speech recognition on two data sets, one containing spontaneous Finnish speech where volunteers were asked to develop a topic, and one containing planned English speech. Empirical results showed that a good curriculum strategy can yield both performance improvements and faster convergence. With a limited number of training steps, our best strategy managed to achieve a 5.6% and 3.4% decrease in terms of test set word error rate, for the Finnish and English data set, respectively.
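The interplay between a difficulty score and a pacing function can be sketched as follows. This is an illustration only: the abstract does not specify the scoring methods or pacing functions that were compared, so the linear pacing schedule, the `start_fraction` parameter and the precomputed static scores below are all assumptions, and the function names are hypothetical.

```python
def linear_pacing(step, total_steps, start_fraction=0.2):
    """Fraction of the (easy-to-hard sorted) training set that is
    available at a given step. Linear schedule assumed for illustration."""
    if step >= total_steps:
        return 1.0
    return start_fraction + (1.0 - start_fraction) * step / total_steps

def available_examples(examples, scores, step, total_steps):
    """Sort examples from easy to hard by a static difficulty score and
    return the prefix that the pacing function currently allows."""
    ordered = [ex for _, ex in sorted(zip(scores, examples), key=lambda p: p[0])]
    n = max(1, int(linear_pacing(step, total_steps) * len(ordered)))
    return ordered[:n]

# Toy data: utterance ids with a made-up difficulty score (lower is easier).
utterances = ["c", "a", "b", "d", "e"]
difficulty = [3, 1, 2, 4, 5]
early = available_examples(utterances, difficulty, step=0, total_steps=10)
late = available_examples(utterances, difficulty, step=10, total_steps=10)
```

Early in training only the easiest examples are sampled, and the pool grows until the full training set is available. A model-feedback curriculum would recompute the scores during training instead of fixing them up front.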

Robust PPG Peak Detection Using Dilated Convolutional Neural Networks - Kazemi, Kianoosh (University of Turku); Laitala, Juho (University of Turku); Azimi, Iman (University of Turku, University of Irvine); Liljeberg, Pasi (University of Turku); Rahmani, Amir M. (University of Irvine) [click for abstract]

Accurate peak determination from a noise-corrupted photoplethysmogram (PPG) signal is the basis for further analysis of physiological quantities such as heart rate. Conventional methods are designed for noise-free PPG signals and are insufficient for PPG signals with a low signal-to-noise ratio (SNR). This paper focuses on enhancing PPG noise resiliency and proposes a robust peak detection algorithm for PPG signals distorted by noise and motion artifacts. Our algorithm is based on convolutional neural networks (CNNs) with dilated convolutions. We train and evaluate the proposed method using a dataset collected via smartwatches under free-living conditions in a home-based health monitoring application. A data generator is also developed to produce noisy PPG data used for model training and evaluation. The performance of the method is compared against other state-of-the-art methods and is tested with SNRs ranging from 0 to 45 dB. Our method outperforms the existing adaptive threshold, transform-based, and machine-learning methods. The proposed method shows an overall precision, recall, and F1-score of 82%, 80%, and 81%, respectively, across all the SNR ranges. In contrast, the best results obtained by the existing methods are 78%, 80%, and 79%, respectively. The proposed method proves to be accurate for detecting PPG peaks even in the presence of noise.
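As a quick consistency check on the reported figures, the F1-score is the harmonic mean of precision and recall. The short sketch below recomputes it from the precision and recall values quoted in the abstract; the function name is our own, and the abstract does not state how peaks were matched to references when counting true positives.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)

proposed = f1_score(0.82, 0.80)   # about 0.81, matching the reported 81%
baseline = f1_score(0.78, 0.80)   # about 0.79, matching the reported 79%
```

The gain in F1 comes entirely from the higher precision, since the reported recall is the same (80%) for the proposed method and the best existing result.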

Uncertainty Estimation with Calibrated Confidence Scores - Kivimäki, Juhani (University of Helsinki) [click for abstract]

Modern machine learning models can achieve impressive results in a wide range of different tasks but are usually poor at reliably expressing how confident they are in their predictions. In an industrial setting, the end goal is usually not a prediction of a model, but a decision based on that prediction. It is often not sufficient to generate predictions that are highly accurate on average. One also needs to estimate the uncertainty and risks involved when making the decisions. Thus, having reliable and calibrated uncertainty estimates is highly useful for any model used in automated decision making.

In this paper we present a case study in which we propose a novel method to improve the uncertainty estimates of an in-production machine learning model operating in an industrial setting with real-life data. This model is used by the Finnish company Basware to extract information from invoices in the form of machine-readable PDFs. The solution we propose is shown to produce calibrated confidence estimates, which outperform legacy estimates on several relevant metrics, increasing coverage of automated invoices from 65.6% to 73.2% with no increase in error rate.
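The coverage figure can be read as follows: predictions whose confidence is high enough are automated, and coverage is the share of items that can be automated while keeping the error rate among them within a target. The sketch below illustrates that idea on toy data. It is an assumption about how such a metric may be computed, not the evaluation procedure used in the study; the function name and data are hypothetical, and ties between confidence scores are not treated specially.

```python
def coverage_at_error_rate(confidences, correct, max_error_rate):
    """Largest fraction of items that can be automated, taking items in
    order of decreasing confidence, such that the error rate among the
    automated items does not exceed max_error_rate."""
    ranked = sorted(zip(confidences, correct), key=lambda p: -p[0])
    best = 0
    errors = 0
    for n, (_, ok) in enumerate(ranked, start=1):
        errors += 0 if ok else 1
        if errors / n <= max_error_rate:
            best = n
    return best / len(ranked)

# Toy example: four predictions, the third most confident one is wrong.
conf = [0.9, 0.8, 0.7, 0.6]
ok = [True, True, False, True]
strict = coverage_at_error_rate(conf, ok, max_error_rate=0.0)
relaxed = coverage_at_error_rate(conf, ok, max_error_rate=0.25)
```

Better-ordered confidence scores push wrong predictions towards the bottom of the ranking, which raises coverage at the same error rate.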

Finnish / Swedish Handwritten and Typed Text Recognition using Fine-tuned Transformer Model - Koistinen, Mika (Silo AI); Parida, Shantipriya (Silo AI); Ehsani, Razieh (Silo AI); Granroth-Wilding, Mark (Silo AI); Varjokallio, Matti (Silo AI) [click for abstract]

Text extraction from historical scanned document images is a challenging task for non-English languages due to a lack of resource availability. In many instances, the available Optical Character Recognition (OCR) models fail to detect Finnish and Swedish handwritten/typed text. Other possible reasons for weak quality are different typesetting, layout, and scanned document quality. This paper presents the work that has been done at Silo AI for handwritten and typed text recognition by fine-tuning existing transformer models. We experimented with handwritten IAM, IMGUR5K and NAF datasets and historical Fraktur typed text NLF dataset. We evaluated the results based on automatic evaluations. We also did a comparison study with the state-of-the-art models on the given datasets. We obtain comparable results with much less GPU computation.

Autonomy and Intelligence in the Computing Continuum: Challenges, Enablers, and Future Directions for Orchestration - Kokkonen, Henna (University of Oulu, Finland); Lovén, Lauri (University of Oulu, Finland); Naser Hossein Motlagh (University of Helsinki, Finland); Juha Partala (University of Oulu, Finland); Alfonso González-Gil (Zylk.net, Spain); Ester Sola (Zylk.net, Spain); Iñigo Angulo (Zylk.net, Spain); Madhusanka Liyanage (University College Dublin, Ireland); Teemu Leppänen (University of Oulu, Finland); Tri Nguyen (University of Oulu, Finland); Panos Kostakos (University of Oulu, Finland); Mehdi Bennis (University of Oulu, Finland); Sasu Tarkoma (University of Helsinki, Finland); Schahram Dustdar (TU Wien, Austria); Susanna Pirttikangas (University of Oulu, Finland); Jukka Riekki (University of Oulu, Finland) [click for abstract]

Future AI applications require performance, reliability, and privacy that the existing, cloud-dependent system architectures cannot provide. In this article, we study orchestration in the device-edge-cloud continuum, and focus on AI for edge, that is, the AI methods used in resource orchestration. We claim that to support the constantly growing requirements of intelligent applications in the device-edge-cloud computing continuum, resource orchestration needs to embrace edge AI and emphasize local autonomy and intelligence. To justify the claim, we provide a general definition for continuum orchestration, and look at how suitable current and emerging orchestration paradigms are for the computing continuum. We describe certain major emerging research themes that may affect future orchestration, and provide an early vision of an orchestration paradigm that embraces those research themes. Finally, we survey current key edge AI methods and look at how they may contribute to fulfilling the vision of future continuum orchestration.

Predictive model for the reconstruction of concentration chemical depth profiles obtained by XPS method - Konstantin Tumashevich (University of Oulu, Center for Atmospheric Research); Matthew Ozone (University of Oulu, Center for Atmospheric Research); Nønne L. Prisle (University of Oulu, Center for Atmospheric Research) [click for abstract]

Climate models have proven to be extremely important tools for understanding and simulating climate change. One of the core issues in climate prediction is the large uncertainty due to incomplete knowledge about cloud formation, which directly depends on aerosol particle properties and their small-scale interactions leading to the formation of droplets. To acquire data of sufficient quality, a powerful surface- and chemically sensitive diagnostic is needed, such as synchrotron light-based X-ray photoelectron spectroscopy (XPS). However, high experimental cost and limited accessibility due to the uniqueness of each beamline station limit the number of findings that can be obtained. We developed a predictive computational framework built on Bayesian inversion, which is intended for estimating the concentration profiles of chemical species probed with X-rays. The model simulates the signal generated by the photoelectron flux emitted by the X-ray beam/sample interaction and collected at a certain angle by the kinetic energy analyser. The algorithm is capable of reconstructing the concentration depth profile below the air/droplet interface using basic experimental conditions and several experimental data points (raw spectra). The model is expected to extend the scope of the data obtained by contributing to effective experiment design and analysis.

Hierarchical Imitation Learning with Vector Quantized Models - Kujanpää, Kalle (Aalto University); Pajarinen, Joni (Aalto University); Ilin, Alexander (Aalto University) [click for abstract]

The ability to plan actions on multiple abstraction levels is an important skill of intelligent agents which allows them to solve complex tasks. However, learning the models required for both low and high-level planning from demonstrations has been extremely challenging without extra supervision, especially with high-dimensional inputs. In this work, we train a hierarchical agent using expert trajectories that contain low-level actions and no supervision on high-level commands. We propose an algorithm that identifies likely subgoals in the existing trajectories by formulating it as a reinforcement learning problem with high rewards associated with the high predictability of the low-level actions given the current state and the subgoal. Then, we build a vector-quantized generative model for the identified subgoals and perform high-level planning with MCTS using the subgoal generative model for node expansion. We demonstrate the success of our algorithm on a difficult high-dimensional traveling salesman problem, where it significantly outperforms strong baselines. We show that the student agent can even improve on the teacher whose trajectories were used for training the agent.

AI Marketplace: Semantic Interoperability across AI Ecosystem - Kumar, Abhishek; Mohsseni, Vahid; Pirttikangas, Susanna; Tarkoma, Sasu [click for abstract]

Artificial intelligence shows promise for solving many practical societal problems in areas such as healthcare and transportation. In principle, the AI marketplace can achieve the democratization of AI models by facilitating the buying and selling of AI models among different actors, such as AI developers and AI customers. However, despite the benefits, the current mechanisms for AI model diffusion such as GitHub code repositories, academic project webpages, and commercial AI marketplaces do not support the true democratization of AI models due to several limitations. Different developers use different frameworks (TensorFlow, PyTorch, Caffe2) and different languages (Python, Java, C/C++), and target different environments (powerful Linux server, smartphone, minimalist IoT device) depending on the intended usage. Hence, a lack of interoperability standards among AIOps/MLOps tools is a major hindrance to the vision for AI democratization. In this work, we shed light on addressing interoperability in the AI development and production environment. We focus on semantic interoperability across three layers: data, devices, and development tools, which are crucial elements for deployable AI.

Sparse Variational Bayesian Monte Carlo - Li Chengkun, University of Helsinki; Clarté Grégoire, University of Helsinki; Acerbi Luigi, University of Helsinki [click for abstract]

We introduce Sparse Variational Bayesian Monte Carlo (SVBMC), a method for fast "post-process" Bayesian inference for models with black-box and potentially noisy likelihoods. SVBMC reuses all existing target density evaluations -- for example, from previous optimizations or partial Markov Chain Monte Carlo runs -- to build a sparse Gaussian process (GP) surrogate model of the log posterior density. Uncertain regions of the surrogate are then refined via active learning as needed. Our work builds on the Variational Bayesian Monte Carlo (VBMC) framework for sample-efficient inference, with several novel contributions. First, we make VBMC scalable to a large number of pre-existing evaluations via sparse GP regression, deriving novel Bayesian quadrature formulae and acquisition functions for active learning with sparse GPs. Second, we introduce noise shaping, a general technique to induce the sparse GP approximation to focus on high posterior density regions. Third, we prove theoretical results in support of the SVBMC refinement procedure. We validate our method on a variety of challenging synthetic scenarios and real-world applications. We find that SVBMC consistently builds good posterior approximations by post-processing of existing model evaluations from different sources, often requiring only a small number of additional density evaluations.

Multi-Modal Lidar Dataset for Benchmarking General-Purpose Localization and Mapping Algorithms - Li, Qingqing (University of Turku); Yu, Xianjia (University of Turku); Peña Queralta, Jorge (University of Turku); Westerlund, Tomi (University of Turku) [click for abstract]

Lidar technology has evolved significantly over the last decade, with higher resolution, better accuracy, and lower cost devices available today. In addition, new scanning modalities and novel sensor technologies have emerged in recent years. Public datasets have enabled benchmarking of algorithms and have set standards for the cutting edge technology. However, existing datasets are not representative of the technological landscape, with only a reduced number of lidars available. This inherently limits the development and comparison of general-purpose algorithms in the evolving landscape. This paper presents a novel multi-modal lidar dataset with sensors showcasing different scanning modalities (spinning and solid-state), sensing technologies, and lidar cameras. The focus of the dataset is on low-drift odometry, with ground truth data available in both indoor and outdoor environments with sub-millimeter accuracy from a motion capture (MOCAP) system. For comparison over longer distances, we also include data recorded in larger spaces indoors and outdoors. The dataset contains point cloud data from spinning lidars and solid-state lidars. Also, it provides range images from high resolution spinning lidars, RGB and depth images from a lidar camera, and inertial data from built-in IMUs. This is, to the best of our knowledge, the lidar dataset with the greatest variety of sensors and environments where ground truth data is available. This dataset can be widely used in multiple research areas, such as 3D LiDAR simultaneous localization and mapping (SLAM), performance comparison between multi-modal lidars, appearance recognition and loop closure detection. The datasets are available at: https://github.com/TIERS/tiers-lidars-dataset

From Misbehaviour and Fault Tolerance in ML system towards dependable and self-improving MLOps - Luo, Yumo; Raatikainen, Mikko; Myllyaho, Lalli; Nurminen, Jukka K. [click for abstract]

Different kinds of misbehaviour may occur in ML systems, including unexpected input-output pairs, poor quality of incoming data, and decay of ML models [1]. While an ML system can have built-in fault tolerance, we also need to take a look beyond the system itself to its development and deployment practices. Towards this end, MLOps -- as a derivative of the widely applied DevOps practice for software systems -- has recently emerged to cover practices and tools for ML-based systems that technically enable iterative software engineering. In this work, we present an architecture and realization of a self-improving pipeline that reduces the problems of model decay. The pipeline allows exploring the autonomous monitoring, retraining, validation, and deployment of a new model to replace the decayed one. We have used open source tools to build the pipeline relying on a Kubernetes-based MLOps architecture that supports 1) model monitoring using Prometheus and Evidently; 2) automatic retraining in a Kubeflow pipeline when model performance degrades, based on monitoring results; and 3) automated A/B testing in production, using the Iter8 tool, that iteratively assigns load between two competing models until the winning model has been found.

Robustness of Sketched Linear Classifiers to Adversarial Attacks - Mahadevan, Ananth (University of Helsinki); Merchant, Arpit (University of Helsinki); Wang, Yanhao (East China Normal University); Mathioudakis, Michael (University of Helsinki) [click for abstract]

Linear classifiers are well-known to be vulnerable to adversarial attacks: they may predict incorrect labels for input data that are adversarially modified with small perturbations. However, this phenomenon has not been properly understood in the context of sketch-based linear classifiers, typically used in memory-constrained paradigms, which rely on random projections of the features for model compression. In this paper, we propose novel Fast-Gradient-Sign Method (FGSM) attacks for sketched classifiers in full, partial, and black-box information settings with regard to their internal parameters. We perform extensive experiments on the MNIST dataset to characterize their robustness as a function of perturbation budget. Our results suggest that, in the full-information setting, these classifiers are less accurate on unaltered input than their uncompressed counterparts but just as susceptible to adversarial attacks. But in more realistic partial and black-box information settings, sketching improves robustness while having a lower memory footprint.
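
For readers unfamiliar with FGSM, the following is a minimal sketch of the standard one-step attack against an ordinary (uncompressed) logistic-regression classifier. It is not the sketched-classifier attack proposed in the paper; the function name `fgsm_linear`, the toy weights, and the budget `eps` are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_linear(x, y, w, b, eps):
    """One-step FGSM against a logistic-regression classifier.

    x: feature vector, y: true label in {0, 1}, (w, b): model parameters,
    eps: L-infinity perturbation budget. All names are illustrative.
    """
    # Gradient of the logistic loss with respect to the input x.
    grad_x = (sigmoid(w @ x + b) - y) * w
    # Shift every feature by eps in the direction that increases the loss.
    return x + eps * np.sign(grad_x)

# Toy model and input (hypothetical values).
w = np.array([1.0, -2.0, 0.5])
b = 0.0
x = np.array([0.2, -0.1, 0.4])   # score w @ x + b is positive: class 1
x_adv = fgsm_linear(x, y=1, w=w, b=b, eps=0.3)

print("clean score:", w @ x + b)
print("adversarial score:", w @ x_adv + b)
```

In this toy case the perturbation stays within the budget of 0.3 per feature yet moves the score across the decision boundary, which is the vulnerability the abstract describes for linear models.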

Certifiable Unlearning Pipelines for Logistic Regression: An Experimental Study - Mahadevan, Ananth (University of Helsinki); Mathioudakis, Michael (University of Helsinki) [click for abstract]

Machine unlearning is the task of updating machine learning (ML) models after a subset of the training data they were trained on is deleted. Methods for the task are desired to combine effectiveness and efficiency: they should effectively “unlearn” deleted data, but in a way that does not require excessive computational effort (e.g., a full retraining) for a small number of deletions. Such a combination is typically achieved by tolerating some amount of approximation in the unlearning. In addition, laws and regulations in the spirit of “the right to be forgotten” have given rise to requirements for certifiability (i.e., the ability to demonstrate that the deleted data has indeed been unlearned by the ML model). In this paper, we present an experimental study of the three state-of-the-art approximate unlearning methods for logistic regression and demonstrate the trade-offs between efficiency, effectiveness and certifiability offered by each method. In implementing this study, we extend some of the existing works and describe a common unlearning pipeline to compare and evaluate the unlearning methods on six real-world datasets and a variety of settings. We provide insights into the effect of the quantity and distribution of the deleted data on ML models and the performance of each unlearning method in different settings. We also propose a practical online strategy to determine when the accumulated error from approximate unlearning is large enough to warrant a full retraining of the ML model.

Evaluation of the roles of intelligent technologies in shared activity spaces of neighborhood communities - Makkonen, Jouko (Tampere University); Rubio Hernández, Rosana (Tampere University); Latikka, Rita (Tampere University); Väänänen, Kaisa (Tampere University) [click for abstract]

The use of shared spaces in urban neighborhoods can advance sustainability and residents’ sense of community. Intelligent technologies may take different roles in supporting activities and social interaction in shared spaces. We present a mixed-method evaluation study of technology roles for space sharing in the context of the planned residential area of Hiedanranta in Tampere, Finland. A survey (N=85) and three focus groups (N=13) were conducted to evaluate the suitability of these pre-defined roles for advancing the use of shared activity spaces in residential areas from the user experience standpoint, using the Hiedanranta area as the study context. Four previously defined conceptual roles were addressed: community sheriff, matchmaker, facilitator, and tutor. The findings contribute to the research of sustainable technologies for supporting shared spaces and activities in urban residents’ local communities. Overall, the roles were considered positive; however, several participants found the sheriff inefficient and the matchmaker intrusive. The facilitator was considered to have potential for pragmatic support, and the tutor was seen as an add-on to the other roles. From the residents’ perspective, the technology roles for space sharing have the potential to provide pragmatic and social benefits for neighborhood communities. Intelligent technologies with these roles have the potential to lower the social threshold of using activity spaces; however, participants were concerned about their privacy and the intrusiveness of the technology. The implementation requires careful human-centered design to avoid the pitfalls – especially related to privacy – presented in this study. There are several use cases and purposes for intelligent technologies with the pre-defined roles for space sharing; however, their feasibility and viability require deeper evaluation.

Crowdsourcing Strong Labels for Sound Event Detection - Martín-Morató, Irene (Tampere University); Harju, Manu (Tampere University); Mesaros, Annamaria (Tampere University) [click for abstract]

Strong labels are a necessity for evaluation of sound event detection methods, but often scarcely available due to the high resources required by the annotation task. We present a method for estimating strong labels using crowdsourced weak labels, through a process that divides the annotation task into simple unit tasks. Based on estimations of annotators' competence, aggregation and processing of the weak labels results in a set of objective strong labels. The experiment uses synthetic audio in order to verify the quality of the resulting annotations through comparison with ground truth. The proposed method produces labels with high precision, though not all event instances are recalled. Detection metrics comparing the produced annotations with the ground truth show 80% F-score in 1s segments, and up to 89.5% intersection-based F1-score calculated according to the polyphonic sound detection score metrics.
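
As an illustration of the segment-based evaluation mentioned above, here is a minimal sketch of an F-score computed over 1 s segments for a single sound class. It is a simplified stand-in for the evaluation used in the paper, not a reproduction of it: the helper names, the single-class restriction, and the toy onset/offset values are assumptions.

```python
import numpy as np

def segment_activity(events, n_segments, seg_len=1.0):
    """Mark each fixed-length segment as active if any event overlaps it.

    events: list of (onset, offset) pairs in seconds for one sound class.
    """
    active = np.zeros(n_segments, dtype=bool)
    for onset, offset in events:
        first = int(np.floor(onset / seg_len))
        last = int(np.ceil(offset / seg_len))
        active[first:min(last, n_segments)] = True
    return active

def segment_f_score(reference, estimate, n_segments, seg_len=1.0):
    """Segment-based precision, recall and F-score for one class."""
    ref = segment_activity(reference, n_segments, seg_len)
    est = segment_activity(estimate, n_segments, seg_len)
    tp = np.sum(ref & est)    # segments active in both
    fp = np.sum(~ref & est)   # segments active only in the estimate
    fn = np.sum(ref & ~est)   # segments active only in the reference
    if tp == 0:
        return 0.0, 0.0, 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall, 2 * precision * recall / (precision + recall)

# Toy example: the estimate covers only the middle second of a 3 s event.
reference = [(0.0, 3.0)]
estimate = [(1.0, 2.0)]
precision, recall, f_score = segment_f_score(reference, estimate, n_segments=5)
print(precision, recall, f_score)
```

The toy case mirrors the pattern reported in the abstract: every estimated segment is correct (high precision), but part of the reference event is missed (lower recall).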

An automated feature selection and classification pipeline to improve explainability of clinical prediction models - Moreno-Sanchez, Pedro A. (Faculty of Medicine and Health Technology, Tampere University) [click for abstract]

Artificial Intelligence has recently become a promising tool for deploying personalized medicine in clinical practice. However, healthcare professionals are demanding clinical prediction models with better interpretability of the results before actually adopting and using these solutions. eXplainable Artificial Intelligence tackles this issue by offering, among other techniques, feature relevance explanations of the model, where the selection of the important features and the elimination of redundant ones are cornerstones. This work presents a data management pipeline that automates the selection of those relevant features as well as of the classifier technique that provides the best classification performance. The pipeline developed, named SCI-XAI (feature Selection and Classification for Improving eXplainable Artificial Intelligence), has been evaluated with 6 clinical datasets in a cross-validation approach as well as on a test set with unseen data. Next, an explainability evaluation has been carried out on the best models obtained by applying the SCI-XAI pipeline. The results show that SCI-XAI achieves the best classification performance by applying different feature selection techniques depending on the variable type of the feature, which significantly reduces the number of features processed by the model. Thus, feature reduction increases the explainability of the models.

SAM: Self-augmentation mechanism for COVID-19 detection using chest X-ray images - Muhammad, Usman (University of Oulu) [click for abstract]

COVID-19 is a rapidly spreading viral disease and has affected over 100 countries worldwide. The numbers of casualties and cases of infection have escalated particularly in countries with weakened healthcare systems. Currently, reverse transcription-polymerase chain reaction (RT-PCR) is the test of choice for diagnosing COVID-19. However, current evidence suggests that most COVID-19 patients develop a lung infection after coming into contact with the virus. Therefore, chest X-ray (i.e., radiography) and chest CT can be a surrogate in some countries where PCR is not readily available. This has pushed the scientific community to detect COVID-19 infection from X-ray images, and recently proposed machine learning methods offer great promise for fast and accurate detection. Deep learning with convolutional neural networks (CNNs) has been successfully applied to radiological imaging for improving the accuracy of diagnosis. However, the performance remains limited due to the lack of representative X-ray images available in public benchmark datasets. To alleviate this issue, we propose a self-augmentation mechanism for data augmentation in the feature space rather than in the data space using reconstruction independent component analysis (RICA). Specifically, a unified architecture is proposed which contains a deep convolutional neural network (CNN), a feature augmentation mechanism, and a bidirectional LSTM (BiLSTM). The CNN provides the high-level features extracted at the pooling layer, where the augmentation mechanism chooses the most relevant features and generates low-dimensional augmented features. Finally, the BiLSTM is used to classify the processed sequential information. We conducted experiments on three publicly available databases to show that the proposed approach achieves state-of-the-art results with accuracies of 97%, 84% and 98%.

Augmenting the Student-Teacher Feature Pyramid Matching Method for Better Unsupervised Anomaly Localization - Mylläri, Juha (University of Helsinki) [click for abstract]

Anomaly detection in images is the machine learning task of classifying input images as normal or anomalous. Anomaly localization is the related task of segmenting input images into normal and anomalous regions. The output of an anomaly localization model is a 2D array, called an anomaly map, of pixel-level anomaly scores. For example, an anomaly localization model trained on images of industrial products should output high anomaly scores in image regions corresponding to visible defects in a product. In unsupervised anomaly localization the model is trained solely on normal data, i.e. without labelled training observations containing anomalies. This is often necessary as anomalous observations may be hard to obtain in sufficient quantities and labelling them is time-consuming and costly. Student-teacher feature pyramid matching (STFPM) is a recent and powerful method for unsupervised anomaly detection and localization that uses two convolutional neural networks (CNNs), called the teacher and the student, of identical architecture. The teacher is pre-trained on a large image dataset and frozen. The student is then trained on normal (i.e. non-anomalous) data to mimic the activations of the teacher in a designated set of layers. At inference time, discrepancies in the activations of the two networks indicate the possible presence and location of anomalies. We propose a method, called discrepancy scaling, of augmenting STFPM to produce better segmentations. Our method uses pre-calculated statistics that contain information about the model’s behaviour on normal data to scale activation discrepancies based on how atypical they are. Our method significantly improves the performance of an STFPM model with the standard ResNet-18 backbone. Even larger gains relative to the base STFPM method are seen when a much smaller CNN, MobileNetV2, is used; in fact, with discrepancy scaling, the performance of the latter model comes reasonably close to the performance of the former.
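
The idea of scaling student-teacher discrepancies by statistics gathered on normal data can be sketched as follows. This is only one plausible reading of the abstract, not the author's formulation: the per-location z-score, the function names, and the toy arrays are all assumptions, and real feature maps would come from CNN layers rather than hand-written arrays.

```python
import numpy as np

def discrepancy_map(teacher_feat, student_feat):
    """Per-location discrepancy between two feature maps.

    Inputs have shape (channels, height, width). Feature vectors are
    L2-normalised along the channel axis before comparison.
    """
    t = teacher_feat / (np.linalg.norm(teacher_feat, axis=0, keepdims=True) + 1e-12)
    s = student_feat / (np.linalg.norm(student_feat, axis=0, keepdims=True) + 1e-12)
    return 0.5 * np.sum((t - s) ** 2, axis=0)

def fit_normal_stats(normal_maps):
    """Per-location mean and standard deviation of discrepancies on normal data."""
    stacked = np.stack(normal_maps)
    return stacked.mean(axis=0), stacked.std(axis=0) + 1e-6

def scaled_map(raw_map, mean, std):
    """Assumed scaling rule: express each discrepancy as a z-score."""
    return (raw_map - mean) / std

# Hypothetical discrepancy maps observed on two normal images.
normal_maps = [np.full((3, 3), 0.1), np.full((3, 3), 0.3)]
mean, std = fit_normal_stats(normal_maps)

# A test map that is typical everywhere except at the centre location.
raw = np.full((3, 3), 0.2)
raw[1, 1] = 0.6
scaled = scaled_map(raw, mean, std)
print(scaled.round(2))
```

Under this assumed rule, a discrepancy that is large relative to what was seen on normal data stands out, while locations where the student routinely disagrees with the teacher are down-weighted.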

Silo NLP’s Participation at WAT2022 - Parida, Shantipriya (Silo AI, Finland); Panda, Subhadarshi (CUNY, USA); Grönroos, Stig-Arne (Silo AI, Finland); Granroth-Wilding, Mark (Silo AI, Finland); Koistinen, Mika (Silo AI, Finland) [click for abstract]

This paper provides the system description of Silo NLP's submission to the Workshop on Asian Translation (WAT2022). We have participated in the Indic Multimodal tasks (English->Hindi, English->Malayalam, and English->Bengali Multimodal Translation). For text-only translation, we trained Transformers from scratch and fine-tuned mBART-50 models. For multimodal translation, we used the same mBART architecture and extracted object tags from the images to use as visual features concatenated with the text sequence.

Our submission tops many tasks including English->Hindi multimodal translation (evaluation test), English->Malayalam text-only and multimodal translation (evaluation test), English->Bengali multimodal translation (challenge test), and English->Bengali text-only translation (evaluation test).

Noise-Aware Statistical Inference with Differentially Private Synthetic Data - Räisä, Ossi (University of Helsinki); Jälkö, Joonas (Aalto University); Kaski, Samuel (Aalto University, University of Manchester); Honkela, Antti (University of Helsinki) [click for abstract]

While generation of synthetic data under differential privacy (DP) has received a lot of attention in the data privacy community, analysis of synthetic data has received much less. Existing work has shown that simply analysing DP synthetic data as if it were real does not produce valid inferences of population-level quantities. For example, confidence intervals become too narrow, which we demonstrate with a simple experiment. We tackle this problem by combining synthetic data analysis techniques from the field of multiple imputation (MI), and synthetic data generation using noise-aware (NA) Bayesian modeling into a pipeline NA+MI that allows computing accurate uncertainty estimates for population-level quantities from DP synthetic data. To implement NA+MI for discrete data generation from marginal queries, we develop a novel noise-aware synthetic data generation algorithm NAPSU-MQ using the principle of maximum entropy. Our experiments demonstrate that the pipeline is able to produce accurate confidence intervals from DP synthetic data. The intervals become wider with tighter privacy to accurately capture the additional uncertainty stemming from DP noise.

Efficient Computation of Answer Sets via SAT Modulo Acyclicity and Vertex Elimination - Rankooh, Masood Feyzbakhsh (Tampere University); Janhunen, Tomi (Tampere University) [click for abstract]

Answer set programming (ASP) is a declarative programming paradigm where the solutions of a search problem are captured by the answer sets of a logic program describing its solutions. Besides native algorithms implemented as answer-set solvers, the computation of answer sets can be realized (i) by translating the logic program into propositional logic or its extensions and (ii) by finding satisfying assignments with appropriate solvers. In this work, we recall the graph-based extension of propositional logic, viz. SAT modulo graphs, and the case of the acyclicity constraint, which keeps a digraph associated with each truth assignment acyclic. This particular extension lends itself very well to answer set computation, e.g., using extended SAT solvers, such as GraphSAT, as back-end solvers. The goal of this work, however, is to translate away the acyclicity extension altogether using a vertex elimination technique, giving rise to a translation from ASP into propositional clauses only. We use non-tight benchmarks and a state-of-the-art SAT solver, Kissat, to illustrate that performance obtained in this way can be competitive against GraphSAT and native ASP solvers such as Clasp and Wasp.

Low Resource Comparison of Attention-based and Hybrid ASR Exploiting wav2vec 2.0 - Rouhe, Aku (Aalto University); Virkkunen, Anja (Aalto University); Leinonen, Juho (Aalto University); Kurimo, Mikko (Aalto University) [click for abstract]

Low resource speech recognition can potentially benefit a lot from exploiting a pretrained model such as wav2vec 2.0. These pretrained models have learned useful representations in an unsupervised or self-supervised task, often leveraging a very large corpus of untranscribed speech. The pretrained models can then be used in various ways. In this work we compare two approaches which exploit wav2vec 2.0: an attention-based end-to-end model (AED), where the wav2vec 2.0 model is used in the model encoder, and a hybrid hidden Markov model (HMM/DNN) speech recognition system, where the wav2vec 2.0 model is used in the acoustic model. These approaches are compared in a very difficult Northern Sámi task, as well as an easier, simulated low resource task in Finnish. We find that the wav2vec 2.0 AED models can learn a working attention mechanism, but are still outperformed by wav2vec 2.0 HMM/DNN systems. Our best wav2vec 2.0 HMM/DNN recipe on 20 hours is competitive with an HMM/DNN system trained on 1600 hours.

Alone with you and you and you…. Conceptualisations of privacy in AI systems - Rousi, Rebekah; School of Marketing & Communication, University of Vaasa [click for abstract]

We will never be alone again. Our relationships with our devices are becoming less likely to be one-on-one affairs. Our phones are connected. Our watches are connected. Even our intimate devices are connected (if we want them to be). Interestingly, while we are becoming ever more engrossed in the idea of autonomy and/or autonomous technology substituting the roles of other humans, these devices are becoming ever more reliant on their input and connectivity with other (human) individuals. One of the key topics in the ethics of emerging systems, especially artificial intelligence (AI) ethics, is that of privacy. When machine learning (ML) requires and collects massive amounts of data (including personal data) to continually adapt to changing situations, burgeoning questions arise regarding what types of data are collected, where they are stored, who has access, how they are being used, and what the potential risks are for such data to exist and be shared. Technological singularity, or the eventuality that technology will overrule human dominance in years to come, may be one concern that people face with respect to privacy. At this stage of socio-technological developments it is fair to say that the primary concern emerges regarding other human beings behind self-learning systems. This paper takes a theoretical and retrospective empirical look at the types of ways people have conceptualized privacy in relation to AI systems. In this review, careful attention is placed on notions of personal space as well as the role of other humans, their relations and proximity to the AI systems. A symbolic interactionist approach is applied to understanding the potential social-emotional weighting between the conceptualisers (study participants, users etc.) as selves, and humans behind the systems – particular others (who and how these are portrayed) versus generalized (insignificant or apathetic) others, fulfilling a function in the network without invested interest in the collected data.

Historical Discourse Detection with HPC - Ryan, Yann (University of Helsinki); Tiihonen, Iiro (University of Helsinki); Wang, Ruilin (University of Helsinki); Pivovarova, Lidia (University of Helsinki); Mahadevan, Ananth (University of Helsinki); Hrín, Adam (University of Helsinki); Zhang, Jinbin (Aalto University); Rastas, Iiro (University of Turku); Nuutinen, Emil (University of Turku); Mathioudakis, Michael (University of Helsinki); Babbar, Rohit (Aalto University); Mäkelä, Eetu (University of Helsinki); Tolonen, Mikko (University of Helsinki); Ginter, Filip (University of Turku) [click for abstract]

This proposal is based on publications by the Academy of Finland EuroHPC project on computational history that started in 2022. Use of AI and HPC in history is scarce due to the complexities in the data. The aim of the consortium is to use HPC to detect discourses from books, pamphlets and newspapers, and study the interconnections and evolution of the detected patterns. The project has the potential not only to uncover novel historical insights, but also to work toward a new multidisciplinary paradigm of historical research.

The first two publications by the consortium are modular. The first paper describes a BERT model trained on the ECCO dataset. The dataset poses unique modelling challenges due to the presence of Optical Character Recognition (OCR) artefacts. We establish the performance of the BERT model on a publication year prediction task against linear baseline models and human judgement, finding the BERT model to be superior to both. We also explore how language change over time affects the model by analyzing the features the model uses for publication year predictions as given by the Integrated Gradients model explanation method.

The second publication uses the BERT embeddings of the first paper to make novel advancements in the detection of sequential large-scale genres for eighteenth-century documents and also tests a Perceiver model. Readily available genre information is often sporadic, but the opportunities to use it can open a whole new window to the development of public discourse. With better structured data, we will be able to study the systematization of particular genres in a new manner, which is crucial for historical discourse detection.

Ongoing work towards a third publication uses instances of text reuse to explore how ideas evolved over time. The next stage of the project will build on the above pieces of work to enhance the study of books as material objects and the use of the latest developments in machine learning and NLP.

Monitoring and Cordoning Wildfires with an Autonomous Swarm of Unmanned Aerial Vehicles - Saffre, Fabrice (VTT); Hildmann, Hanno (TNO); Karvonen, Hannu (VTT); Lind, Timo (VTT) [click for abstract]

Unmanned Aerial Vehicles (UAVs) or “drones” are already an integral part of the equipment used by firefighters to monitor wildfires. They are, however, still typically used only as remotely operated, mobile sensing platforms under direct real-time control of a human pilot. Meanwhile, a substantial body of literature exists that emphasises the potential of autonomous drone swarms in various situational awareness missions, including in the context of environmental protection. In this paper, we present the results of a systematic investigation, by means of numerical methods (Monte Carlo simulation), of the influence of certain key parameters (fire propagation dynamics, surface area under surveillance and swarm size) over the performance of an autonomous drone force operating without human supervision. The objective is defined as being able to detect, then establish a continuous perimeter (cordon) around a simulated fire event. Special emphasis was put on using exclusively simple, robust and realistically implementable distributed decision functions capable of supporting the self-organisation of the swarm in the pursuit of the collective goal. Our results confirm the presence of strong nonlinear effects in the interaction between the aforementioned parameters, which can be closely approximated using an empirical law. These findings could inform the mobilisation of adequate resources on a case-by-case basis, depending on known mission characteristics and acceptable odds (chances of success).

Hyperparameter Optimization of Quantum Annealing Boosted Restricted Boltzmann Machines - Salmenperä, Ilmo; Nurminen, Jukka K. (University of Helsinki) [click for abstract]

A novel variation of quantum computing called quantum annealing has shown promise in tasks that require sampling from a probability distribution called the Boltzmann distribution. This probability distribution appears quite often in machine learning tasks, of which the training process of Restricted Boltzmann Machines is probably the most prominent example. This machine learning model is quite common in the pretraining process of some deep learning models, such as deep belief networks. Quantum annealing has been shown to be an effective tool for estimating this probability distribution, sometimes even outperforming classical approaches. As the number of qubits in these devices increases over time, these techniques are quickly approaching the realm of practicality.

One key issue around using this technique in practice is that applying quantum annealing introduces multiple new hyperparameters for the sampling tasks, making the process of choosing parameters quite convoluted. Even worse, the optimal parameters seem to change depending on the system that is being estimated, meaning that the best initial choice of parameters might not be suitable for the final steps of the training algorithm.

Our contribution to this challenge is the development and testing of various hyperparameter optimization algorithms, with the aim of determining which parameter optimization methods are most suitable when quantum annealing is used to train Restricted Boltzmann Machines. We use common metaheuristics like Grid Search and Gaussian Processes to seek out these parameters, while also testing more advanced methods for determining whether a more granular choice of the device parameters during training is a good idea. Lastly, we evaluate the performance of these methods along two dimensions: model performance and the computational time these algorithms require from the quantum annealing device.

Reflections on the human role in AI policy formulations: how do national AI strategies view people? - Salo-Pöntinen, Henrikki (Cognitive Science, Faculty of Information Technology, University of Jyväskylä); Saariluoma, Pertti (Cognitive Science, Faculty of Information Technology, University of Jyväskylä) [click for abstract]

Purpose: There is no artificial intelligence (AI) without people. People design and develop AI; they modify and use it and they have to reorganize the ways they have carried out tasks in their work and everyday life. National strategies are documents made to describe how different nations foster AI and as human dimensions are such an important aspect of AI, this study sought to investigate major national strategy documents to determine how they view the human role in emerging AI societies.

Approach: Our method for analyzing the strategies was conceptual analysis since the development of technology is embedded with conceptual ideas of humanity, explicit or implicit, and in addition to deepening analysis of explicit argumentation the method enables the deconstruction and reconstruction of meanings and conceptual relations within the strategies, exposing presumptions and tacit commitments of the writers.

Findings: The analysis of the documents illustrates that national strategies around the world are predominantly technology-driven, as their main concern appears to be the creation of new technologies. However, various human research points such as usability, user experience, sociotechnical and life-based themes are less well represented. Because national strategies are used to develop innovation processes, we argue that future development of national strategies could be improved by placing human research issues more firmly on the agenda.

Originality: Our study elaborates the current trends in AI-policy discourses and discusses reasons and possibilities for more holistic policymaking, making it a valuable resource for policymakers, researchers, and the larger public.

Language of Algorithms: Agency, Metaphors and Deliberations in AI Discourses - Sawhney, Nitin (Aalto University); Kajava, Kaisla (Aalto University) [click for abstract]

Algorithmic technologies, concepts, and practices as socio-technical constructs emerge and proliferate through language in society. Discourses around Artificial Intelligence (AI) shape our collective imagination, affect technological development, and influence policymaking. What can we learn from critically examining wide-ranging discourses around AI, using a mix of qualitative methods and Natural Language Processing (NLP), both among actors who influence its development and the publics who are affected by it? In this chapter, we examine the language of algorithms to make sense of policy documents and stakeholder responses to AI regulation in the European Union. Linguistic devices such as metaphors, metonymy, and personification reveal how we conceptualize, narrate, contest, or attribute agency to AI systems. Saying that AI is “trustworthy”, “biased”, or “transforming society” are discursive acts that implicitly attribute a sense of agency to technology rather than the human actors involved in its creation. Critically examining such AI discourses reveals how language affects attitudes, influences practices and policies, and shapes future imaginaries around AI.

Investigating Bayesian Neural Network Dynamics Models for Model-Based Reinforcement Learning - Aidan Scannell, Arno Solin and Joni Pajarinen [click for abstract]

Model-based reinforcement learning (MBRL) algorithms are more sample-efficient than their model-free counterparts. However, MBRL algorithms often fail, or perform poorly, due to their decision-making strategy (e.g. planning or policy optimization) exploiting inaccuracies in the learned dynamics model. These inaccuracies arise because the dynamics model is learned from a state transition data set representing only a small subset of the environment. As such, the dynamics model cannot be confident in making predictions far away from these state transitions. This concept is known as epistemic uncertainty. In the limit of infinite data, i.e. a data set containing all possible state transitions for an environment, the dynamics model's epistemic uncertainty is reduced. However, in MBRL an agent should only visit regions of the state-action space which lead to high rewards. As a result, an agent's dynamics model will always be subject to epistemic uncertainty. If the dynamics model knows what it does not know, this information can be used to prevent the decision-making strategy from exploiting the model's inaccuracies. Based on this intuition, recent algorithms leverage ensembles of neural networks to quantify the epistemic uncertainty associated with learning dynamics models from observations, i.e. to quantify what it does not know. In this work, we are interested in learning dynamics models using Bayesian neural networks and comparing them to ensemble methods. In particular, we seek to compare different approximate inference techniques (e.g. Laplace approximation, MC dropout), as well as ensemble methods, to understand why they either succeed or fail in different environments.
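The idea of using ensemble disagreement as a proxy for epistemic uncertainty can be illustrated with a very small sketch. It is not the authors' method: simple bootstrapped linear models stand in for neural networks, and the toy dynamics, noise level, and function names are all assumptions made for illustration.

```python
import random
import statistics


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b on 1-D inputs."""
    mean_x = statistics.fmean(xs)
    mean_y = statistics.fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def train_ensemble(states, next_states, n_members=20, seed=0):
    """Fit each ensemble member on a bootstrap resample of the transitions."""
    rng = random.Random(seed)
    n = len(states)
    members = []
    for _ in range(n_members):
        idx = [rng.randrange(n) for _ in range(n)]
        members.append(
            fit_line([states[i] for i in idx], [next_states[i] for i in idx])
        )
    return members


def predict(members, state):
    """Return the ensemble mean and the spread (disagreement) at a state."""
    preds = [a * state + b for a, b in members]
    return statistics.fmean(preds), statistics.pstdev(preds)


# Hypothetical 1-D environment: s' = 0.9 * s + 0.1 plus observation noise.
# The agent has only visited states in [0, 1].
data_rng = random.Random(1)
states = [i / 29 for i in range(30)]
next_states = [0.9 * s + 0.1 + data_rng.gauss(0.0, 0.05) for s in states]

ensemble = train_ensemble(states, next_states)
_, std_near = predict(ensemble, 0.5)   # inside the visited region
_, std_far = predict(ensemble, 10.0)   # far outside the visited region
print(f"disagreement near data: {std_near:.4f}, far from data: {std_far:.4f}")
```

The members agree closely where transitions were observed and diverge far from them, which is the signal a planner could use to avoid exploiting model errors.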

How to run a world record? A Reinforcement Learning approach - Shahsavari, Sajad (University of Turku); Immonen, Eero (Turku University of Applied Sciences); Karami, Masoomeh (University of Turku); Haghbayan, Hashem (University of Turku); Plosila, Juha (University of Turku) [click for abstract]

Finding the optimal distribution of exerted effort by an athlete in competitive sports has been widely investigated in the fields of sport science, applied mathematics and optimal control. In this article, we propose a reinforcement learning-based solution to the optimal control problem in the running race application. Keller's well-known mathematical model is used for numerically simulating the dynamics of the runner's energy storage and motion. A feed-forward neural network is employed as the probabilistic controller model in continuous action space, which transforms the current state (position, velocity and available energy) of the runner to the predicted optimal propulsive force that the runner should apply in the next time step. A logarithmic barrier reward function is designed to evaluate the performance of simulated races as a continuous smooth function of the runner's position and time. The neural network parameters are then identified by maximizing the expected reward using an on-policy actor-critic policy-gradient RL algorithm. We trained the controller model for three race lengths (400, 1500 and 10000 meters) and found the force and velocity profiles that produce a near-optimal solution for the runner's problem. Results conform with Keller's theoretical findings with a relative percent error of 0.59% and are comparable to real world records with a relative percent error of 2.38%, while the same error for Keller's findings is 2.82%.
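For readers unfamiliar with Keller's model, the sketch below simulates its two coupled equations (velocity driven by propulsive force minus resistance, and an energy store drained by the power spent) with a simple Euler scheme. It is only a simulation environment sketch under stated assumptions: the constants are illustrative magnitudes rather than values given in the abstract, the `policy` callback is a hypothetical stand-in for the trained neural controller, and no reward function or RL training is shown.

```python
# Illustrative constants (per unit body mass); assumptions for this sketch.
TAU = 0.892     # s, resistance time constant
F_MAX = 12.2    # m/s^2, maximum propulsive force
SIGMA = 41.5    # m^2/s^3, rate of energy supply
E0 = 2400.0     # m^2/s^2, initial energy store


def simulate_race(policy, distance, dt=0.01, max_time=3600.0):
    """Integrate dv/dt = f - v/TAU and dE/dt = SIGMA - f*v until the finish.

    `policy(x, v, energy)` returns the requested propulsive force; it is
    clipped to [0, F_MAX] and, once the energy store is empty, to the force
    that the supply rate alone can sustain.
    """
    x = v = t = 0.0
    energy = E0
    while x < distance and t < max_time:
        f = min(max(policy(x, v, energy), 0.0), F_MAX)
        if energy <= 0.0 and v > 0.0:
            f = min(f, SIGMA / v)
        v += dt * (f - v / TAU)
        x += dt * v
        energy = max(energy + dt * (SIGMA - f * v), 0.0)
        t += dt
    return t, v, energy


def all_out(x, v, energy):
    """Hypothetical policy: always push with maximum force."""
    return F_MAX


finish_time, final_velocity, final_energy = simulate_race(all_out, 100.0)
print(f"100 m all-out sprint: {finish_time:.2f} s")
```

An all-out policy is only sensible for short sprints; over longer distances the energy constraint becomes binding, which is where a learned pacing strategy matters.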

Annual LiDAR back-scattering intensity dynamics of boreal tree species from high spatial and temporal resolution time series - Shcherbacheva, Anna (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland); Campos, Mariana Batista (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland); Liang, Xinlian (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland; The State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing, Wuhan University); Kukko, Antero (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland); Hyyppä, Juha (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland); Junttila, Samuli (Department of Forest Sciences, University of Helsinki; School of Forest Sciences, University of Eastern Finland); Lintunen, Anna (Department of Forest Sciences, University of Helsinki; Institute for Atmospheric and Earth System Research, University of Helsinki); Korpela, Ilkka (Department of Forest Sciences, Faculty of Agriculture and Forestry, University of Helsinki); Puttonen, Eetu (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland); Wang, Yunsheng (Department of Photogrammetry and Remote Sensing, Finnish Geospatial Research Institute - National Land Survey of Finland) [click for abstract]

This work demonstrated a method for classifying three boreal tree species by employing annual LiDAR backscattering intensity dynamics, observed from a unique long-term hyper-temporal single wavelength laser scanning observation station, as input for the classification algorithm. Hyper-temporal observations were acquired using a single wavelength (1,550 nm) static laser scanner by scanning the study area at the Hyytiälä forest research station in southern Finland (76 scans from April 2020 to April 2021). The sensor calibrated backscattering intensity values of the LiDAR point cloud exhibit annual variations with patterns that are consistently uniform among trees of the same species, but diverse between representatives of different species. Based on that observation, we can compare the temporal signal of each tree with the species-specific temporal patterns in order to build a classification model on the basis of well-established techniques, such as spectral angle mapper and efficient classification algorithms. Several effective supervised machine learning methods (Random Forest, ExtraTrees, XGBoost, Support Vector Machines, and Multi-layer Perceptron classifiers) were tested for the classification of boreal tree species. It was found that the classification attained an average accuracy of 96.8% when complete spatial (up to 100,000 points per square meter) and bi-weekly temporal resolution (76 days over the year, 2 scans collected weekly) were utilized. Our tests showed that the average classification accuracy was higher than 85% while using only two optimally chosen dates each year and the full spatial resolution. Furthermore, the average accuracy was close to 80% even with a spatial resolution of 5 pts/m² and two optimally chosen dates.
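The spectral angle mapper mentioned above compares the shape of two signals while ignoring their overall scale. A minimal sketch follows, applied to temporal rather than spectral signatures; the species labels and intensity values are hypothetical placeholders, not data from the study.

```python
import math


def spectral_angle(signal, reference):
    """Angle (radians) between two equally long signals; 0 means same shape."""
    dot = sum(s * r for s, r in zip(signal, reference))
    norm = math.sqrt(sum(s * s for s in signal)) * math.sqrt(
        sum(r * r for r in reference)
    )
    # Clamp to guard against rounding slightly outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, dot / norm)))


def classify(signal, references):
    """Return the label of the reference pattern with the smallest angle."""
    return min(references, key=lambda name: spectral_angle(signal, references[name]))


# Hypothetical species-specific intensity patterns over five scan dates.
references = {
    "species_a": [0.50, 0.52, 0.55, 0.53, 0.51],
    "species_b": [0.40, 0.48, 0.60, 0.47, 0.41],
    "species_c": [0.30, 0.55, 0.80, 0.50, 0.28],
}

# A tree whose signal has roughly the shape of species_c at about twice the scale.
tree_signal = [0.62, 1.10, 1.58, 1.02, 0.58]
label = classify(tree_signal, references)
print(label)
```

Because the angle is scale-invariant, a tree with a uniformly brighter or darker return is still matched on the shape of its annual pattern.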

Mining port operation information from AIS data - Steenari, Jussi (UH); Lwakatare, Lucy Ellen (UH); Talonen, Jaakko (NAPA); Manderbacka, Teemu (VTT); Nurminen, Jukka K (UH) [click for abstract]

Purpose: Ports play a vital role in global trade and commerce. While there is an abundance of analytical studies related to ship operations, less work is available about port operations and infrastructure. Information about them can be complicated and expensive to acquire, especially when done manually. We use a machine learning analytical approach on Automatic Identification System (AIS) data to understand how ports operate.

Methodology: This paper uses the DBSCAN algorithm on AIS data gathered near the Port of Brest in France to detect clusters representing the port's mooring areas. In addition, exploratory data analyses are performed on these clusters to gain additional insights into the port infrastructure and operations.

Findings: For the Port of Brest, our experimental results identified seven clusters with defining characteristics that allowed them to be identified, for example, as dry docks. The clusters created by our approach appear to be situated in the correct places in the port area when inspected visually.

Originality: This paper presents a novel approach to detecting potential mooring areas and how to analyse characteristics of the mooring areas. Similar clustering methods have been used to detect anchoring spots, but this study provides a new approach to getting information on the clusters.
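To make the clustering step concrete, here is a compact, dependency-free sketch of the DBSCAN algorithm applied to a handful of made-up vessel positions. It is a teaching re-implementation, not the authors' pipeline; the coordinates (assumed to be already projected to metres), `eps`, and `min_samples` are hypothetical, and in practice an established library implementation would normally be preferred.

```python
import math
from collections import deque

NOISE = -1


def dbscan(points, eps, min_samples):
    """Return one cluster label per point; NOISE marks outliers."""
    n = len(points)
    labels = [None] * n  # None means "not yet visited"

    def neighbours(i):
        # A point counts as its own neighbour.
        return [j for j in range(n) if math.dist(points[i], points[j]) <= eps]

    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = deque(seeds)
        while queue:
            j = queue.popleft()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point of this cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbours(j)
            if len(reachable) >= min_samples:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical positions of stationary vessels (metres) plus one passing ship.
berth_a = [(0, 0), (5, 3), (3, 6), (6, 1), (2, 4)]
berth_b = [(200, 200), (204, 203), (198, 205), (203, 198), (201, 202)]
passing = [(100, 500)]

labels = dbscan(berth_a + berth_b + passing, eps=15.0, min_samples=4)
print(labels)
```

Each dense group of positions becomes one candidate mooring area, while isolated reports are left as noise.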

Age Against the Machine: A Call for Designing Ethical AI for and with Children - Sumita Sharma, INTERACT Research Unit, University of Oulu; Marianne Kinnula, INTERACT Research Unit, University of Oulu; Netta Iivari, INTERACT Research Unit, University of Oulu; Leena Ventä-Olkkonen, INTERACT Research Unit, University of Oulu; Heidi Hartikainen, INTERACT Research Unit, University of Oulu; Eva Durall Gazulla, INTERACT Research Unit, University of Oulu; Tonja Molin-Juustila, INTERACT Research Unit, University of Oulu; Jussi Okkonen, Tampere University; Sirkku Kotilainen, Tampere University; Nitin Sawhney, Department of Computer Science, Aalto University; Grace Eden, Department of Human-Centred Design, Indraprastha Institute of Information Technology - Delhi; Charu Monga, Department of Design, Indian Institute of Technology - Delhi [click for abstract]

Child-Computer Interaction (CCI) research is focused on cultivating, nurturing, and nudging children towards technology use and design. Recently, ethical aspects related to technology have come to the forefront, including the inherent limitations of technology, particularly related to Artificial Intelligence (AI). Further, AI has a known diversity problem where age-inclusion can sometimes be forgotten. While various global and national policy frameworks on Children and AI are being developed, the approaches are child-centered but not child-led, restricting children from affecting their own digital futures. Further still, there is little discussion with children on the limitations, inherent biases, and lack of diversity in current design and development of AI. As AI evolves to mimic human-like cognition, emotions, conversations, and decision-making, its impact on children and their futures should be critically examined for, with, and by children.

To this end, we have a hands-on workshop at the NordiCHI conference this year, where we have invited researchers and their children to critically examine the challenges towards AI, where all participants consider and reimagine alternative technology presents and futures. The workshop is planned for Sunday Oct 9th (https://interact.oulu.fi/aatm), and its outcomes contribute to the ongoing work on Children and AI by including children as equal partners and empowering them to consider present and future challenges as experts of their own lives, with diverse interests, backgrounds, perspectives, and experiences. In AI Day 2022, we will present the outcomes of the workshop and discuss how researchers are inquiring, addressing, provoking, and instigating change in how children view their interactions with intelligent systems, and how to nurture them to develop a critical mindset towards technology - now and in the future.

Transient Modelling of Induction Machine Using Artificial Neural Networks - Tahkola, Mikko (VTT Technical Research Centre of Finland Ltd); Mukherjee, Victor (Technology Center, ABB Motors and Generators); Keränen, Janne (VTT Technical Research Centre of Finland Ltd) [click for abstract]

A start-up transient model of an induction machine (IM) is developed using an artificial neural network. The model is suitable for direct-on-line and converter-fed induction machines. Different inputs and model configurations are investigated to find an optimal solution in developing the transient model. The datasets required by the development process are generated with a finite element-based model of an induction machine. The transient model can be used to estimate the current and torque at any given time accurately in real time, which makes it suitable for use in digital twin services.

Spoken Conversational Context Improves Query Auto-Completion in Web Search - Tung Vuong, Salvatore Andolina, Giulio Jacucci and Tuukka Ruotsalo [click for abstract]

Web searches often originate from conversations in which people engage before they perform a search. Therefore, conversations can be a valuable source of context with which to support the search process. We investigate whether spoken input from conversations can be used as a context to improve query auto-completion. We model the temporal dynamics of the spoken conversational context preceding queries and use these models to re-rank the query auto-completion suggestions. Data were collected from a controlled experiment and comprised conversations among twelve participant pairs conversing about movies or traveling. Search query logs during the conversations were recorded and temporally associated with the conversations. We compared the effects of spoken conversational input in four conditions: a control condition without contextualization; an experimental condition with the model using search query logs; an experimental condition with the model using spoken conversational input; and an experimental condition with the model using both search query logs and spoken conversational input. We show the advantage of combining the spoken conversational context with the Web search context for improved retrieval performance. Our results suggest that spoken conversations provide a rich context for supporting information searches beyond current user-modeling approaches.

Automated defect detection in digital radiography of aerospace welds using deep learning - Tyystjärvi, Topias (Trueflaw; Department of Mechanical Engineering, School of Engineering, Aalto University); Virkkunen, Iikka (Trueflaw); Fridolf, Peter (GKN Aerospace Engine Systems); Rosell, Anders (GKN Aerospace Engine Systems); Barsoum, Zuheir (Department of Engineering Mechanics, KTH Royal Institute of Technology) [click for abstract]

Aerospace welds are non-destructively evaluated (NDE) during manufacturing to identify defective parts that may pose structural risks, often using digital radiography. The analysis of these digital radiographs is time-consuming and costly. Attempts to automate the analysis using conventional computer vision methods or shallow machine learning have not, thus far, provided performance equivalent to human inspectors due to the high reliability requirements and low contrast-to-noise ratio of the defects. Modern approaches based on deep learning have made considerable progress towards reliable automated analysis. However, limited data sets render current machine learning solutions insufficient for industrial use. Moreover, industrial acceptance would require performance demonstration using standard metrics in non-destructive evaluation, such as probability of detection (POD), which are not commonly used in previous studies. In this study, data augmentation with virtual flaws was used to overcome data scarcity, and compared with conventional data augmentation. A semantic segmentation network was trained to find defects from computed radiography data of aerospace welds. Standard evaluation metrics in non-destructive testing were adopted for the comparison. Finally, the network was deployed as an inspector’s aid in a realistic environment to predict flaws from production radiographs. The network achieved high detection reliability and defect sizing performance, and an acceptable false call rate. Virtual flaw augmentation was found to significantly improve performance, especially for limited data set sizes, and for underrepresented flaw types even at large data sets. The deployed prototype was found to be easy to use, indicating readiness for industry adoption.

Perceptual Loss Function for Neural Modelling of Audio Systems - Wright, Alec (Aalto University); Välimäki, Vesa (Aalto University) [click for abstract]

This work investigates alternate pre-emphasis filters used as part of the loss function during neural network training for nonlinear audio processing. In our previous work, the error-to-signal ratio loss function was used during network training, with a first-order highpass pre-emphasis filter applied to both the target signal and neural network output. This work considers more perceptually relevant pre-emphasis filters, which include lowpass filtering at high frequencies. We conducted listening tests to determine whether they offer an improvement to the quality of a neural network model of a guitar tube amplifier. Listening test results indicate that the use of an A-weighting pre-emphasis filter offers the best improvement among the tested filters. The proposed perceptual loss function improves the sound quality of neural network models in audio processing without affecting the computational cost.
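As a rough illustration of the baseline loss described above, the sketch below computes an error-to-signal ratio after applying a first-order highpass pre-emphasis filter to both signals. The filter coefficient of 0.95 and the function names are assumptions for this example; the A-weighting and lowpass variants studied in the work are not shown.

```python
def pre_emphasis(signal, coeff=0.95):
    """First-order highpass filter: y[n] = x[n] - coeff * x[n-1]."""
    return [signal[0]] + [
        signal[n] - coeff * signal[n - 1] for n in range(1, len(signal))
    ]


def esr_loss(target, output, coeff=0.95):
    """Error-to-signal ratio between pre-emphasised target and output.

    Both inputs must be non-empty, equally long, and the filtered target
    must have non-zero energy.
    """
    filtered_target = pre_emphasis(target, coeff)
    filtered_output = pre_emphasis(output, coeff)
    error_energy = sum(
        (t - o) ** 2 for t, o in zip(filtered_target, filtered_output)
    )
    target_energy = sum(t * t for t in filtered_target)
    return error_energy / target_energy


# Hypothetical short audio frames.
target = [0.0, 0.5, 0.9, 0.4, -0.3, -0.8, -0.5, 0.1]
close_output = [0.0, 0.48, 0.88, 0.42, -0.28, -0.79, -0.52, 0.1]
silent_output = [0.0] * len(target)

print(esr_loss(target, close_output))   # small value: output tracks the target
print(esr_loss(target, silent_output))  # 1.0: the error equals the signal itself
```

Swapping `pre_emphasis` for a different filter changes which frequency regions the loss emphasises without adding any cost at inference time, since the filter is used only during training.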

wav2vec2-based Speech Rating System for Children with Speech Sound Disorder - Yaroslav Getman, Aalto University; Ragheb Al-Ghezi, Aalto University; Ekaterina Voskoboinik, Aalto University; Tamás Grósz, Aalto University; Mikko Kurimo, Aalto University; Giampiero Salvi, Norwegian University of Science and Technology; Torbjørn Svendsen, Norwegian University of Science and Technology; Sofia Strömbergsson, Karolinska Institutet [click for abstract]

Speaking is a fundamental way of communication, developed at a young age. Unfortunately, some children with speech sound disorders struggle to acquire this skill, hindering their ability to communicate efficiently. Speech therapies, which could aid these children in speech acquisition, greatly rely on speech practice trials and accurate feedback about pronunciations. To enable home therapy and lessen the burden on speech-language pathologists, a highly accurate and automatic way of assessing the quality of speech uttered by young children is needed. Developing such systems is challenging, as most technologies were developed for adult voices, which differ significantly from the sounds of children. An additional challenging factor is data sparsity: in general, data collected from children is quite limited, which makes it extremely hard to use it for training a good Artificial Intelligence model. Our work focuses on exploring the applicability of state-of-the-art self-supervised, deep neural models called wav2vec2, for this task. These models are pre-trained on a large amount of adult speech, and their greatest appeal is that they require only a small amount of supervised, in-domain data for fine-tuning. We use wav2vec2 as a speech recognizer fine-tuned to work with children's speech and show that further fine-tuning with pronunciation rating tags could make them an excellent tool for speech therapy. The empirical results highlight that these self-supervised models are superior to traditional approaches and close the gap between machine and human performance.

Applying Answer Set Optimization to Preventive Maintenance Scheduling for Rotating Machinery - Yli-Jyrä, Anssi (Tampere University); Janhunen, Tomi (Tampere University) [click for abstract]

Preventive maintenance (PM) of manufacturing units aims at maintaining the operable condition of the production line while optimizing the maintenance timing and the loss of productivity during maintenance operations. A less studied type of preventive maintenance treats a production line as a single machine with multiple components that have different maintenance needs. This is relevant when rotating machinery is deployed, e.g., in the paper and steel industries, in the mass production of raw materials consumed by other businesses. A failure in any stage of the production line has the potential of making the entire machine inoperable, forcing a shutdown and incurring corrective maintenance costs. This work gives an abstract formalization of PM scheduling for multi-component machines as an optimization problem. To provide a lower bound for the complexity of the optimization problem, we prove that the underlying decision problem is NP-complete for varying-size multi-component machines and scheduling timelines. Besides the formalization, the second main contribution of the paper is due to the practical need to solve the problem in industrial applications: the work gives the first encoding of the PM scheduling problem using Answer Set Optimization (ASO). Some preliminary experiments are conducted and reported to set the scene for further algorithm development.

Analyzing General-Purpose Deep-Learning Detection and Segmentation Models with Images from a Lidar as a Camera Sensor - Yu, Xianjia, University of Turku; Salimpour, Sahar, University of Turku; Peña Queralta, Jorge, University of Turku; Westerlund, Tomi, University of Turku [click for abstract]

Over the last decade, robotic perception algorithms have significantly benefited from the rapid advances in deep learning (DL). Indeed, a significant amount of the autonomy stack of different commercial and research platforms relies on DL for situational awareness, especially with vision sensors. This work explores the potential of general-purpose DL perception algorithms, specifically detection and segmentation neural networks, for processing image-like outputs of advanced lidar sensors. Rather than processing the three-dimensional point cloud data, this is, to the best of our knowledge, the first work to focus on low-resolution images with 360 degree field of view obtained with lidar sensors by encoding either depth, reflectivity, or near-infrared light in the image pixels. We show that with adequate preprocessing, general-purpose DL models can process these images, opening the door to their usage in environmental conditions where vision sensors present inherent limitations. We provide both a qualitative and quantitative analysis of the performance of a variety of neural network architectures. We believe that using DL models built for visual cameras offers significant advantages due to the much wider availability and maturity compared to point cloud-based perception.

Towards Lifelong Federated Learning in Autonomous Mobile Robots with Continuous Sim-to-Real Transfer - Yu, Xianjia, University of Turku; Salimpour, Sahar, University of Turku; Peña Queralta, Jorge, University of Turku; Westerlund, Tomi, University of Turku [click for abstract]

Deep learning (DL) methods have revolutionized mobile robotics, from advanced perception models for enhanced situational awareness to novel control approaches through reinforcement learning. Autonomous robots are being increasingly deployed as part of connected fleets, with collaboration among robots becoming a more relevant factor. From the perspective of collaborative learning, federated learning (FL) enables continuous training of models in a distributed, privacy-preserving way.

This paper focuses on vision-based obstacle avoidance for mobile robot navigation. On this basis, we explore the potential of FL for distributed systems of mobile robots, enabling continuous learning via the engagement of robots in both simulated and real-world scenarios. To demonstrate the effectiveness of such a method, we deploy wheeled robots in different indoor environments, analyze the performance of an FL method, and compare it to a traditional centralized training process with a priori aggregated data. We show the benefits of collaborative learning across heterogeneous environments and the potential for sim-to-real knowledge transfer. Our results show the significant performance benefits of FL and sim-to-real transfer for vision-based navigation in addition to the inherent privacy-preserving nature of FL by keeping computation at the edge. This is, to the best of our knowledge, the first work to leverage FL for vision-based navigation that also tests results in real-world settings.

Extending the above work, we study the performance of different image classifiers for FL compared to centralized, cloud-based learning with a priori aggregated data, and introduce an approach to continuous learning from mobile robots with extended sensor suites that can provide automatically labeled data while completing other tasks. We show that higher accuracies can be achieved by training the models in both simulation and reality, enabling continuous updates to deployed models.

Semantic segmentation for the analysis of creep voids in metallic materials - Zeb, Akhtar (VTT); Linnosmaa, Joonas (VTT); Pohja, Rami (VTT); Pakarinen, Janne (VTT) [click for abstract]

Semantic segmentation is the task of assigning a class or label to every pixel of the image. In addition to detecting objects, the semantic segmentation models also predict the shape, size, and location of each object in images. Deep learning-based segmentation has been used in challenging object detection tasks in several domains, such as autonomous vehicles, satellite images, and medical image diagnostics.

Within materials science, timely and reliable detection of creep voids in solid materials operating under high temperatures is vital for better life cycle management of valuable components. In this study, we present the application of a semantic image segmentation model for detecting creep voids in SEM images.

Semantic segmentation models generally consist of an encoder network followed by a decoder network. The encoder is usually a pre-trained classification network, such as VGG or ResNet. The decoder network projects the discriminative features learned by the encoder onto the pixel space, performing the classification task. To distinguish the creep voids from the normal surface of copper samples, we applied the DeepLab-v3+ model with encoders pre-trained on large datasets. Training the model with only 250 images for 200 epochs, we obtained a mean IoU score of 0.994 and a Dice loss of 0.003. The generated segmentation maps provide information about the area fraction and number of creep voids.
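The reported metrics can be illustrated on a toy example. The sketch below computes IoU, a Dice loss, and the predicted area fraction for a single class from hard binary masks; the tiny masks are made up, the function name is hypothetical, and a training-time Dice loss would typically be computed on soft probabilities and a mean IoU averaged over classes.

```python
def segmentation_metrics(pred, target):
    """IoU, Dice loss and predicted area fraction for binary masks (lists of rows)."""
    p = [v for row in pred for v in row]
    t = [v for row in target for v in row]
    intersection = sum(1 for a, b in zip(p, t) if a and b)
    pred_area = sum(1 for a in p if a)
    target_area = sum(1 for b in t if b)
    union = pred_area + target_area - intersection
    # Two empty masks are treated as a perfect match.
    iou = intersection / union if union else 1.0
    total = pred_area + target_area
    dice = 2 * intersection / total if total else 1.0
    return {
        "iou": iou,
        "dice_loss": 1.0 - dice,
        "area_fraction": pred_area / len(p),
    }


# Hypothetical 2x3 masks: 1 marks a void pixel, 0 marks normal surface.
predicted = [[1, 1, 0],
             [0, 1, 0]]
ground_truth = [[1, 0, 0],
                [0, 1, 1]]

metrics = segmentation_metrics(predicted, ground_truth)
print(metrics)
```

Here two of the three predicted void pixels overlap the ground truth, so the union covers four pixels and the predicted voids occupy half of the image.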