More data are needed for monitoring the coronavirus epidemic

A feature for collecting anonymous statistics should be included in the coronavirus contact tracing application currently under development in Finland, as it would produce high-quality up-to-date data on the epidemic, writes Associate Professor Antti Honkela.

Photo by Susan Heikkinen / University of Helsinki

Along with many other countries, Finland is transitioning to a stage where restrictions imposed in order to stop the spread of the coronavirus are being lifted. While this is taking place, the spread of the virus is still being closely monitored.

The epidemic is monitored with the help of mathematical models that are fitted to the number of observed infections and hospitalisations. Since hospitalisation usually occurs more than a week after infection, the monitoring lags behind the actual infection situation.

The coronavirus is primarily transmitted through sufficiently long close contact between an infected person and another person susceptible to the infection. For this reason, the deployment of a mobile application for tracing such contacts is currently being prepared in Finland.

To ensure the privacy of users, the plan is for the app to only share the collected data when a user has been diagnosed with a confirmed coronavirus infection. This way, those potentially exposed to the virus would be notified of their exposure.

It is less widely known that the contacts identified by the app also accumulate as statistics on users’ devices. These statistics would be a useful addition to monitoring and modelling the epidemic, as they would offer up-to-date information on changes in the number of contacts that underlie infections.

Data protection and privacy have been key considerations in contact tracing applications developed in Europe. Compiling statistics on the number of contacts is possible without significantly compromising these protections. Statistics could be collected with an additional and voluntary feature based on user consent.

How can sensitive data be collected while respecting user privacy? Since the 1960s, social scientists have applied a method where respondents are instructed to randomly flip their response to a yes/no question with a known probability.

This method ensures that individual responses cannot be deduced with certainty – respondents can always claim that their answer was down to chance. Yet, when the responses of a sufficiently large group are combined, the effect of random responses can be excluded and, through statistical means, a reliable estimate of the share of various answers can be produced.
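The flipping scheme described above can be sketched in a few lines of Python. This is a minimal illustration only, not code from any actual survey or app: the function names, the flip probability of 0.25 and the simulated 30% share of "yes" answers are all assumptions chosen for the example.

```python
import random


def randomized_response(true_answer: bool, flip_prob: float, rng: random.Random) -> bool:
    """Report the true answer, flipped with probability flip_prob."""
    if rng.random() < flip_prob:
        return not true_answer
    return true_answer


def estimate_true_share(reports, flip_prob: float) -> float:
    """Estimate the true share of 'yes' answers from the noisy reports.

    If the true share is s, the expected share of reported 'yes' answers is
    flip_prob + s * (1 - 2 * flip_prob); solving for s gives the estimate.
    """
    if flip_prob == 0.5:
        raise ValueError("flip_prob of 0.5 destroys all information")
    observed = sum(reports) / len(reports)
    return (observed - flip_prob) / (1 - 2 * flip_prob)


# Simulated population (assumed values, for illustration only).
rng = random.Random(0)
true_share = 0.30
flip_prob = 0.25
truths = [rng.random() < true_share for _ in range(100_000)]
reports = [randomized_response(t, flip_prob, rng) for t in truths]

estimate = estimate_true_share(reports, flip_prob)
print(f"estimated share of 'yes' answers: {estimate:.3f}")
```

No single report reveals a respondent's true answer with certainty, yet the estimate for the whole group lands close to the simulated share.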

For the past 15 years, a corresponding principle of increased randomness has been intensively developed in computer science, under the name of differential privacy. In Finland, research in this field is conducted at the Finnish Center for Artificial Intelligence (FCAI).

By employing this technique, researchers have developed various solutions that respect privacy. The level of privacy protection can be adjusted by tuning the level of randomisation, but the price of strong protection is less accurate results. Such techniques are already in everyday use in the mobile operating systems of Google and Apple, as well as in the decennial United States census, to be conducted again this year.
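The trade-off between protection and accuracy can be made concrete for the simple answer-flipping scheme. The sketch below assumes that scheme as a stand-in; it is not the mechanism of any particular app or census. Flipping a yes/no answer with probability p satisfies differential privacy with parameter epsilon = ln((1 - p) / p), where a smaller epsilon means stronger protection. The function names, the true share of 0.3 and the group size of 10,000 are illustrative assumptions.

```python
import math


def flip_prob_for_epsilon(epsilon: float) -> float:
    """Flip probability that gives epsilon-differential privacy for one yes/no answer."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return 1.0 / (1.0 + math.exp(epsilon))


def standard_error(true_share: float, epsilon: float, n: int) -> float:
    """Standard error of the estimated share when n users each flip their answer."""
    p = flip_prob_for_epsilon(epsilon)
    observed = p + true_share * (1 - 2 * p)  # expected share of reported 'yes'
    return math.sqrt(observed * (1 - observed) / n) / (1 - 2 * p)


# Stronger protection (smaller epsilon) means a noisier estimate.
for eps in (0.1, 0.5, 1.0, 2.0, 4.0):
    p = flip_prob_for_epsilon(eps)
    se = standard_error(0.3, eps, 10_000)
    print(f"epsilon={eps:>4}: flip probability={p:.3f}, standard error={se:.4f}")
```

The printed standard error shrinks as epsilon grows, which is the price of privacy the text refers to: the same accuracy under stronger protection requires a larger group of users.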

The contact tracing app could help in collecting interesting information, for example, on the number of users’ daily contacts. However, this cannot be done directly: to protect user privacy, applications change the random identifiers assigned to users after at most 30 minutes, so the same person would appear under several different identifiers over the course of a day.

This problem can be avoided by instead using, for example, the largest number of contacts accumulated in a short time window. Users can also be requested, on a voluntary basis, to provide additional information useful for epidemiological models, such as rough details of their age and region of residence.
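One way to compute such a statistic on the device is sketched below. The data format (pairs of a timestamp in minutes and an observed identifier), the 10-minute window and the function name are hypothetical choices made for this illustration; the actual app may store and process its observations quite differently. The idea is that within a window shorter than the identifier rotation period, distinct identifiers mostly correspond to distinct people.

```python
from collections import defaultdict


def max_contacts_in_window(observations, window_minutes: int = 10) -> int:
    """Largest number of distinct identifiers seen within any one fixed window.

    observations: iterable of (minute, identifier) pairs (hypothetical format).
    Windows are consecutive, non-overlapping blocks of window_minutes.
    """
    buckets = defaultdict(set)
    for minute, identifier in observations:
        buckets[minute // window_minutes].add(identifier)
    return max((len(ids) for ids in buckets.values()), default=0)


# Made-up example observations: (minute of the day, random identifier).
example = [
    (0, "a"), (3, "b"), (5, "a"),
    (12, "c"),
    (41, "d"), (44, "e"), (47, "f"), (49, "d"),
]
print(max_contacts_in_window(example))
```

Only this single number, preferably randomised as described earlier, would need to leave the device.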

One of the biggest worries pertaining to the contact tracing application is that for it to function effectively, most of the population has to use it. In this regard, the proposed statistics collection would be less demanding. It would already generate information relevant to epidemic modelling with a considerably smaller group of users, as not all contacts need to be traced and the share of missing contacts in the statistics can be easily corrected for.
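As a rough illustration of such a correction, suppose that a known share of the population uses the app and that users are spread evenly among everyone's contacts. Both are simplifying assumptions made here, not claims from the app's design. A user's device then registers, on average, only that share of the user's true contacts, and dividing by the share undoes the effect. The function name and the numbers are hypothetical.

```python
def correct_for_adoption(observed_mean_contacts: float, adoption_share: float) -> float:
    """Scale up the observed contact count by the share of people using the app.

    Assumes app users are spread evenly among contacts (a simplification).
    """
    if not 0 < adoption_share <= 1:
        raise ValueError("adoption_share must be in (0, 1]")
    return observed_mean_contacts / adoption_share


# Example: 3 registered contacts on average with 25% of people using the app.
print(correct_for_adoption(3.0, 0.25))
```

In practice the usage share would differ by age group and region, which is one reason the voluntary background details mentioned above would be useful.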

Written by Antti Honkela. Honkela is an Associate Professor of Computer Science specialised in machine learning and artificial intelligence at the University of Helsinki. He heads the research on privacy-preserving AI at FCAI.