New machine learning method offers better predictions of future disease risk

Predictions for developing common diseases were more accurate than before.

Researchers at Aalto University have developed a new machine learning method that improves estimates of the risk of developing complex diseases such as heart disease, diabetes, and liver conditions. The new tool, called survivalFM, looks not just at individual risk factors, such as cholesterol levels or age, but also at how these factors interact with each other to affect long-term health outcomes.

‘Today’s health data is incredibly complex – and so is human health. Factors like age, lifestyle, and genetics rarely act alone; they also influence each other in subtle ways. We wanted to create a method that could capture some of these complex interdependencies and still be clear enough for researchers and clinicians to understand and use,’ says Heli Julkunen, the study’s lead author and a machine learning researcher at Aalto University.

What makes this approach novel is how it handles the interactions between risk factors. Examining every possible pair of interacting risk factors one by one would be computationally expensive. The new method uses a mathematical technique to efficiently capture the underlying patterns of interaction, even in very large datasets.
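To illustrate the kind of shortcut described above, here is a minimal sketch in Python. It assumes a factorization-style approach in which each risk factor gets a short vector of latent weights and the strength of the interaction between two factors is the dot product of their vectors. The article does not spell out the exact formulation, so the function names, the latent dimension and the toy numbers below are all hypothetical and not taken from the study.

```python
import itertools
import random


def pairwise_naive(x, V):
    """Sum over every pair i < j of <V[i], V[j]> * x[i] * x[j].

    Visits each pair explicitly, so the cost grows with the square
    of the number of risk factors.
    """
    total = 0.0
    for i, j in itertools.combinations(range(len(x)), 2):
        w_ij = sum(a * b for a, b in zip(V[i], V[j]))
        total += w_ij * x[i] * x[j]
    return total


def pairwise_factorized(x, V):
    """Same quantity computed without looping over pairs.

    Uses the identity
        sum_{i<j} <v_i, v_j> x_i x_j
          = 0.5 * sum_f [ (sum_i v_if x_i)^2 - sum_i (v_if x_i)^2 ]
    so the cost grows only linearly with the number of risk factors.
    """
    k = len(V[0])
    total = 0.0
    for f in range(k):
        terms = [V[i][f] * x[i] for i in range(len(x))]
        s = sum(terms)
        s_sq = sum(t * t for t in terms)
        total += 0.5 * (s * s - s_sq)
    return total


def log_relative_risk(x, beta, V):
    """Hypothetical risk score: individual effects plus all pairwise interactions."""
    linear = sum(b * xi for b, xi in zip(beta, x))
    return linear + pairwise_factorized(x, V)


# Toy example with made-up numbers (assumption: 6 risk factors, 2 latent dimensions).
rng = random.Random(0)
d, k = 6, 2
x = [rng.gauss(0, 1) for _ in range(d)]
beta = [rng.gauss(0, 0.5) for _ in range(d)]
V = [[rng.gauss(0, 0.1) for _ in range(k)] for _ in range(d)]

score = log_relative_risk(x, beta, V)
```

Both functions return the same interaction total, but the second never enumerates the pairs, which is what would make such an approach feasible for datasets with many risk factors.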

The results, published in Nature Communications on July 18, suggest that this approach could lead to more accurate and personalized predictions of who is likely to develop certain diseases in the future.

‘For instance, software that uses our method could help clinicians gain a better understanding of how combinations of risk factors, such as high cholesterol and smoking together, affect disease risk. This, however, is only a simplified example, since the true novelty of the method lies in its ability to examine the simultaneous effects of many such risk factors,’ Julkunen says.

Why this matters

Healthcare professionals use risk prediction models to estimate a person’s likelihood of developing a disease over time. These tools help guide decisions about prevention, screening, and treatment. For example, models like QRISK3 in the UK or FINRISKI in Finland are commonly used to estimate the risk of cardiovascular disease over the next 10 years.

Traditional prediction models usually treat each risk factor on its own. But in reality, many factors affect each other. For instance, cholesterol levels can predict cardiovascular risk differently depending on age, genetics or lifestyle habits. By considering these interdependencies, the new machine learning method provides a more detailed picture of a person’s risk.

Tested with real-world health data

The researchers tested the new method using data from the UK Biobank, a large health research database that includes medical records, lab tests, lifestyle information, and genetic data from around 500,000 people.

The model was trained to predict the risk of developing ten common diseases over a ten-year period. Across most conditions, it outperformed standard prediction tools that treat risk factors independently. The new method was especially effective at improving individual risk estimates: it assigned higher predicted risks to those who actually went on to develop disease, and lower risks to those who remained healthy.
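One common way to quantify whether a model ranks people in the right order is the concordance index, which is the share of comparable pairs of people in which the person who developed the disease earlier was given the higher predicted risk. The sketch below is a simplified Python version for illustration only. The article does not state which metrics the study used, and the function name and toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Simplified concordance index for time-to-event data.

    times  : follow-up time for each person
    events : 1 if the disease was observed, 0 if follow-up ended without it
    risks  : predicted risk score (higher means higher predicted risk)

    A pair (i, j) is comparable when person i had an observed event
    strictly before person j's follow-up time. Tied risk scores count
    as half a concordant pair.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        for j in range(n):
            if events[i] and times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs in the data")
    return concordant / comparable


# Made-up data: two people develop the disease, two are still healthy
# when their follow-up ends.
times = [2, 5, 7, 10]
events = [1, 1, 0, 0]
good_risks = [0.9, 0.6, 0.4, 0.1]   # higher risk for earlier events
bad_risks = [0.1, 0.4, 0.6, 0.9]    # ranking reversed

c_good = concordance_index(times, events, good_risks)
c_bad = concordance_index(times, events, bad_risks)
```

A value of 1.0 means every comparable pair is ranked correctly, while 0.5 corresponds to random ordering.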

Designed to be interpretable

Unlike many machine learning and AI models that are difficult to interpret, this new method was designed to be transparent. That means researchers and users can understand how the model arrives at its predictions and see which combinations of risk factors influence the prediction.

‘We see an increasing interest in interpretability in machine learning and AI applications, particularly in sensitive areas like healthcare. This method allows us to look at the model and directly see why this person was flagged as high risk,’ says Professor Juho Rousu from Aalto University and FCAI.

The method is broadly applicable to any outcome where timing matters. This includes not only medical research but also fields like engineering reliability studies and financial risk modelling.

The research was funded by the Research Council of Finland and the Technology Industries of Finland Centennial Foundation via the Aalto University House of AI centre.