Machine learning model predicts liver cancer risk from routine blood tests and health records

Researchers have developed a machine learning model that predicts an individual's risk of hepatocellular carcinoma (HCC), the most common form of liver cancer, using routine clinical information already collected during standard medical visits.

The study, published in Cancer Discovery, found that the model significantly outperformed all publicly available risk scores on both internal and external validation datasets.

HCC is the fifth most common cancer globally and the third leading cause of cancer-related death. Current screening guidelines primarily target patients with confirmed liver cirrhosis, but the study's analysis of UK Biobank data revealed that 69% of the 538 HCC cases occurred in patients who had no prior diagnosis of cirrhosis, viral hepatitis, or other chronic liver disease. In other words, most cases arose in people who might not be flagged for screening under current cirrhosis-focused criteria.

The research team, led by Jan Clusmann at TU Dresden, Carolin Schneider at RWTH Aachen University, and Jakob Kather, also at TU Dresden, built their framework using data from two large population-based cohorts: the UK Biobank (over 500,000 individuals) for model development and the US-based All of Us Research Program (over 400,000 individuals) for external validation.

“Our study highlights the potential of a simple, easily utilized machine learning model to improve risk stratification for HCC using only routinely collected clinical data,” said Carolin Schneider.

How the model works

The team trained random forest classifiers on five types of data, testing each on its own and in stepwise combinations: demographics, electronic health records, blood test results, genomics, and metabolomics. A random forest aggregates many decision trees. Each tree is trained on a random resample of the patients and makes a series of simple yes/no splits on patient variables (for example, whether a liver enzyme exceeds a threshold), and the forest's final prediction combines the outputs of all the trees.
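
To make the random forest description concrete, here is a minimal, self-contained sketch. It is not the authors' implementation. To stay short, it assumes depth-1 trees (decision stumps) instead of full trees, synthetic data, and hypothetical features (AST, platelet count, age) chosen only for illustration. Each stump is fit on a bootstrap sample using a random subset of features, and the forest's risk estimate is the average of the stumps' leaf probabilities.

```python
import random

def gini(pos, n):
    """Gini impurity of a node holding `pos` positives out of `n` samples."""
    p = pos / n
    return 2 * p * (1 - p)

def fit_stump(rows, labels, feature_ids):
    """Fit one depth-1 tree: the threshold split (over `feature_ids`) with the lowest weighted Gini impurity."""
    n, total_pos = len(labels), sum(labels)
    parent = total_pos / n
    # (impurity, feature, threshold, p_left, p_right); feature None means "no useful split found"
    best = (gini(total_pos, n), None, None, parent, parent)
    for f in feature_ids:
        order = sorted(range(n), key=lambda i: rows[i][f])
        left_pos = 0
        for k in range(n - 1):
            cur, nxt = order[k], order[k + 1]
            left_pos += labels[cur]
            if rows[cur][f] == rows[nxt][f]:
                continue  # a split can only fall between distinct values
            n_left, n_right = k + 1, n - k - 1
            right_pos = total_pos - left_pos
            score = (n_left * gini(left_pos, n_left) + n_right * gini(right_pos, n_right)) / n
            if score < best[0]:
                thr = (rows[cur][f] + rows[nxt][f]) / 2
                best = (score, f, thr, left_pos / n_left, right_pos / n_right)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, max_features=2, seed=0):
    """Bagging plus random feature subsets: each stump sees a bootstrap sample and `max_features` features."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample (with replacement)
        feats = rng.sample(range(d), max_features)  # random feature subset for this tree
        forest.append(fit_stump([rows[i] for i in idx], [labels[i] for i in idx], feats))
    return forest

def predict_risk(forest, row):
    """Average the leaf probabilities of all stumps (the forest's collective output)."""
    votes = []
    for f, thr, p_left, p_right in forest:
        if f is None:  # no useful split: this stump predicts its sample's base rate
            votes.append(p_left)
        else:
            votes.append(p_left if row[f] <= thr else p_right)
    return sum(votes) / len(votes)

# Synthetic demo with hypothetical features [AST, platelets, age]; here the label depends only on AST.
rng = random.Random(1)
rows = [[rng.uniform(10, 100), rng.uniform(100, 400), rng.uniform(40, 70)] for _ in range(300)]
labels = [1 if r[0] > 60 else 0 for r in rows]
forest = fit_forest(rows, labels)
print(predict_risk(forest, [90, 250, 55]), predict_risk(forest, [20, 250, 55]))
```

Because each stump has a single split, drawing the feature subset once per tree is equivalent to the per-split sampling used in standard random forests. Stumps that never see the informative feature fall back on weak splits, so averaging across many trees is what makes the combined estimate reliable.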

The best-performing model combined demographics, electronic health records, and routine blood tests, achieving an area under the receiver operating characteristic curve (AUROC) of 0.88. Adding genomic or metabolomic data provided only marginal improvement.
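
An AUROC of 0.88 can be read as a probability. If one HCC case and one non-case are drawn at random, the model assigns the case the higher risk score 88% of the time. The sketch below computes AUROC that way, using the rank-sum (Mann-Whitney U) identity with tied scores counted as half. It is an illustration, not the study's evaluation code, and the scores in the usage line are made up.

```python
def auroc(scores, labels):
    """AUROC via the rank-sum (Mann-Whitney U) identity; tied scores get their average rank."""
    n = len(scores)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one case and one control")
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # extend j over a block of tied scores
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    # U = number of (case, control) pairs ranked correctly, with ties counted as half
    u = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u / (n_pos * n_neg)

print(auroc([0.1, 0.4, 0.35, 0.8], [0, 0, 1, 1]))  # 0.75
```

Sorting makes this O(n log n), which matters for cohorts of hundreds of thousands of people. A double loop over every case-control pair gives the same answer but scales poorly.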

The researchers then reduced the model's complexity through ablation experiments. A simplified version using just 15 routinely collected clinical features still outperformed all existing risk prediction scores, including FIB-4 (Fibrosis-4 index), APRI (AST-to-platelet ratio index), NFS (NAFLD fibrosis score), and the aMAP score. The top features driving predictions included liver enzymes (AST, ALT), platelet count, diabetes status, waist circumference, age, and liver cirrhosis diagnosis.
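
For comparison, two of the baseline scores are short published formulas built from the same kinds of inputs. FIB-4 is (age × AST) / (platelets × √ALT), and APRI is (AST / AST upper limit of normal) × 100 / platelets, with AST and ALT in U/L and platelets in 10⁹/L. The sketch below assumes an AST upper limit of 40 U/L (laboratories use their own reference ranges) and uses a hypothetical patient in the usage lines.

```python
import math

def fib4(age_years, ast_u_l, alt_u_l, platelets_1e9_l):
    """Fibrosis-4 index: (age x AST) / (platelets x sqrt(ALT))."""
    return (age_years * ast_u_l) / (platelets_1e9_l * math.sqrt(alt_u_l))

def apri(ast_u_l, platelets_1e9_l, ast_upper_limit=40.0):
    """AST-to-platelet ratio index: (AST / AST upper limit of normal) x 100 / platelets.

    The default upper limit of 40 U/L is an assumption; use the lab's own reference value.
    """
    return (ast_u_l / ast_upper_limit) * 100 / platelets_1e9_l

# Hypothetical patient: 50 years old, AST 40 U/L, ALT 25 U/L, platelets 200 x 10^9/L
print(fib4(50, 40, 25, 200))  # 2.0
print(apri(40, 200))          # 0.5
```

Each of these scores applies a fixed formula to a handful of liver-related inputs. A learned model, by contrast, can weight and combine many more variables, including non-liver features such as diabetes status and waist circumference.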

Outperforming existing tools

The aMAP score, the best-performing existing tool, achieved an AUROC of 0.79 in the general population. The ML models also achieved 3 to 10 times higher precision than existing linear risk scores. Precision, the fraction of patients flagged as high risk who actually develop HCC, is a critical metric for rare-event prediction, where false positives carry significant clinical cost.
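
Precision is hard to achieve for rare outcomes because even a small false-positive rate, applied to a very large healthy population, can swamp the true cases. Bayes' rule makes this explicit. The sketch below uses hypothetical numbers, not the study's: a prevalence of about 0.1% and a fixed sensitivity of 80%.

```python
def precision_from_rates(sensitivity, specificity, prevalence):
    """Positive predictive value (precision) via Bayes' rule."""
    true_pos = sensitivity * prevalence               # share of the population that is a flagged case
    false_pos = (1 - specificity) * (1 - prevalence)  # share that is a flagged non-case
    return true_pos / (true_pos + false_pos)

# Hypothetical: ~0.1% prevalence, same sensitivity, two different specificities
for spec in (0.95, 0.99):
    print(spec, round(precision_from_rates(0.80, spec, 0.001), 4))
```

At this prevalence, raising specificity from 95% to 99% lifts precision from roughly 1.6% to roughly 7.4%, nearly a fivefold gain. This is why multiplicative precision improvements matter in population-level screening.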

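On precision-recall curves, which are especially informative for highly imbalanced problems such as cancer screening, existing scores performed poorly, with areas under the precision-recall curve (AUPRC) of 0.00 to 0.02, while the ML models reached substantially higher values. For context, a model that ranks patients at random has an expected AUPRC roughly equal to the case prevalence, which for 538 cases among more than 500,000 UK Biobank participants is about 0.1%.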
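
A common way to summarize the precision-recall curve is average precision: rank patients by score, record the precision at each position where a true case appears, and average those values. The sketch below is an illustrative implementation, not the study's code. It breaks ties by input order, and its usage scores are made up.

```python
def average_precision(scores, labels):
    """Step-wise AUPRC: mean precision at each rank where a true case is retrieved."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one case")
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank  # precision among the top `rank` patients
    return total / n_pos

print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # about 0.833
```

Unlike AUROC, whose chance level is 0.5 regardless of prevalence, this metric's chance level shrinks with the base rate. That makes it a stricter test of how well a model concentrates rare cases at the top of the ranking.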

Using the model's three-tier risk classification system, over 70% of HCC cases were classified into the high-risk group in both the general population and the patients-at-risk subgroup. The high- and medium-risk groups together captured approximately 88% of cases.
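
The figures above describe how cases are distributed across risk tiers. A minimal sketch of that bookkeeping follows. The article does not say how the study set its tier boundaries, so the cutoffs and data here are hypothetical. The function reports both the share of cases captured by each tier and the share of the whole population that each tier contains, since the second figure determines how many people a tier would send to screening.

```python
def assign_tier(score, medium_cutoff, high_cutoff):
    """Map a continuous risk score to a tier (cutoffs are hypothetical)."""
    if score >= high_cutoff:
        return "high"
    if score >= medium_cutoff:
        return "medium"
    return "low"

def stratify(scores, labels, medium_cutoff, high_cutoff):
    """Return (share of cases per tier, share of the whole population per tier)."""
    tiers = [assign_tier(s, medium_cutoff, high_cutoff) for s in scores]
    names = ("low", "medium", "high")
    n_cases = sum(labels)
    case_share = {
        t: sum(1 for tier, y in zip(tiers, labels) if tier == t and y == 1) / n_cases
        for t in names
    }
    pop_share = {t: tiers.count(t) / len(tiers) for t in names}
    return case_share, pop_share

# Hypothetical scores and outcomes for five patients, with cutoffs at 0.3 and 0.6
print(stratify([0.9, 0.7, 0.4, 0.2, 0.1], [1, 1, 1, 1, 0], 0.3, 0.6))
```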

External validation and generalizability

External validation on the All of Us cohort, which has substantially greater ethnic diversity than the UK Biobank, showed the model maintained comparable performance. Despite being trained predominantly on data from white participants (94% of the UK Biobank cohort), the model showed no significant performance gap between white and non-white subgroups in the more diverse All of Us population.
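
One common way to check for a subgroup performance gap, though not necessarily the test the authors used, is a percentile bootstrap confidence interval for the difference in AUROC between two subgroups. If the interval contains zero, the gap is not significant at roughly the 5% level. The sketch below resamples patients within each subgroup. Its group labels and data are hypothetical, and the same approach would apply to the male versus female comparison.

```python
import random

def auroc(scores, labels):
    """Pairwise AUROC: share of (case, control) pairs ranked correctly; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_auroc_gap(scores, labels, groups, group_a, group_b, n_boot=500, seed=0):
    """95% percentile bootstrap interval for AUROC(group_a) - AUROC(group_b)."""
    rng = random.Random(seed)
    idx_a = [i for i, g in enumerate(groups) if g == group_a]
    idx_b = [i for i, g in enumerate(groups) if g == group_b]
    for idx in (idx_a, idx_b):
        ys = [labels[i] for i in idx]
        if not 0 < sum(ys) < len(ys):
            raise ValueError("each subgroup needs at least one case and one control")
    gaps = []
    while len(gaps) < n_boot:
        sa = [rng.choice(idx_a) for _ in idx_a]  # resample patients within subgroup A
        sb = [rng.choice(idx_b) for _ in idx_b]  # and within subgroup B
        la = [labels[i] for i in sa]
        lb = [labels[i] for i in sb]
        # skip resamples in which either subgroup lacks a case or a control
        if not (0 < sum(la) < len(la) and 0 < sum(lb) < len(lb)):
            continue
        gaps.append(auroc([scores[i] for i in sa], la) - auroc([scores[i] for i in sb], lb))
    gaps.sort()
    k = int(0.025 * n_boot)
    return gaps[k], gaps[n_boot - 1 - k]
```

The pairwise AUROC here is quadratic in subgroup size. That is fine for a sketch, but a rank-based implementation would be needed at cohort scale.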

The model did show better performance for male than female patients, a finding the authors attribute to the higher prevalence of HCC in males and a potentially less prominent HCC phenotype in female patients. This performance gap was less pronounced in the All of Us cohort.

Practical deployment

The researchers have released all code and model weights openly and provide three deployment options: an interactive web calculator on Hugging Face for single-patient inference, a Python package for batch processing, and compatibility with agentic workflows through the Model Context Protocol (MCP).

The study has several limitations. It relies on a retrospective design, includes a low fraction of patients with viral hepatitis (a major HCC risk factor globally), and has not yet been validated in Asian populations where viral hepatitis prevalence is higher.

The authors note that prospective clinical trials will be needed before the model can be recommended for clinical adoption.

The study was supported by German Cancer Aid, the German Federal Ministry of Research, Technology and Space, the German Research Foundation, and several other European and US funding bodies.
