• Intelligence in Every Drop: How Machine Learning-Driven Liquid Biopsy is Shaping the Future of Multi-Cancer Early Detection
  • Alireza Aghaahmadi,1 Soheil Sardari,2,*
    1. Department of Advanced Technologies in Medicine, Faculty of Science, Tehran Medical Branch, Islamic Azad University, Tehran, Iran | Department of Artificial Intelligence & Data Science, Pishtazteb Diagnostics, Tehran, Iran
    2. Department of Advanced Sciences & Technologies, Faculty of Science, Central Tehran Branch, Islamic Azad University, Tehran, Iran


  • Introduction: According to a report by the World Health Organization, one in seven deaths in 2020 was caused by cancer. Beyond its impact on individual lives, cancer remains a significant global economic burden. Early diagnosis is crucial: once the disease reaches an advanced stage, the likelihood of survival falls by more than 50%, even with clinical treatment. Despite this importance, a 2023 report from the American Cancer Society stated that only 39% of cancers are diagnosed at an early stage. Hence, metastasis remains the leading contributor to mortality in these patients. For all their advances, current clinical detection methods have three prominent shortcomings: they are tailored to specific cancers, they cause patient discomfort through follow-ups and invasive procedures, and, most importantly, their accuracy is questionable. Liquid biopsy has emerged in recent years as a platform to address these shortcomings of traditional diagnostic methods, with three main characteristics: the ability to detect multiple biomarkers simultaneously in a single assay, non-invasiveness, and high accuracy. Multi-Cancer Early Detection (MCED) is a diagnostic approach that uses liquid biopsy in a single assay to detect multiple types of cancer. As the name implies, many of the biomarkers analyzed in this approach can be released before a tumor becomes clinically evident. In this review, we aim to highlight the utility of AI in enhancing diagnostic accuracy and decreasing cancer mortality through analysis of the CCGA and PATHFINDER trials alongside the OncoSeek retrospective study. These studies were the main contributors to the validation of the commercialized MCED tests available in Europe and North America.
  • Methods: Grail Galleri and OncoSeek were identified as commercially available MCED assays; qualitative analyses of the literature and trials relevant to their development and validation were then conducted. Through these analyses, key trends and use cases for AI models were identified and categorized to synthesize a comprehensive overview of the models, their use cases, and the reported results.
  • Results: In CCGA sub-study 1, whole-genome sequencing (WGS), whole-genome bisulfite sequencing (WGBS), and targeted sequencing were used to process cell-free DNA (cfDNA) data from 1628 cancer patients and 1172 healthy persons. For analysis, support vector machine (SVM) models (methylation and single-nucleotide variant (SNV) detection with an optimal hyperplane), kernel logistic regression (modeling non-linear relationships in methylation), gradient boosting machines (GBM; modeling copy number alterations (CNAs) with sequential trees), and XGBoost (combining eight features with hyperparameter tuning) were used. Finally, multinomial logistic regression was used to predict the origin of the cancer signal. Cross-validation showed a sensitivity of 34%, a specificity of 98%, and an accuracy of 75% for predicting tumor origin (TOO), although limitations remained due to variance in cfDNA extraction methods. In the PATHFINDER trial, 6,662 asymptomatic individuals over 50 years old (with or without additional cancer risk) were screened to evaluate an initial version of the Galleri test in a clinical setting. cfDNA was isolated from blood, and its methylation patterns were analyzed with an XGBoost model, which integrated methylation signals to predict cancer and pinpoint tumor origin. Feature selection and normalization improved accuracy. The positive predictive value (PPV) and negative predictive value (NPV) were 38% and 98.6%, respectively, and TOO prediction accuracy was reported at 85%. The study's main limitation was its recruitment criteria. The OncoSeek AI algorithm was created and validated through a retrospective analysis of data from routine clinical tests. Seven protein tumor markers (PTMs) for various cancer types were measured in peripheral blood samples using the electrochemiluminescence immunoassay (ECLIA) method. The model was trained on PTM inputs, age, and sex to predict the probability of cancer (POC) and TOO. For the POC, a generalized linear model (GLM) with cross-validation was used; for the TOO, random forest (RF) and GBM were used. The sensitivity, specificity, and accuracy of TOO were 52%, 92%, and 67%, respectively. The main limitations are the exclusive focus on PTMs and reduced generalizability.
  • Conclusion: The PATHFINDER and CCGA trials and the OncoSeek study used machine learning algorithms such as RF, GLM, XGBoost, SVM, and GBM to advance MCED as a diagnostic approach. By analyzing blood-derived biomarkers such as cfDNA methylation patterns and PTMs, these studies demonstrated that early models can detect cancer and predict TOO. Their focus on at-risk individuals and on groups with high adherence to cancer screening procedures advanced model functionality. However, pitfalls such as the non-homogeneity of cfDNA data across studies (which hinders interoperability), potentially biased recruitment criteria (due to the focus on particular groups), and reliance on a select set of biomarkers could limit generalizability and accuracy across a broad range of cancers. Our findings highlight the potential of artificial intelligence as a tool to enhance MCED; despite these advancements, optimizing data acquisition techniques and algorithms so that these tools can be applied to various types of cancer remains a prominent challenge in the field.
  • Keywords: Liquid Biopsy, Machine Learning, Multi-Cancer Early Detection, cfDNA, Artificial Intelligence
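
The Results report sensitivity, specificity, PPV, NPV, and accuracy for each assay. As a reading aid, the sketch below shows how these metrics follow from a confusion matrix, and why PPV in particular depends on how common cancer is in the screened population. The class and function names (`ConfusionCounts`, `screening_metrics`, `ppv_from_prevalence`) and all counts are hypothetical illustrations; they are not data or code from CCGA, PATHFINDER, or OncoSeek.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing test calls against confirmed diagnoses."""
    tp: int  # cancer present, test positive
    fp: int  # cancer absent, test positive
    tn: int  # cancer absent, test negative
    fn: int  # cancer present, test negative


def screening_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Return the standard screening metrics as fractions in [0, 1]."""
    total = c.tp + c.fp + c.tn + c.fn
    return {
        "sensitivity": c.tp / (c.tp + c.fn),  # detected share of true cancers
        "specificity": c.tn / (c.tn + c.fp),  # cleared share of cancer-free people
        "ppv": c.tp / (c.tp + c.fp),          # share of positive calls that are correct
        "npv": c.tn / (c.tn + c.fn),          # share of negative calls that are correct
        "accuracy": (c.tp + c.tn) / total,
    }


def ppv_from_prevalence(sensitivity: float, specificity: float,
                        prevalence: float) -> float:
    """PPV via Bayes' rule for a population with the given cancer prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Hypothetical cohort of 1000 people (assumed numbers, not trial data).
counts = ConfusionCounts(tp=30, fp=20, tn=940, fn=10)
metrics = screening_metrics(counts)

# Same test characteristics applied to a population with lower prevalence.
ppv_low_prevalence = ppv_from_prevalence(
    metrics["sensitivity"], metrics["specificity"], prevalence=0.01
)

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
print(f"ppv at 1% prevalence: {ppv_low_prevalence:.3f}")
```

With a fixed sensitivity and specificity, lowering the prevalence lowers the PPV, which is one reason a highly specific test can still show a modest PPV in an asymptomatic screening population.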