Abstract 1186P
Background
Early cancer screening using circulating tumor DNA (ctDNA) is challenging because of the low abundance of ctDNA and the resulting low signal-to-noise ratio. We aimed to develop a robust screening model that overcomes these limitations.
Methods
Low-pass whole-genome bisulfite sequencing (low-pass WGBS) was performed with the high-efficiency WATCHMaker (7K0101-096) library preparation kit to optimize cell-free DNA (cfDNA) sample processing, minimizing sample loss and enhancing molecular conversion efficiency. The analysis targeted thirteen cancer-specific differentially methylated regions (DMRs), including regions related to lung and liver cancers. We developed SmartCS-LPLLM, a single-molecule, multimodal early cancer screening model based on large language models. The model identifies cancer signals by analyzing cfDNA features, including methylation scores, sequence (fragment) length, terminal motif characteristics, and sequence linguistic features.
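The abstract does not describe how per-molecule features are computed. As an illustration only, the sketch below shows one common way to derive read-level inputs of the kinds listed above from aligned bisulfite reads. Assumptions, none taken from the abstract: methylation calls arrive as a Bismark-style `XM` string (`Z` = methylated CpG, `z` = unmethylated CpG), fragment length comes from the BAM `TLEN` field, the end motif is the first 4 bases of the read, and "linguistic" features are approximated by overlapping k-mer tokens (a DNABERT-style tokenization). `extract_features` and `kmer_tokens` are hypothetical helpers, not part of SmartCS-LPLLM.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ReadFeatures:
    meth_score: Optional[float]  # fraction of CpGs called methylated; None if the read has no CpG
    n_cpg: int                   # number of CpG sites covered by the read
    fragment_length: int         # absolute insert size of the cfDNA fragment
    end_motif: str               # first k bases of the read (assumed 5' fragment end)


def extract_features(xm_tag: str, template_length: int, sequence: str, k: int = 4) -> ReadFeatures:
    """Hypothetical per-read feature extraction (illustrative, not the authors' code).

    xm_tag: Bismark-style call string; only 'Z' and 'z' (CpG context) are counted here.
    template_length: signed TLEN value from the alignment; its sign encodes orientation.
    sequence: read sequence as aligned (assumption: already oriented 5' to 3').
    """
    methylated = xm_tag.count("Z")
    unmethylated = xm_tag.count("z")
    n_cpg = methylated + unmethylated
    meth_score = methylated / n_cpg if n_cpg else None
    return ReadFeatures(
        meth_score=meth_score,
        n_cpg=n_cpg,
        fragment_length=abs(template_length),
        end_motif=sequence[:k].upper(),
    )


def kmer_tokens(sequence: str, k: int = 6) -> List[str]:
    """Overlapping k-mer tokenization, one assumed way to feed reads to a language model."""
    seq = sequence.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]
```

For example, `extract_features("..Z..z...Z", -167, "cgtaacgttacg")` yields a methylation score of 2/3 over 3 CpGs, a fragment length of 167, and end motif `CGTA`. In practice, end motifs in bisulfite data are affected by C-to-T conversion, so a real pipeline would need to handle that explicitly.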
Results
Reanalysis of public data from BMC Medicine (CRA001537) showed that the SmartCS-LPLLM model significantly improved discrimination of hepatocellular carcinoma (HCC) from non-HCC samples, achieving an AUC of 0.967. In a blind test of 12 cfDNA samples, the model correctly classified all 5 liver cancer samples. Notably, the enhanced model accurately identified ctDNA at concentrations as low as 0.05%. Furthermore, during model construction, the highest accuracy was achieved with a DMR region size of 120M, at which the single-molecule, read-level model distinguished tumor-derived from healthy reads with 85% accuracy.
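The abstract does not say how read-level calls are aggregated into a sample-level result. One standard way to see how read-level error rates interact with a 0.05% tumor fraction is a misclassification-corrected (Rogan-Gladen style) estimate, sketched below. This is a swapped-in textbook technique, not the authors' method. The abstract reports only an overall 85% read-level accuracy, so separate sensitivity and specificity are hypothetical parameters here, and the values in the usage note are illustrative.

```python
def expected_positive_rate(tumor_fraction: float, sensitivity: float, specificity: float) -> float:
    """Expected fraction of reads a read-level classifier calls 'tumor'.

    True tumor reads are called with probability `sensitivity`;
    healthy reads are miscalled with probability 1 - `specificity`.
    """
    return tumor_fraction * sensitivity + (1 - tumor_fraction) * (1 - specificity)


def estimate_tumor_fraction(n_tumor_calls: int, n_reads: int,
                            sensitivity: float, specificity: float) -> float:
    """Invert expected_positive_rate to correct the raw call rate (Rogan-Gladen estimator)."""
    if n_reads <= 0:
        raise ValueError("n_reads must be positive")
    youden = sensitivity + specificity - 1
    if youden <= 0:
        raise ValueError("classifier must be better than chance (sensitivity + specificity > 1)")
    observed = n_tumor_calls / n_reads
    estimate = (observed - (1 - specificity)) / youden
    return min(max(estimate, 0.0), 1.0)  # clip to the valid [0, 1] range
```

With 1,000,000 reads at a 0.05% tumor fraction and assumed sensitivity 0.85 and specificity 0.99, about 425 tumor reads are called correctly, while about 9,995 healthy reads are miscalled. The corrected estimate recovers roughly 0.0005 only because the specificity is known exactly, which shows how strongly low-fraction detection depends on read-level specificity.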
Conclusions
The SmartCS-LPLLM model, which integrates biological features such as methylation and copy number variations (CNVs), provides a precise clinical strategy for early cancer screening. Its performance in blind testing confirms its robustness and suitability for identifying samples with low-abundance ctDNA, indicating significant clinical relevance.
Legal entity responsible for the study
The authors.
Funding
Has not received any funding.
Disclosure
All authors have declared no conflicts of interest.