Abstract 1186P
Background
Early cancer screening using circulating tumor DNA (ctDNA) is challenging because ctDNA is present at low abundance and yields a low signal-to-noise ratio. We aimed to develop a robust screening model that overcomes these limitations.
Methods
We performed low-pass whole-genome bisulfite sequencing (low-pass WGBS) with the high-efficiency WATCHMaker (7K0101-096) library preparation kit to optimize cell-free DNA (cfDNA) sample processing, minimizing sample loss and improving molecular conversion efficiency. The analysis targeted thirteen cancer-specific differentially methylated regions (DMRs), including regions related to lung and liver cancers. We developed SmartCS-LPLLM, a single-molecule, multimodal early cancer screening model based on large language models. The model identifies cancer signals from cfDNA features, including methylation score, sequence length, terminal motif characteristics, and sequence linguistic features.
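To make the read-level features named above concrete, the following is a minimal sketch of how three of them (read length, 5' terminal motif, and a per-read methylation score) might be computed for one bisulfite-converted read. It is not the SmartCS-LPLLM implementation, which the abstract does not describe. The function name `read_features`, the `ReadFeatures` container, the 4-base motif length, and the assumption of a gap-free, same-strand alignment between read and reference are all hypothetical choices made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ReadFeatures:
    length: int                            # read length in bases
    end_motif: str                         # 5' terminal motif, taken from the reference
    cpg_count: int                         # CpG sites with an informative base call
    methylation_fraction: Optional[float]  # methylated / informative CpGs, or None


def read_features(read: str, reference: str, motif_len: int = 4) -> ReadFeatures:
    """Compute simple per-read features (illustrative assumption, not the authors' method).

    `read` is a bisulfite-converted read and `reference` is the unconverted
    reference sequence it aligns to, same strand, same length, no gaps.
    """
    if len(read) != len(reference):
        raise ValueError("read and reference must have the same length")
    read, reference = read.upper(), reference.upper()

    methylated = 0
    unmethylated = 0
    for i in range(len(reference) - 1):
        # A CpG site is a reference C immediately followed by G.
        if reference[i] == "C" and reference[i + 1] == "G":
            if read[i] == "C":      # C survived conversion: methylated
                methylated += 1
            elif read[i] == "T":    # C converted to T: unmethylated
                unmethylated += 1
            # any other base is treated as uninformative

    informative = methylated + unmethylated
    fraction = methylated / informative if informative else None

    # The motif is read from the reference because bisulfite conversion
    # alters the read's own bases.
    return ReadFeatures(len(read), reference[:motif_len], informative, fraction)


if __name__ == "__main__":
    # Made-up 12-base example: the CpG at index 5 is read as T (unmethylated).
    print(read_features("ACGTTTGACGTA", "ACGTTCGACGTA"))
```

A real pipeline would take these values from an aligner and a methylation caller rather than from string comparison; the sketch only shows what each feature measures.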
Results
Reanalysis of public data from BMC Medicine (CRA001537) showed that the SmartCS-LPLLM model significantly improved the differentiation of hepatocellular carcinoma (HCC) from non-HCC samples, with an increased AUC of 0.967. In a blind test of 12 cfDNA samples, the model correctly classified all 5 liver cancer samples. Notably, the model has been enhanced to identify ctDNA at concentrations as low as 0.05%. Furthermore, during model construction, the highest accuracy was observed when the DMR was 120M, with the single-molecule read-level model achieving an 85% accuracy rate in distinguishing tumor reads from healthy reads.
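For readers less familiar with the AUC metric reported above, the sketch below computes it as the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the scores in the usage example are invented for illustration. They are not data from this study, and the abstract does not state how its AUC was computed.

```python
from typing import Sequence


def roc_auc(scores_pos: Sequence[float], scores_neg: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    Counts the fraction of (positive, negative) pairs in which the positive
    sample has the higher score; tied pairs contribute 0.5.
    """
    if not scores_pos or not scores_neg:
        raise ValueError("need at least one positive and one negative score")
    wins = 0.0
    for p in scores_pos:
        for n in scores_neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_pos) * len(scores_neg))


if __name__ == "__main__":
    # Hypothetical model scores, NOT results from the abstract.
    hcc = [0.9, 0.8, 0.4]
    non_hcc = [0.7, 0.3, 0.2, 0.1]
    print(round(roc_auc(hcc, non_hcc), 4))
```

In this made-up example 11 of the 12 positive-negative pairs are ranked correctly, so the AUC is 11/12. The pairwise form is quadratic in sample count, which is acceptable for small cohorts; rank-based implementations scale better.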
Conclusions
The SmartCS-LPLLM model, which integrates biological features such as methylation and copy number variations (CNVs), provides a precise clinical strategy for early cancer screening. Its performance in blind tests supports its robustness and its suitability for identifying low-abundance ctDNA in samples, indicating significant clinical relevance.
Clinical trial identification
Editorial acknowledgement
Legal entity responsible for the study
The authors.
Funding
Has not received any funding.
Disclosure
All authors have declared no conflicts of interest.