Abstract 1233P
Background
Circulating cell-free DNA (cfDNA) is a promising biomarker for early cancer detection, and its fragmentomics features have been used successfully to detect cancer signals in blood. However, the ability of these features to predict the tissue of origin (TOO) of a cancer remains to be evaluated. Such prediction is highly desirable for differentiating the most common types of gastrointestinal (GI) cancer: colorectal (CC), esophageal (EC), gastric (GC), liver (LC), and pancreatic (PC) cancer.
Methods
Whole-genome sequencing was performed on cfDNA from 769 cancer patients (149 CC, 137 EC, 149 GC, 272 LC, and 62 PC) to calculate the coverage at repetitive genomic regions (RepeatsCov), the depth and the cleavage diversity around transcription start sites (TSSDepth and TSSClvDiv), and the microbiome abundance (MicrobeAb). These features were combined with classical fragmentomics features, namely copy number variation (CNV), end motif diversity (EDM), fragment size ratio (FSR), and promoter fragmentation entropy (PFE). A stacked ensemble machine learning classifier was then trained and tested, with samples split 1:1 between the training and test sets, to predict the TOO of the GI cancers.
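As an illustration of the 1:1 sample ratio, the following minimal Python sketch splits the cohort so that each cancer type is divided roughly in half. The per-class stratification, the random seed, and the function name `stratified_half_split` are assumptions made for illustration; the abstract states only the 1:1 ratio and the cohort sizes.

```python
import random

# Cohort sizes as reported in the Methods (769 patients in total).
COHORT = {"CC": 149, "EC": 137, "GC": 149, "LC": 272, "PC": 62}


def stratified_half_split(labels, seed=0):
    """Split sample indices 1:1 within each class.

    Hypothetical helper: stratifying by cancer type is an assumption,
    not something the abstract specifies.
    """
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)

    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        half = len(indices) // 2  # odd-sized classes give the extra sample to test
        train.extend(indices[:half])
        test.extend(indices[half:])
    return sorted(train), sorted(test)


# One label per patient, in cohort order.
labels = [cancer for cancer, n in COHORT.items() for _ in range(n)]
train_idx, test_idx = stratified_half_split(labels)
print(len(train_idx), len(test_idx))
```

Because three of the five classes have an odd number of samples, a per-class split of this kind cannot be exactly 1:1; the two sets differ by a few samples.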
Results
The performance of each single feature was evaluated first: the FSR model had the highest accuracy (67.1%) and the RepeatsCov model the lowest (53.9%). The ensemble of all features reached an accuracy of 67.6%. Interestingly, a model combining MicrobeAb, RepeatsCov, and FSR achieved the highest overall accuracy of 69.4% across all cancers (CC: 63.8%, EC&GC: 63.3%, LC: 83.6%, and PC: 43.8%), rising to 87.8% when the true TOO was counted as correct if it was among the two most likely predicted TOOs. We also trained and tested a previously reported multi-feature model on our data; our classifier achieved higher accuracy (69.4% vs. 60.6%).
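The top-two figure can be read as a top-k accuracy: a prediction counts as correct when the true TOO is among the k highest-scoring classes. The sketch below shows that calculation under stated assumptions; the function name `top_k_accuracy` and the toy scores are invented for illustration and are not the study's data or code.

```python
def top_k_accuracy(true_labels, class_scores, k=2):
    """Fraction of samples whose true label is among the k top-scoring classes.

    class_scores holds one dict per sample, mapping class name to score.
    """
    if not true_labels:
        raise ValueError("at least one sample is required")
    hits = 0
    for truth, scores in zip(true_labels, class_scores):
        # Rank class names from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(true_labels)


# Toy example (invented scores, three samples, four classes).
truth = ["LC", "PC", "CC"]
scores = [
    {"CC": 0.1, "EC&GC": 0.1, "LC": 0.7, "PC": 0.1},  # LC ranked first
    {"CC": 0.2, "EC&GC": 0.1, "LC": 0.4, "PC": 0.3},  # PC ranked second
    {"CC": 0.1, "EC&GC": 0.5, "LC": 0.3, "PC": 0.1},  # CC outside the top two
]
top1 = top_k_accuracy(truth, scores, k=1)
top2 = top_k_accuracy(truth, scores, k=2)
print(f"top-1: {top1:.3f}, top-2: {top2:.3f}")
```

In the toy data, the second sample is missed at k=1 but recovered at k=2, which is the same mechanism by which a top-two accuracy exceeds the single-prediction accuracy.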
Conclusions
We comprehensively evaluated classical and newly developed cfDNA fragmentomics features for predicting the TOO of cancer signals, and showed that combining MicrobeAb, RepeatsCov, and FSR maximized the accuracy of TOO prediction for GI cancers. However, the ensemble of all features (67.6%) did not outperform this three-feature model (69.4%), which indicates that features should be selected carefully to avoid multicollinearity or other negative effects.
Legal entity responsible for the study
The authors.
Funding
National Key Research and Development Program of China.
Disclosure
R. Fu, K. Xie, Y. Liu, H. Chen, M. Su, Q. He, Z. Su: Financial Interests, Personal, Full or part-time Employment: Singlera Genomics Inc. R. Liu: Financial Interests, Personal, Officer: Singlera Genomics Inc. All other authors have declared no conflicts of interest.