Abstract 1233P
Background
Circulating cell-free DNA (cfDNA) is a promising biomarker for early cancer detection, and its fragmentomics features have been used successfully to detect cancer signals in blood. However, the ability of these features to predict the tissue of origin (TOO) of a cancer has yet to be evaluated. Such prediction is highly desirable for differentiating the most common types of gastrointestinal (GI) cancer: colorectal (CC), esophageal (EC), gastric (GC), liver (LC), and pancreatic (PC) cancer.
Methods
Whole-genome sequencing was performed on cfDNA from 769 cancer patients (149 CCs, 137 ECs, 149 GCs, 272 LCs, and 62 PCs) to calculate the coverage at repetitive genomic regions (RepeatsCov), the depth and the cleavage diversity around transcription start sites (TSSDepth and TSSClvDiv), and the microbiome abundance (MicrobeAb). These features were combined with classical fragmentomics features, including copy number variation (CNV), end motif diversity (EDM), fragment size ratio (FSR), and promoter fragmentation entropy (PFE), to train and test a stacked ensemble machine learning classifier that predicts the TOO of the GI cancers. Samples were divided between training and testing at a ratio of 1:1.
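As a rough illustration of the 1:1 train/test sample ratio, the sketch below splits the reported cohort in half within each cancer type. The abstract does not say whether the split was stratified or how samples were randomized, so the stratification, the seed, and the function name `stratified_half_split` are assumptions made only for this example.

```python
import random
from collections import defaultdict

# Cohort sizes as reported in the Methods (769 patients in total).
COHORT = {"CC": 149, "EC": 137, "GC": 149, "LC": 272, "PC": 62}


def stratified_half_split(labels, seed=0):
    """Split sample indices 1:1 into train and test within each class.

    Assumption: stratifying by cancer type is our choice for this sketch;
    the abstract only states that the sample ratio was 1:1.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        half = len(members) // 2  # odd-sized classes give the extra sample to test
        train.extend(members[:half])
        test.extend(members[half:])
    return sorted(train), sorted(test)


# One placeholder label per patient; real inputs would be per-sample feature vectors.
labels = [cancer for cancer, n in COHORT.items() for _ in range(n)]
train_idx, test_idx = stratified_half_split(labels)
print(len(labels), len(train_idx), len(test_idx))
```

With odd class sizes the two halves differ by one sample per class, so the overall ratio is only approximately 1:1.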
Results
The performance of each individual feature was evaluated first: the FSR model had the highest accuracy (67.1%) and the RepeatsCov model the lowest (53.9%). The ensemble of all features reached an accuracy of 67.6%. Interestingly, a model combining only MicrobeAb, RepeatsCov and FSR achieved the highest overall accuracy of 69.4% (CC: 63.8%, EC&GC: 63.3%, LC: 83.6%, and PC: 43.8%), rising to 87.8% when the top two most likely TOOs were considered. We also trained and tested a previously reported multi-feature model on our data, and our classifier achieved higher accuracy (69.4% vs. 60.6%).
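The difference between the overall accuracy and the accuracy for the top two most likely TOOs can be made concrete with a small sketch. It assumes the classifier outputs one probability per class for each sample; the function name `top_k_accuracy` and the toy probabilities are hypothetical and are not data from the study.

```python
def top_k_accuracy(y_true, class_probs, k=1):
    """Fraction of samples whose true label is among the k most probable classes."""
    hits = 0
    for truth, probs in zip(y_true, class_probs):
        # Rank class labels from most to least probable for this sample.
        ranked = sorted(probs, key=probs.get, reverse=True)
        if truth in ranked[:k]:
            hits += 1
    return hits / len(y_true)


# Made-up predictions for four samples; class labels mirror the groups in the Results.
y_true = ["LC", "CC", "PC", "EC&GC"]
class_probs = [
    {"CC": 0.1, "EC&GC": 0.1, "LC": 0.7, "PC": 0.1},  # correct at top 1
    {"CC": 0.3, "EC&GC": 0.5, "LC": 0.1, "PC": 0.1},  # correct only within top 2
    {"CC": 0.4, "EC&GC": 0.3, "LC": 0.2, "PC": 0.1},  # missed even within top 2
    {"CC": 0.2, "EC&GC": 0.6, "LC": 0.1, "PC": 0.1},  # correct at top 1
]

print(top_k_accuracy(y_true, class_probs, k=1))  # 0.5
print(top_k_accuracy(y_true, class_probs, k=2))  # 0.75
```

Top-2 accuracy can never be lower than top-1 accuracy, because every top-1 hit is also a top-2 hit.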
Conclusions
We comprehensively evaluated classical and newly developed cfDNA fragmentomics features for predicting the TOO of cancer signals, and showed that combining MicrobeAb, RepeatsCov and FSR maximized the accuracy of TOO prediction for GI cancers. However, the results also indicate that features should be selected carefully to avoid multicollinearity or other negative effects, since the ensemble of all features was less accurate than this three-feature model.
Legal entity responsible for the study
The authors.
Funding
National Key Research and Development Program of China.
Disclosure
R. Fu, K. Xie, Y. Liu, H. Chen, M. Su, Q. He, Z. Su: Financial Interests, Personal, Full or part-time Employment: Singlera Genomics Inc. R. Liu: Financial Interests, Personal, Officer: Singlera Genomics Inc. All other authors have declared no conflicts of interest.