The use of cell-free DNA (cfDNA) fragmentomic features in cancer early detection models has surged recently. Although many individual studies have demonstrated successful applications in cancer detection, their generalizability remains unclear because cross-study validation is lacking. A model that generalizes across studies would allow for robust diagnosis of high-risk individuals.
This study evaluated two features: the window-level cfDNA size summary (WINDOW-FSS), derived from the commonly used method of profiling short (100-150 bp) and long (151-220 bp) cfDNA fragments, and our in-house feature, which maps the chromosome arm-level fragment size distribution (ARM-FSD). The two features were analyzed uniformly in lung cancer and pan-cancer models. The two lung cancer models were also evaluated across studies in two external cohorts. For pan-cancer, we built the models on an online pan-cancer dataset and assessed their performance in two independent cohorts from other studies.
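The two feature types described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the 5 Mb window size, the 5 bp length bins, the input tuples, and the function names `window_fss` and `arm_fsd` are all assumptions made for illustration. Only the short (100-150 bp) and long (151-220 bp) ranges come from the text.

```python
from collections import defaultdict

SHORT = (100, 150)  # bp, inclusive, as stated in the abstract
LONG = (151, 220)   # bp, inclusive, as stated in the abstract


def window_fss(fragments, window_size=5_000_000):
    """Count short and long fragments per genomic window.

    fragments: iterable of (chrom, start, length) tuples (hypothetical input).
    window_size: assumed value; the abstract does not state one.
    Returns {(chrom, window_index): (n_short, n_long, short_to_long_ratio)}.
    """
    counts = defaultdict(lambda: [0, 0])
    for chrom, start, length in fragments:
        if SHORT[0] <= length <= SHORT[1]:
            counts[(chrom, start // window_size)][0] += 1
        elif LONG[0] <= length <= LONG[1]:
            counts[(chrom, start // window_size)][1] += 1
        # Fragments outside 100-220 bp are ignored.
    return {
        key: (s, l, s / l if l else float("nan"))
        for key, (s, l) in counts.items()
    }


def arm_fsd(fragments, lo=100, hi=220, bin_width=5):
    """Fraction of fragments in each length bin, per chromosome arm.

    fragments: iterable of (arm, length) tuples (hypothetical input).
    bin_width: assumed value; the last bin holds only length == hi.
    Returns {arm: [fraction_in_bin_0, fraction_in_bin_1, ...]}.
    """
    n_bins = (hi - lo) // bin_width + 1
    counts = defaultdict(lambda: [0] * n_bins)
    for arm, length in fragments:
        if lo <= length <= hi:
            counts[arm][(length - lo) // bin_width] += 1
    return {
        arm: [c / sum(bins) for c in bins]
        for arm, bins in counts.items()
    }
```

In this sketch, WINDOW-FSS reduces each window to two counts and a ratio, while ARM-FSD keeps a normalized length histogram for each arm, which is one plausible reading of "fragment size distribution".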
The lung cancer models implementing ARM-FSD and WINDOW-FSS reached areas under the curve (AUCs) of 0.99 and 0.86, respectively, in our internal validation cohort. In two external cohorts, the ARM-FSD lung cancer model outperformed the WINDOW-FSS model by roughly 0.11 in AUC (0.97 vs 0.86; 0.87 vs 0.76). Dimension reduction of the features showed that ARM-FSD produced even better results when denoised with a non-linear algorithm, such as an autoencoder, whereas WINDOW-FSS-derived models did not benefit from it. Using the online pan-cancer cohort for modeling, the ARM-FSD and WINDOW-FSS pan-cancer models achieved AUCs of 0.91 and 0.93, respectively. In the two cross-study validation cohorts, however, the ARM-FSD pan-cancer model consistently outperformed the WINDOW-FSS model (0.88 vs 0.75; 0.98 vs 0.63).
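For readers less familiar with the metric reported above, the AUC can be read as the probability that a randomly chosen cancer sample receives a higher model score than a randomly chosen control. The sketch below computes it directly from that definition. The function name `roc_auc` and the toy labels and scores are illustrative assumptions, not data from the study.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (case, control) pairs ranked correctly.

    labels: 1 for cancer, 0 for control (illustrative encoding).
    scores: model outputs, higher meaning more cancer-like.
    Tied pairs count as half a correct ranking.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example: 3 of the 4 case/control pairs are ranked correctly.
print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))  # 0.75
```

In a cross-study setting such as the one described, the same function would be applied to scores from a model trained on one cohort and evaluated on samples from another.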
Our cross-study analysis revealed variation in the performance of models implementing different cfDNA fragmentomic features. The ARM-FSD-based models consistently demonstrated higher generalizability and robustness in cohorts from diverse sources, highlighting the necessity of cross-study feature verification in future predictive model development.
Clinical trial identification
Legal entity responsible for the study
Jiangsu Province Health Planning Commission Medical Research Project; National Natural Science Foundation of China.
X. Fan, H. Bao, H. Tang, X. Wu, Y. Shao: Financial Interests, Personal, Full or part-time Employment: Nanjing Geneseeq Technology Inc. All other authors have declared no conflicts of interest.