Abstract 1860P
Background
Nodal metastatic spread of cancer is a sign of disease progression. Artificial intelligence has shown great promise in the clinical sciences for accurately predicting outcomes that are otherwise difficult to predict. In the current study, we hypothesize that RNA transcription data can be used as a biomarker to predict nodal metastasis across multiple cancer types.
Methods
The Cancer Genome Atlas (TCGA) database was used to identify differentially expressed genes (DEGs) and the corresponding clinicopathological characteristics for all cancer types. Two experiments were run to predict nodal metastasis: 1) using complete gene expression data and 2) using 199 selected genes involved in multiple cancer pathways. All data were downloaded from the TCGA, compiled, and joined on key columns; we then developed a deep-learning algorithm to predict nodal metastasis. The data were split 80/20 into training and test sets, and the synthetic minority oversampling technique (SMOTE) was used to correct the uneven distribution of the outcome in the dataset. Model performance was assessed by sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and area under the receiver operating characteristic curve (AUROC).
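The evaluation steps named above (80/20 split, minority oversampling, and the five reported metrics) can be illustrated with a minimal pure-Python sketch. This is not the authors' pipeline: the function names (`stratified_split`, `smote_like`, `binary_metrics`, `auroc`) are hypothetical, the oversampler is a simplified stand-in for SMOTE, and the stratification of the split is an assumption, since the abstract does not say how the split was drawn.

```python
import random
from collections import defaultdict

def stratified_split(labels, test_frac=0.2, seed=0):
    """Return (train_idx, test_idx), keeping the class ratio in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train, test = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_frac)
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)

def smote_like(minority, n_new, k=3, seed=0):
    """Simplified SMOTE: each synthetic point is a random interpolation
    between a minority sample and one of its k nearest minority neighbours.
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Neighbours ordered by squared Euclidean distance, excluding base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        t = rng.random()
        synthetic.append([a + t * (b - a) for a, b in zip(base, other)])
    return synthetic

def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, PPV and NPV from 0/1 labels and predictions.
    Assumes every denominator is non-zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }

def auroc(y_true, scores):
    """AUROC as the probability that a positive case outscores a negative
    one (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In a sketch like this, oversampling would normally be applied to the training indices only, so that synthetic samples do not leak into the test set; whether the study did so is not stated in the abstract.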
Results
A total of 5507 RNASeq samples from 2869 patients in the TCGA database were analyzed. After alignment, the patient data were used to develop a machine learning model, a multi-layer deep-learning network. The complete gene expression profile produced a sensitivity of 59.3%, specificity of 64.9%, PPV of 63.9%, NPV of 60.3%, and an AUROC of 0.97 (95% CI 0.03-0.05), while the selected gene expression panel showed a sensitivity of 92.6%, specificity of 92.7%, PPV of 92.5%, NPV of 92.8%, and an AUROC of 0.98 (95% CI 0.05-0.09).
Table: 1860P
Experiment | Sensitivity | Specificity | PPV | NPV | AUROC (95% CI) |
Complete gene expression | 59.3% | 64.9% | 63.9% | 60.3% | 0.97 (0.03-0.05) |
Targeted gene expression | 92.6% | 92.7% | 92.5% | 92.8% | 0.98 (0.05-0.09) |
Conclusions
The targeted-gene deep-learning model (199 genes) performed better than the complete gene expression model. Deep learning can predict nodal metastasis with high accuracy; however, these results need further external validation.
Clinical trial identification
Editorial acknowledgement
Legal entity responsible for the study
The authors.
Funding
Has not received any funding.
Disclosure
F.B. Irfan: Financial Interests, Personal, Advisory Board, Advisory Board role at Health Practice, Silicon Valley Innovation Center, Infineon Technologies AG.: Infineon Technologies AG; Financial Interests, Institutional, Research Grant, Research grant of $150,000 from 2019-2021: NorthStar Anesthesia. All other authors have declared no conflicts of interest.