Abstract 131P
Background
Colorectal cancer (CC) is one of the most prevalent types of cancer and has an extremely high mortality rate. Current diagnosis of CC is characterized by low accuracy and high invasiveness. The development of accurate, noninvasive CC detection methods is therefore necessary, but also very challenging. Machine learning (ML) models can improve pathologists’ diagnostic accuracy, specificity, and sensitivity. This study presents preliminary results on ML classifiers that discriminate between healthy and CC samples, using data on circulating tumor cells (CTCs) and their expression profile.
Methods
A dataset was generated based on 20 biomarkers, including CTC enumeration and protein expression (e.g. CD44, CD133, SOX2, OKT4, Nanog, MET, CD34, CD45, BCR-ABL, CD30, CD15, CD31, CD19, CD63, CD99, EpCam, MUC1, PSMA, PanCK). These biomarkers are commonly used to identify the primary tumor in a patient and to provide guidance about disease progression and future prognosis. Specifically, blood samples from 35 healthy individuals and 39 CC patients were analyzed to identify the presence, concentration and protein expression of CTCs. The performance of 5 hyper-optimized ML classifiers was then tested, namely classification trees, Support Vector Machines (SVM), K-Nearest Neighbor (KNN), ensemble classifiers and neural networks. The results correspond to a (5×) 10-fold cross-validation estimate on a validation set (67 samples), and the resulting models were then evaluated on a test set (7 samples).
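The data split described above can be sketched as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not say which software was used, whether the splits were stratified by class, or how "(5×)" was applied, so the sketch assumes a class-stratified hold-out of 7 samples followed by 5 repetitions of stratified 10-fold cross-validation on the remaining 67. The function names `stratified_holdout` and `repeated_stratified_kfold` are hypothetical, and the labels are placeholders (0 = healthy, 1 = CC), not study data.

```python
import random


def stratified_holdout(labels, n_test, rng):
    """Split sample indices into (development, test), roughly preserving class ratios."""
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    classes = sorted(by_class)
    # Proportional quota per class, rounded down.
    quota = {c: len(by_class[c]) * n_test // len(labels) for c in classes}
    # Hand any leftover test slots to the largest classes first.
    leftover = n_test - sum(quota.values())
    for c in sorted(classes, key=lambda c: -len(by_class[c]))[:leftover]:
        quota[c] += 1
    dev, test = [], []
    for c in classes:
        members = by_class[c][:]
        rng.shuffle(members)
        test += members[:quota[c]]
        dev += members[quota[c]:]
    return sorted(dev), sorted(test)


def repeated_stratified_kfold(indices, labels, n_splits, n_repeats, rng):
    """Yield (train, validation) index lists for every fold of every repeat."""
    for _ in range(n_repeats):
        folds = [[] for _ in range(n_splits)]
        by_class = {}
        for idx in indices:
            by_class.setdefault(labels[idx], []).append(idx)
        slot = 0
        for c in sorted(by_class):
            members = by_class[c][:]
            rng.shuffle(members)
            # Deal each class round-robin so every fold gets a similar class mix.
            for idx in members:
                folds[slot % n_splits].append(idx)
                slot += 1
        for k in range(n_splits):
            train = [i for j, fold in enumerate(folds) if j != k for i in fold]
            yield train, folds[k]


# Placeholder labels matching the cohort sizes: 35 healthy (0) and 39 CC (1).
labels = [0] * 35 + [1] * 39
rng = random.Random(0)  # fixed seed, an assumption made for reproducibility

dev_idx, test_idx = stratified_holdout(labels, n_test=7, rng=rng)
splits = list(repeated_stratified_kfold(dev_idx, labels, n_splits=10, n_repeats=5, rng=rng))
print(len(dev_idx), len(test_idx), len(splits))
```

In a full pipeline, each classifier would be tuned and scored on the `splits`, and only the final models would be applied to `test_idx`, so that the test samples never influence hyperparameter selection.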
Results
Across all ML models, on the test sets, the mean accuracy was 90.85 ± 2.39%, the mean sensitivity (True Positive Rate) was 96.25 ± 4.79%, and the mean specificity (True Negative Rate) was 82.68 ± 3.65%.
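For reference, the reported metrics follow the standard definitions computed from a confusion matrix, with CC as the positive class. The sketch below uses made-up counts and made-up per-model accuracies purely for illustration; they are not the study's data. The use of the sample standard deviation for the "±" term is also an assumption, since the abstract does not specify it.

```python
from statistics import mean, stdev


def binary_metrics(tp, fn, tn, fp):
    """Return (accuracy, sensitivity, specificity) as percentages."""
    sensitivity = 100.0 * tp / (tp + fn)   # True Positive Rate
    specificity = 100.0 * tn / (tn + fp)   # True Negative Rate
    accuracy = 100.0 * (tp + tn) / (tp + fn + tn + fp)
    return accuracy, sensitivity, specificity


# Hypothetical confusion counts for one model on a 7-sample test set.
acc, sens, spec = binary_metrics(tp=4, fn=0, tn=2, fp=1)

# Hypothetical per-model accuracies, summarized as mean and sample standard deviation.
model_accuracies = [90.0, 92.0, 88.0, 94.0, 91.0]
summary = (mean(model_accuracies), stdev(model_accuracies))
print(f"{acc:.2f} {sens:.2f} {spec:.2f}")
print(f"{summary[0]:.2f} ± {summary[1]:.2f}")
```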
Conclusions
The present study reports preliminary results on the development of ML models that exhibit notable performance in distinguishing CC samples from healthy ones. These findings indicate that ML models using CTC enumeration and protein expression data could be included in clinical practice to help pathologists increase diagnostic accuracy. However, although the results seem promising, further experiments on larger datasets are needed to verify and extend the results of this study.
Legal entity responsible for the study
The authors.
Funding
Has not received any funding.
Disclosure
All authors have declared no conflicts of interest.