Abstract 156P
Background
Lung Cancer (LC) diagnosis is highly complex due to non-specific initial symptoms and the lack of routine screening. Machine Learning (ML) based cancer risk-assessment tools can support earlier diagnosis by improving referrals for cancer investigations. We present an age- and sex-matched case-control study aimed at developing an ML predictive model to identify individuals at high risk of LC, with an estimated potential decrease in death risk of 10% per month by which diagnosis is brought forward.
Methods
Electronic Health Records from a digital cohort of 4332 citizens aged ≥18 years (722 pathology-confirmed LC cases and 3610 controls) assigned to the Department of Health Valencia La Fe were analysed to identify early risk factors. Initial variable selection was based on structured and semi-structured information related to laboratory tests, cancer history, use of health-care resources, symptoms and smoking history. The final selection of variables and prediction time was determined by feature selection methods, clinical suitability of predictions and model performance. Four ML classifiers were used (table). The dataset was randomly split into a training set (70%) and a test set (30%). Fivefold cross-validation was used for model selection, with the final performance evaluated on the unseen test set.
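The split and cross-validation scheme described above can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `random_split` and `k_fold`, the seed, and the placeholder labels are hypothetical, and the abstract does not say whether the split was stratified, so a simple random split is shown.

```python
import random


def random_split(n_records, test_fraction=0.30, seed=0):
    """Randomly split record indices into (train_idx, test_idx)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for reproducibility
    indices = list(range(n_records))
    rng.shuffle(indices)
    n_test = round(n_records * test_fraction)
    return sorted(indices[n_test:]), sorted(indices[:n_test])


def k_fold(indices, k=5, seed=0):
    """Yield (fit_idx, validation_idx) pairs for k-fold cross-validation."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    parts = [shuffled[i::k] for i in range(k)]  # k near-equal folds
    for i in range(k):
        fit = [j for p, part in enumerate(parts) if p != i for j in part]
        yield fit, parts[i]


# Placeholder labels that mirror the cohort sizes only (no patient data):
# 722 cases (1) and 3610 controls (0).
labels = [1] * 722 + [0] * 3610

train_idx, test_idx = random_split(len(labels))
cv_folds = list(k_fold(train_idx))  # used for model selection only

print(len(train_idx), len(test_idx), len(cv_folds))
```

Each candidate model would be fitted on `fit_idx` and scored on `validation_idx` within the training set; `test_idx` stays untouched until the final evaluation.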
Table: 156P Performance of ML classifiers on the test set
Classifier | AUC | Sensitivity (%) | Specificity (%) |
---|---|---|---|
Logistic Regression | 0.79 | 77.9 | 68.6 |
Decision Tree | 0.77 | 77.9 | 68.1 |
Random Forest | 0.80 | 79.3 | 68.3 |
Neural Network | 0.79 | 80 | 68.3 |
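For readers unfamiliar with the metrics in the table, the following minimal sketch shows how sensitivity, specificity and AUC can be computed from true labels and model scores. The toy data, the 0.5 decision threshold and the function names are assumptions made for illustration; they are not the study's data or its operating point.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity %, specificity %) for binary labels (1 = case)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return 100.0 * tp / (tp + fn), 100.0 * tn / (tn + fp)


def auc(y_true, scores):
    """AUC as the probability that a random case outscores a random control.

    Ties count as half a win (Mann-Whitney formulation).
    """
    cases = [s for t, s in zip(y_true, scores) if t == 1]
    controls = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


# Toy example: 4 cases and 6 controls with made-up risk scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.35, 0.2, 0.1, 0.05]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold of 0.5

sens, spec = sensitivity_specificity(y_true, y_pred)
area = auc(y_true, scores)
print(f"sensitivity={sens:.1f}% specificity={spec:.1f}% AUC={area:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas AUC summarises ranking quality across all thresholds, which is why the classifiers can share similar AUCs while differing slightly in the other two columns.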
Results
Using just nine input variables within 60 days prior to diagnosis [ALT (GPT) and creatinine levels; platelet, lymphocyte and monocyte counts; smoking status; general malaise; prior emphysema history; and number of outpatient visits in the previous year], all techniques displayed similar performance on the test set, with areas under the curve (AUCs) of 0.77 to 0.80, sensitivities of 77.9% to 80% and specificities of 68.1% to 68.6% (table).
Conclusions
The developed models could help to identify a greater number of patients in whom either to initiate the diagnostic process or to establish close monitoring at primary care level, with a potential decrease in patients' death risk of around 20%, consistent with the estimated 10% per month of anticipated diagnosis over the 60-day prediction window. However, additional clinical validation of the models' performance will be imperative to gauge their usefulness in a real-world scenario.
Legal entity responsible for the study
AstraZeneca Farmacéutica Spain.
Funding
AstraZeneca Farmacéutica Spain, S.A.
Disclosure
All authors have declared no conflicts of interest.