Oncologists use patients’ life expectancy to guide medical decisions and may benefit from a tool that provides accurate, unbiased assessments of prognosis. Existing prognostic models generally use only a few predictor variables. We used a large electronic medical record dataset to train a prognostic model for patients with metastatic cancer.
The model was trained and tested using data from 12,588 patients treated for metastatic cancer in the Stanford Health Care system from 2008 to 2017. Data sources included provider note text, laboratory results, vital signs, procedures, medication orders, and diagnosis codes. Patients were randomly divided into training and test sets (80%/20% split). A regularized Cox proportional hazards model with 4,126 predictor variables was fit to the training set and evaluated on the test set. Because each patient contributed multiple observations, a landmarking approach was used, with t0 set to the time of metastatic cancer diagnosis. Performance was also evaluated on 399 palliative radiation courses in test set patients. A published model that uses performance status, primary tumor site, and treated site served as the baseline [Chow, JCO 2008:20;26(36)].
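The patient-level split and the landmarking setup described above can be sketched as follows. This is a minimal illustration, not the authors' code: the abstract does not specify how landmark rows were built, so the names `LandmarkRow`, `split_patients`, and `build_landmark_rows`, the use of days as the time unit, and the rule of one row per visit are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass
class LandmarkRow:
    """One observation in a landmark dataset (hypothetical layout)."""
    patient_id: str
    landmark_day: float   # days from t0 (metastatic diagnosis) to this visit
    time_to_event: float  # days from this visit to death or last follow-up
    died: bool            # True if the patient died, False if censored


def split_patients(patient_ids: Iterable, test_fraction: float = 0.2,
                   seed: int = 0) -> Tuple[Set, Set]:
    """Randomly split at the patient level, so that all visits from one
    patient fall into the same set (training or test)."""
    ids = sorted(set(patient_ids))
    random.Random(seed).shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return set(ids[n_test:]), set(ids[:n_test])


def build_landmark_rows(patient_id: str, visit_days: Iterable[float],
                        end_day: float, died: bool) -> List[LandmarkRow]:
    """Create one row per visit, treating each visit as a landmark time.
    Survival time is re-measured from the visit, not from t0.
    Visits on or after the end of follow-up are skipped (assumption)."""
    rows = []
    for day in sorted(visit_days):
        if day >= end_day:
            continue
        rows.append(LandmarkRow(patient_id, day, end_day - day, died))
    return rows
```

Splitting by patient rather than by visit keeps correlated observations from the same patient out of both sets at once. With 12,588 patients and a test fraction of 0.2, this split yields 2,518 test patients, the count reported in the table.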
From the first visit after metastatic cancer diagnosis, median follow-up was 14.5 months and median overall survival was 20.9 months. Patients were seen for 384,402 daily visits. The prognostic model’s C-index for overall survival was 0.79 in the test set (averaged across landmark times). For palliative radiation courses, the C-index was 0.75 (95% CI 0.72-0.78), compared to 0.64 (95% CI 0.60-0.67) for an existing published model (p < 0.001).
Table: 1512O
Predicted vs actual survival for 2,518 test set patients at landmark time t0 (first visit after diagnosis of metastatic cancer)
| Predicted median survival in months | Actual median survival in months (95% CI) |
|---|---|
| 0-3 (n = 106) | 1.3 (0.9-2.0) |
| 3.1-6 (n = 172) | 3.7 (2.5-5.2) |
| 6.1-12 (n = 382) | 6.7 (5.6-7.9) |
| >12 (n = 1858) | 35.7 (31.8-39.1) |
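The C-index reported above measures how often the model ranks a pair of patients in the correct order of survival. The sketch below assumes Harrell's estimator, which the abstract does not name explicitly; the function name `concordance_index` and the toy data are illustrative only and are not study data.

```python
from typing import Sequence


def concordance_index(times: Sequence[float], events: Sequence[bool],
                      risk_scores: Sequence[float]) -> float:
    """Harrell's C-index for right-censored survival data.

    A pair (i, j) is comparable when patient i had an observed event and
    a strictly shorter time than patient j. The pair is concordant when
    the patient with the shorter survival has the higher predicted risk.
    Tied risk scores count as half-concordant.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy example: four patients, the third is censored at 6 months.
times = [2.0, 4.0, 6.0, 8.0]
events = [True, True, False, True]
risks = [0.9, 0.4, 0.5, 0.1]
c = concordance_index(times, events, risks)  # 4 of 5 comparable pairs -> 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, which is the scale on which 0.75 for the model and 0.64 for the baseline should be read.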
The model showed high predictive performance, which was significantly better than that of an existing model. Because it is fully automated, the model can be used to examine providers’ practice patterns and could be deployed in a decision support tool to help improve quality of care.
Clinical trial identification
Legal entity responsible for the study
National Institutes of Health.
M.F. Gensheimer: Research funding: Varian Medical Systems, Philips Healthcare. E. Cho: Employee of Genentech. D. Rubin: Research funding: Philips Healthcare. D.T. Chang: Research funding, honoraria: Varian Medical Systems; Stock ownership: ViewRay. All other authors have declared no conflicts of interest.