Systematic reviews are an important element of medical research, but the rapid proliferation of published literature makes manual review increasingly difficult. Advances in computer science can reduce this workload by using algorithms to automatically select articles and extract data from them. We initiated a systematic review of phase I immunotherapy clinical trials and used natural language processing to aid article screening.
A literature search was performed across MEDLINE, Embase and CENTRAL in September 2016 using more than 100 search terms in the categories “neoplasm”, “immunotherapy” and “phase I clinical trial”. Only English-language studies published since 1990 were included. We developed a web-based interface that allowed human reviewers to apply inclusion/exclusion labels based on title and abstract screening. Articles were screened by independent reviewers who were blinded to each other's decisions, with a subset screened by two reviewers. An algorithm based on article similarity, using weighted logistic regression to predict “include” and “exclude” labels, is being trained, and we report interim results here.
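The abstract does not say how the article-similarity features are built or how the class weighting is chosen. As a minimal sketch, assuming each article has already been reduced to a small numeric feature vector (here two hypothetical scores: similarity to articles already labeled “include” and similarity to those labeled “exclude”), a class-weighted logistic regression can be fitted by plain gradient descent. The function names, the toy data and the `pos_weight` value are illustrative assumptions, not the authors' implementation.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_weighted_logreg(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    pos_weight: float = 5.0,
    lr: float = 1.0,
    epochs: int = 3000,
) -> Tuple[List[float], float]:
    """Fit logistic regression by batch gradient descent on a class-weighted log loss.

    Errors on "include" articles (y = 1) are multiplied by pos_weight, so missing
    an include costs more than a false alarm (hypothetical weighting scheme).
    """
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    total_weight = sum(pos_weight if label == 1 else 1.0 for label in y)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            sample_weight = pos_weight if yi == 1 else 1.0
            err = sample_weight * (p - yi)  # d(weighted loss)/d(score)
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step along the gradient of the weighted mean loss.
        w = [wj - lr * gj / total_weight for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / total_weight
    return w, b


def predict_include(x: Sequence[float], w: Sequence[float], b: float, threshold: float = 0.5) -> int:
    """Return 1 ("include") if the predicted probability reaches the threshold."""
    return int(sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold)


# Toy, hypothetical features: [similarity to known includes, similarity to known excludes]
X = [[0.9, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 0.8], [0.3, 0.7], [0.25, 0.85]]
y = [1, 1, 0, 0, 0, 0]
w, b = train_weighted_logreg(X, y, pos_weight=5.0)
print(predict_include([0.85, 0.25], w, b), predict_include([0.15, 0.85], w, b))  # → 1 0
```

Raising `pos_weight` shifts the decision boundary toward the “exclude” side, trading more false positives for fewer missed includes, which matches the stated aim of weighting the model toward detecting “include” labels.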
A total of 28,235 articles were identified by the literature search; 19,000 remained after duplicates and conference abstracts were excluded. Of these, 4,034 (21.2%) were screened, and 532 (13.2% of those screened) were labeled “include” by at least one reviewer. In total, 1,944 articles (10.2%) were screened by two reviewers, with a concordance of 93.7%. The prediction algorithm was weighted to improve detection of “include” labels and, compared with manual review, achieved 80.6% sensitivity and 78.2% specificity. The positive and negative predictive values were 34.4% and 96.6%, respectively.
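The four reported performance metrics and the inter-reviewer concordance are simple ratios of counts. A minimal sketch follows, assuming labels are coded 1 for “include” and 0 for “exclude”, and assuming concordance means simple percent agreement (the abstract does not state which agreement measure was used). The function names and the ten-article example are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class ScreeningMetrics:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float


def screening_metrics(predicted: Sequence[int], actual: Sequence[int]) -> ScreeningMetrics:
    """Compare algorithm labels with manual labels (1 = include, 0 = exclude).

    Assumes every denominator below is non-zero.
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p == 1 and a == 1)
    fp = sum(1 for p, a in pairs if p == 1 and a == 0)
    tn = sum(1 for p, a in pairs if p == 0 and a == 0)
    fn = sum(1 for p, a in pairs if p == 0 and a == 1)
    return ScreeningMetrics(
        sensitivity=tp / (tp + fn),  # share of manual includes the algorithm caught
        specificity=tn / (tn + fp),  # share of manual excludes the algorithm rejected
        ppv=tp / (tp + fp),          # share of predicted includes that were true includes
        npv=tn / (tn + fn),          # share of predicted excludes that were true excludes
    )


def percent_agreement(reviewer_a: Sequence[int], reviewer_b: Sequence[int]) -> float:
    """Fraction of double-screened articles given the same label by both reviewers."""
    matches = sum(1 for a, b in zip(reviewer_a, reviewer_b) if a == b)
    return matches / len(reviewer_a)


# Hypothetical labels for ten articles, for illustration only.
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
manual = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(screening_metrics(predicted, manual))
print(percent_agreement([1, 0, 0, 1], [1, 0, 1, 1]))
```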
A machine learning algorithm trained on manual reviews predicted systematic review article inclusion with approximately 80% sensitivity and specificity. The low rate of included articles limited its positive predictive value, but irrelevant articles could be excluded with high confidence. Further development is under way to optimize the algorithm and improve sensitivity. Once optimized, this innovative machine learning process could transform the conduct of systematic reviews.
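Why a low inclusion rate depresses the positive predictive value follows from Bayes' rule: with sensitivity and specificity held fixed, PPV falls as the share of true includes falls, while NPV rises. The sketch below is an illustration, not the authors' analysis; it plugs the reported sensitivity and specificity into that formula at the screened inclusion rate of 13.2% and, for contrast, at a hypothetical 50%.

```python
def predictive_values(sensitivity: float, specificity: float, prevalence: float):
    """Return (PPV, NPV) implied by Bayes' rule for a given inclusion rate."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    true_neg = specificity * (1 - prevalence)
    false_neg = (1 - sensitivity) * prevalence
    return true_pos / (true_pos + false_pos), true_neg / (true_neg + false_neg)


for prevalence in (0.132, 0.5):
    ppv, npv = predictive_values(0.806, 0.782, prevalence)
    print(f"prevalence {prevalence:.1%}: PPV {ppv:.1%}, NPV {npv:.1%}")
# → prevalence 13.2%: PPV 36.0%, NPV 96.4%
# → prevalence 50.0%: PPV 78.7%, NPV 80.1%
```

The 13.2% case lands near, but not exactly on, the reported 34.4% and 96.6%, presumably because the set on which the algorithm was evaluated had its own inclusion rate; at a balanced 50% inclusion rate the same classifier would have a PPV of roughly 79%.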
All authors have declared no conflicts of interest.