Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured database in the OMOP Common Data Model (CDM). Subsequently, pathology reports of patients with metastatic, non-squamous NSCLC from 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
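The comparison step can be illustrated with a minimal sketch. It assumes, hypothetically, that both the algorithm and the manual review yield a set of report identifiers flagged as mentioning an EGFR alteration. The function name `compare_extractions` and the toy identifiers are illustrative only and are not taken from the study.

```python
def compare_extractions(algorithm_ids, manual_ids):
    """Compare algorithm-flagged report IDs against the manual reference set."""
    algorithm_ids = set(algorithm_ids)
    manual_ids = set(manual_ids)

    found = algorithm_ids & manual_ids    # flagged by both
    missed = manual_ids - algorithm_ids   # flagged manually only
    extra = algorithm_ids - manual_ids    # flagged by the algorithm only

    # Fraction of manually identified reports that the algorithm also found.
    fraction = len(found) / len(manual_ids) if manual_ids else float("nan")
    return {
        "found": len(found),
        "missed": len(missed),
        "extra": len(extra),
        "fraction_found": fraction,
    }


# Hypothetical toy identifiers, not study data.
manual = {"R001", "R002", "R003", "R004"}
algorithm = {"R001", "R002", "R004"}
result = compare_extractions(algorithm, manual)
print(result)
```

Using sets keeps the comparison independent of report order and makes the missed reports easy to list for follow-up review.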
Results
The algorithm identified 839 reports (10.9%) that mentioned an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36 reports (4.1%; 36/875) that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation, i.e., an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
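As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The sketch below uses only those counts; the dictionary labels are illustrative. The 10.9% figure and the confidence interval are not recomputed, because the total number of reports and the interval method are not stated.

```python
# Counts as reported in the Results, as (numerator, denominator).
counts = {
    "identified by algorithm": (839, 875),
    "not identified (VUS)": (36, 875),
    "common EGFR mutation": (639, 875),
    "exon 19 deletion": (362, 875),
    "p.(Leu858Arg)": (277, 875),
    "exon 20 insertion": (71, 875),
}


def percentage(numerator, denominator):
    """Return the proportion as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


percentages = {label: percentage(n, d) for label, (n, d) in counts.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")

# Internal consistency of the counts.
assert 839 + 36 == 875    # identified + missed = manual total
assert 362 + 277 == 639   # the two common mutations sum to the common total
```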
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.