Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR) gene, which occur in ∼10-15% of NSCLC, can be treated with targeted therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although the Dutch Pathology Registry (Palga) holds comprehensive data, manually extracting EGFR mutation status from narrative pathology reports is time-consuming. We therefore used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive, and its output was used to generate a structured database in the OMOP Common Data Model (CDM) format. Pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were then requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
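The abstract does not describe the model's internals. As a rough illustration of the task and of the comparison step, the Python sketch below flags reports with a naive keyword baseline, which is a deliberately simpler technique than the trained NLP model used in this study, and then tabulates per-report agreement with manual labels. The function names and regular expressions are hypothetical; the patterns cover only the three alteration types named in the Results and do not handle negation or variants of unknown significance.

```python
import re

# Hypothetical patterns for a naive keyword baseline. The study itself used a
# trained NLP/machine-learning model; these regexes are illustrative assumptions.
EGFR_MENTION = re.compile(r"\bEGFR\b", re.IGNORECASE)
ALTERATION_PATTERNS = {
    "exon19_del": re.compile(r"exon\s*19\s*del", re.IGNORECASE),
    "L858R": re.compile(r"p\.\(Leu858Arg\)|\bL858R\b", re.IGNORECASE),
    "exon20_ins": re.compile(r"exon\s*20\s*ins", re.IGNORECASE),
}


def detect_egfr_alterations(report: str) -> set:
    """Return labels of EGFR alterations matched in a free-text report.

    No negation handling: "exon 19 deletion not detected" would still match.
    """
    if not EGFR_MENTION.search(report):
        return set()
    return {label for label, pattern in ALTERATION_PATTERNS.items() if pattern.search(report)}


def compare_with_manual(predicted: list, manual: list) -> dict:
    """Tabulate per-report agreement between algorithm flags and manual flags."""
    if len(predicted) != len(manual):
        raise ValueError("predicted and manual must have the same length")
    table = {"both": 0, "algorithm_only": 0, "manual_only": 0, "neither": 0}
    for p, m in zip(predicted, manual):
        if p and m:
            table["both"] += 1
        elif p:
            table["algorithm_only"] += 1
        elif m:
            table["manual_only"] += 1
        else:
            table["neither"] += 1
    return table


if __name__ == "__main__":
    # Invented example sentences, not real Palga report text.
    reports = [
        "EGFR: exon 19 deletion detected.",
        "EGFR p.(Leu858Arg) mutation present.",
        "KRAS p.(Gly12Cys); no other alterations.",
    ]
    flags = [bool(detect_egfr_alterations(r)) for r in reports]
    print(flags)  # [True, True, False]
    print(compare_with_manual(flags, [True, True, False]))
```

The "manual_only" cell corresponds to reports the algorithm missed; in this study those were the 36 reports the reader listed as variants of unknown significance.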
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual review identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports not identified by the algorithm were all listed by the reader as variants of unknown significance (VUS). Among EGFR-mutated patients, 73.0% (639/875) had a common EGFR mutation, i.e., an exon 19 deletion (41.4%; 362/875) or the p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
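The percentages above can be rechecked from the reported counts. The short Python sketch below does this; the helper name `pct` is hypothetical, and the only inputs are the counts stated in this paragraph. Note that the 95.9% figure coincides with 839/875, i.e., the share of manually identified reports that the algorithm also flagged. The confidence interval is not recomputed because the abstract does not state the method used.

```python
# Counts as reported in the Results paragraph of this abstract.
manual_positive = 875
algorithm_positive = 839
missed = manual_positive - algorithm_positive

exon19_del = 362
l858r = 277
exon20_ins = 71


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


print(missed, pct(missed, manual_positive))  # 36 4.1
print(pct(algorithm_positive, manual_positive))  # 95.9
print(exon19_del + l858r, pct(exon19_del + l858r, manual_positive))  # 639 73.0
print(pct(exon19_del, manual_positive), pct(l858r, manual_positive))  # 41.4 31.7
print(pct(exon20_ins, manual_positive))  # 8.1
```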
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.