Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR) gene, which occur in ~10-15% of NSCLC cases, can be targeted by specific therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although the Dutch Pathology Registry (Palga) holds comprehensive data, manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. We therefore used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Subsequently, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
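The comparison step described above can be illustrated with a minimal sketch. It assumes each report has an identifier and that both the algorithm and the manual reader yield a collection of identifiers for reports mentioning an EGFR alteration. The function name `compare_to_manual` and the example identifiers are hypothetical and are not taken from the study.

```python
def compare_to_manual(algorithm_ids, manual_ids):
    """Compare report IDs flagged by an algorithm with a manual reference set.

    Hypothetical helper: manual extraction is treated as the reference.
    """
    algorithm_ids = set(algorithm_ids)
    manual_ids = set(manual_ids)

    found = algorithm_ids & manual_ids    # flagged by both
    missed = manual_ids - algorithm_ids   # manual only
    extra = algorithm_ids - manual_ids    # algorithm only

    # Share of manually identified reports that the algorithm also flagged.
    recovered = len(found) / len(manual_ids) if manual_ids else float("nan")

    return {
        "found": sorted(found),
        "missed": sorted(missed),
        "extra": sorted(extra),
        "recovered_fraction": recovered,
    }


# Made-up identifiers, for illustration only.
result = compare_to_manual(["r1", "r2", "r3"], ["r2", "r3", "r4", "r5"])
print(result)
```

Keeping the missed and extra identifiers, rather than only the fraction, makes it possible to review individual discordant reports afterwards.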
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (839/875; 95% CI 92.7-99.2). The 36/875 (4.1%) reports that were not identified by the algorithm had all been listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation, i.e., an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
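As a consistency check, the reported percentages can be recomputed from the counts given above. This is a minimal sketch: the constant names and the helper `pct` are illustrative. The confidence interval is not recomputed, because the abstract does not state the method used to derive it.

```python
# Counts as reported in the Results; names are illustrative only.
MANUAL_TOTAL = 875       # reports identified by manual analysis
ALGORITHM_FOUND = 839    # reports identified by the algorithm
EXON19_DELETION = 362
LEU858ARG = 277
EXON20_INSERTION = 71


def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


missed = MANUAL_TOTAL - ALGORITHM_FOUND   # reports not found by the algorithm
common = EXON19_DELETION + LEU858ARG      # common EGFR mutations

summary = {
    "extraction accuracy": pct(ALGORITHM_FOUND, MANUAL_TOTAL),
    "missed reports": pct(missed, MANUAL_TOTAL),
    "common mutations": pct(common, MANUAL_TOTAL),
    "exon 19 deletion": pct(EXON19_DELETION, MANUAL_TOTAL),
    "p.(Leu858Arg)": pct(LEU858ARG, MANUAL_TOTAL),
    "exon 20 insertion": pct(EXON20_INSERTION, MANUAL_TOTAL),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the figures in the text, and the two common mutation counts sum to the stated 639.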
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.