Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Afterwards, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared to results of the manual extraction.
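The comparison step described above can be illustrated with a minimal sketch. It assumes that each pathology report carries a unique identifier and that both the algorithm and the manual reader produce a list of identifiers for reports flagged as mentioning an EGFR alteration. The function name, the identifiers, and the toy data are hypothetical and are not taken from the study.

```python
def compare_with_manual(algorithm_ids, manual_ids):
    """Compare algorithm-flagged report IDs against the manual reference.

    Hypothetical helper: the abstract does not describe how the
    comparison was implemented.
    """
    algo = set(algorithm_ids)
    manual = set(manual_ids)
    found = algo & manual    # flagged by both the algorithm and the reader
    missed = manual - algo   # flagged by the reader only
    extra = algo - manual    # flagged by the algorithm only
    share_found = len(found) / len(manual) if manual else float("nan")
    return {
        "found": found,
        "missed": missed,
        "extra": extra,
        "share_found": share_found,
    }


# Toy identifiers for illustration only, not study data.
toy_algorithm = ["r1", "r2", "r4"]
toy_manual = ["r1", "r2", "r3", "r4"]
result = compare_with_manual(toy_algorithm, toy_manual)
print(sorted(result["missed"]), result["share_found"])
```

Keeping the missed and extra sets separate makes it possible to review the discordant reports individually, as was done for the reports the algorithm did not identify.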
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
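The reported proportions can be recomputed from the stated counts. The sketch below uses only numbers given in the Results and assumes that all 839 algorithm-identified reports fall within the 875 manually identified ones, as the text implies. The variable names are illustrative. The confidence interval is not recomputed, because the abstract does not state how it was derived.

```python
MANUAL_TOTAL = 875  # reports identified by manual analysis

# Counts as stated in the Results; the key names are illustrative.
counts = {
    "identified_by_algorithm": 839,
    "missed_by_algorithm": 36,
    "common_mutation": 639,
    "exon_19_deletion": 362,
    "leu858arg": 277,
    "exon_20_insertion": 71,
}


def pct(part, whole):
    """Percentage of whole, rounded to one decimal place."""
    return round(100 * part / whole, 1)


percentages = {name: pct(n, MANUAL_TOTAL) for name, n in counts.items()}

# Internal consistency checks on the stated counts.
assert counts["identified_by_algorithm"] + counts["missed_by_algorithm"] == MANUAL_TOTAL
assert counts["exon_19_deletion"] + counts["leu858arg"] == counts["common_mutation"]

for name, value in percentages.items():
    print(f"{name}: {value}%")
```

The two consistency checks confirm that the missed reports account for the gap between 839 and 875, and that the two common mutation subtypes sum to the stated 639.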
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Editorial acknowledgement
Clinical trial identification
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.