Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR) gene, which occur in ∼10–15% of NSCLC cases, can be targeted with specific therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although comprehensive data are available in the Dutch Pathology Registry (Palga), manually extracting EGFR mutation status from narrative pathology reports is time-consuming. We therefore used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm, which converts semi-structured pathology reports into a structured OMOP Common Data Model (CDM) database, was trained and validated on manually curated datasets of reports from the Palga archive. Pathology reports of patients with metastatic, non-squamous NSCLC from 2019–2020 were then requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
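The comparison step can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each pathology report has an ID and that both the algorithm and the manual reader yield a set of IDs of reports mentioning an EGFR alteration, with the manual set treated as the reference. The names `compare_flags` and `Comparison` are illustrative only; they are not part of the study's software, and the sketch does not model the NLP algorithm itself.

```python
# Hypothetical sketch: compare report IDs flagged by an NLP algorithm with
# report IDs flagged by manual review (the manual set is the reference).
from dataclasses import dataclass


@dataclass
class Comparison:
    detected: int          # flagged by both the algorithm and manual review
    missed: int            # flagged manually but not by the algorithm
    extra: int             # flagged by the algorithm only
    detection_rate: float  # detected / number of manually flagged reports


def compare_flags(algorithm_ids: set[str], manual_ids: set[str]) -> Comparison:
    """Summarize agreement between algorithm output and the manual reference."""
    if not manual_ids:
        raise ValueError("manual reference set is empty")
    detected = len(algorithm_ids & manual_ids)
    missed = len(manual_ids - algorithm_ids)
    extra = len(algorithm_ids - manual_ids)
    return Comparison(detected, missed, extra, detected / len(manual_ids))


if __name__ == "__main__":
    # Toy example with made-up report IDs, not study data.
    manual = {"r1", "r2", "r3", "r4"}
    algorithm = {"r1", "r2", "r3"}
    print(compare_flags(algorithm, manual))
```

Keeping the "missed" and "extra" counts separate matters here: the reported accuracy is the share of manually identified reports that the algorithm also found, while reports flagged only by the algorithm would need separate review.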
Results
The algorithm identified 839 reports (10.9%) that mention an EGFR alteration. Manual analysis identified 875 such reports, giving a data extraction accuracy of 95.9% (839/875; 95% CI 92.7–99.2). The 36/875 (4.1%) reports that the algorithm did not identify were all classified by the reader as variants of unknown significance (VUS). In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation, i.e., an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automated data processing was 48 times faster than fully manual extraction.
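As an arithmetic check, the percentages above can be recomputed from the stated counts, using the 875 manually identified reports as the denominator. This hypothetical Python snippet reproduces the point estimates only; the abstract does not state how the 95% CI was derived, so the interval is not recomputed here.

```python
# Recompute the percentages reported in the Results from the stated counts.
# Denominator: the 875 reports identified by manual analysis.
MANUAL_TOTAL = 875

counts = {
    "identified by algorithm": 839,
    "missed by algorithm (VUS)": 36,
    "exon 19 deletion": 362,
    "p.(Leu858Arg)": 277,
    "exon 20 insertion": 71,
}
# "Common" EGFR mutations = exon 19 deletions plus p.(Leu858Arg).
counts["common (exon 19 del + L858R)"] = (
    counts["exon 19 deletion"] + counts["p.(Leu858Arg)"]
)


def pct(n: int, total: int = MANUAL_TOTAL) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * n / total, 1)


if __name__ == "__main__":
    for label, n in counts.items():
        print(f"{label}: {n}/{MANUAL_TOTAL} = {pct(n)}%")
```

Running it confirms that identified and missed reports sum to 875 and that the common-mutation total is 362 + 277 = 639.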
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.