Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR) gene, which occur in ∼10-15% of NSCLC cases, can be targeted by specific therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although the Dutch Pathology Registry (Palga) holds comprehensive data, manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Afterwards, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared to results of the manual extraction.
Results
The algorithm identified 839 (10.9%) reports that mentioned an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports that the algorithm did not identify were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation, i.e., an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
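The proportions reported above can be rechecked from the stated counts. The following is a minimal Python sketch of that arithmetic only; the variable and function names are illustrative assumptions, and it does not reproduce the NLP algorithm or the reported confidence interval, whose calculation method is not stated in the abstract.

```python
# Counts taken directly from the Results text; names are illustrative only.
MANUAL_TOTAL = 875        # reports with an EGFR alteration per manual analysis
ALGORITHM_TOTAL = 839     # reports identified by the algorithm

counts = {
    "missed_by_algorithm": MANUAL_TOTAL - ALGORITHM_TOTAL,
    "exon_19_deletion": 362,
    "p_Leu858Arg": 277,
    "exon_20_insertion": 71,
}
# Common mutations are the sum of the two classical subtypes.
counts["common_mutation"] = counts["exon_19_deletion"] + counts["p_Leu858Arg"]


def pct(numerator, denominator=MANUAL_TOTAL):
    """Return numerator/denominator as a percentage rounded to one decimal."""
    return round(100 * numerator / denominator, 1)


# Share of manually identified reports that the algorithm also found.
extraction_accuracy = pct(ALGORITHM_TOTAL)

# Percentage of the 875 manually identified reports in each category.
summary = {name: pct(value) for name, value in counts.items()}

if __name__ == "__main__":
    print("extraction accuracy (%):", extraction_accuracy)
    for name, value in summary.items():
        print(f"{name}: {counts[name]}/{MANUAL_TOTAL} = {value}%")
```

Every percentage here uses the 875 manually identified reports as the denominator, which is the denominator the abstract itself states for each figure.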
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.