Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured database in the OMOP Common Data Model (CDM). Subsequently, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
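To make the extraction task concrete, the following is a minimal rule-based sketch of flagging EGFR variants in report text. It is not the trained algorithm described above: the patterns, the labels and the function name `classify_report` are hypothetical, it assumes English-language text, and it does not handle negation (e.g., "no EGFR mutation detected" with a variant named elsewhere in the report).

```python
import re

# Hypothetical patterns chosen for illustration only; a production system
# would need far broader coverage (synonyms, HGVS notation, other languages).
EGFR_PATTERN = re.compile(r"\bEGFR\b", re.IGNORECASE)
VARIANT_PATTERNS = {
    "exon 19 deletion": re.compile(r"exon\s*19\s*del(?:etion)?", re.IGNORECASE),
    "p.(Leu858Arg)": re.compile(r"p\.\(?Leu858Arg\)?|\bL858R\b", re.IGNORECASE),
    "exon 20 insertion": re.compile(r"exon\s*20\s*ins(?:ertion)?", re.IGNORECASE),
}


def classify_report(text):
    """Return the variant labels found in a report that mentions EGFR."""
    if not EGFR_PATTERN.search(text):
        return []
    return [
        label
        for label, pattern in VARIANT_PATTERNS.items()
        if pattern.search(text)
    ]


if __name__ == "__main__":
    print(classify_report("EGFR: exon 19 deletion detected"))
    print(classify_report("KRAS p.(Gly12Cys) detected"))
```

A rule-based approach like this is easy to audit but brittle against free-text variation, which is one reason a trained NLP model may be preferred for narrative reports.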
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (839/875; 95% CI 92.7-99.2). The 36 reports (4.1%, 36/875) that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%, 362/875) or a p.(Leu858Arg) mutation (31.7%, 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
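The proportions reported above can be re-derived from the stated counts. The short sketch below does only that arithmetic; the variable names and the helper `pct` are illustrative, and the confidence interval is not recomputed because the abstract does not state which method was used.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


MANUAL_POSITIVE = 875      # reports with an EGFR alteration per manual review
ALGORITHM_POSITIVE = 839   # reports flagged by the algorithm

# Counts taken directly from the Results text.
subtype_counts = {
    "exon 19 deletion": 362,
    "p.(Leu858Arg)": 277,
    "exon 20 insertion": 71,
}

extraction_accuracy = pct(ALGORITHM_POSITIVE, MANUAL_POSITIVE)
missed = MANUAL_POSITIVE - ALGORITHM_POSITIVE
subtype_shares = {
    name: pct(count, MANUAL_POSITIVE) for name, count in subtype_counts.items()
}
common_share = pct(
    subtype_counts["exon 19 deletion"] + subtype_counts["p.(Leu858Arg)"],
    MANUAL_POSITIVE,
)

if __name__ == "__main__":
    print("extraction accuracy (%):", extraction_accuracy)
    print("missed reports:", missed, "share (%):", pct(missed, MANUAL_POSITIVE))
    print("common mutations (%):", common_share)
    print("subtype shares (%):", subtype_shares)
```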
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.