Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR) gene, which occur in ∼10-15% of NSCLC cases, can be targeted by specific therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although comprehensive data are available in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Subsequently, pathology reports of patients diagnosed with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
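The comparison between algorithm output and manual extraction can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `compare_extractions` and the report identifiers are hypothetical, and the only assumption is that each method yields a set of report IDs flagged as mentioning an EGFR alteration.

```python
def compare_extractions(algorithm_ids, manual_ids):
    """Compare report IDs flagged by an algorithm against manual review.

    Both arguments are iterables of report identifiers (hypothetical here).
    Manual review is treated as the reference standard.
    """
    algorithm_ids = set(algorithm_ids)
    manual_ids = set(manual_ids)

    found = algorithm_ids & manual_ids    # flagged by both methods
    missed = manual_ids - algorithm_ids   # flagged manually only
    extra = algorithm_ids - manual_ids    # flagged by the algorithm only

    # Share of manually identified reports that the algorithm also found.
    share_found = len(found) / len(manual_ids) if manual_ids else float("nan")
    return {
        "found": found,
        "missed": missed,
        "extra": extra,
        "share_found": share_found,
    }


# Toy example with made-up report IDs.
manual = {"R001", "R002", "R003", "R004"}
algorithm = {"R001", "R002", "R004", "R007"}
result = compare_extractions(algorithm, manual)
print(sorted(result["missed"]), result["share_found"])
```

Keeping the missed and extra reports as explicit sets makes it straightforward to review the discordant cases by hand, as was done for the reports the algorithm did not identify.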
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%; 362/875) or the p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
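As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The sketch below uses only numbers stated in the Results, with the 875 manually identified reports as the denominator; the variable and function names are illustrative, and the confidence interval is not recomputed because the abstract does not state how it was derived.

```python
# Counts as reported in the Results; 875 manually identified reports
# serve as the denominator throughout.
MANUAL_EGFR_REPORTS = 875
counts = {
    "identified by algorithm": 839,
    "missed by algorithm": 36,
    "exon 19 deletion": 362,
    "p.(Leu858Arg)": 277,
    "exon 20 insertion": 71,
}


def percentage(count, total):
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: percentage(n, MANUAL_EGFR_REPORTS) for label, n in counts.items()}

# Common mutations are the exon 19 deletions plus p.(Leu858Arg).
common_total = counts["exon 19 deletion"] + counts["p.(Leu858Arg)"]
shares["common mutations"] = percentage(common_total, MANUAL_EGFR_REPORTS)

for label, share in shares.items():
    print(f"{label}: {share}%")
```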
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.