Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Subsequently, pathology reports of patients with metastatic, non-squamous NSCLC from 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
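The comparison step can be illustrated with a minimal sketch. It is not the trained algorithm described above: the keyword pattern, the function names, and the four toy reports are hypothetical stand-ins, used only to show how algorithm flags could be tallied against manual labels.

```python
import re

# Hypothetical keyword heuristic, NOT the trained NLP model from the abstract.
# It flags a sentence that names EGFR together with a known alteration.
EGFR_PATTERN = re.compile(
    r"egfr[^.]*?(exon\s*19\s*del|exon\s*20\s*ins|p\.\(?leu858arg\)?|l858r)",
    re.IGNORECASE,
)


def flags_egfr_alteration(report_text: str) -> bool:
    """Return True if the toy pattern finds an EGFR alteration mention."""
    return EGFR_PATTERN.search(report_text) is not None


def compare_to_manual(reports, manual_labels):
    """Tally agreement between algorithm flags and manual labels."""
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for text, truth in zip(reports, manual_labels):
        pred = flags_egfr_alteration(text)
        if pred and truth:
            counts["tp"] += 1
        elif pred and not truth:
            counts["fp"] += 1
        elif not pred and truth:
            counts["fn"] += 1
        else:
            counts["tn"] += 1
    return counts


# Invented example reports and manual labels (illustration only).
toy_reports = [
    "EGFR: exon 19 deletion detected.",
    "EGFR: p.(Leu858Arg) mutation present.",
    "No EGFR mutation detected. KRAS p.(Gly12Cys) found.",
    "EGFR variant of unknown significance in exon 18.",
]
toy_manual = [True, True, False, True]

counts = compare_to_manual(toy_reports, toy_manual)
print(counts)
```

A keyword rule like this ignores negation and unusual phrasing, which is one reason a trained model and a manually curated validation set are needed in practice.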
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
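The reported percentages follow directly from the stated counts, as the short check below shows. It assumes that all 839 algorithm-identified reports are among the 875 found manually, which is consistent with the 36 missed reports. The variable names are illustrative, and the confidence interval is not recomputed because the abstract does not state how it was derived.

```python
def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as reported in the abstract."""
    return round(100 * numerator / denominator, 1)


manual_total = 875      # reports with an EGFR alteration per manual analysis
algorithm_hits = 839    # reports identified by the algorithm
exon19_del = 362
leu858arg = 277
exon20_ins = 71

missed = manual_total - algorithm_hits   # reports the algorithm did not find
common = exon19_del + leu858arg          # "common" EGFR mutations

print("extraction accuracy:", pct(algorithm_hits, manual_total))
print("missed:", missed, pct(missed, manual_total))
print("common mutations:", common, pct(common, manual_total))
print("exon 19 deletion:", pct(exon19_del, manual_total))
print("p.(Leu858Arg):", pct(leu858arg, manual_total))
print("exon 20 insertion:", pct(exon20_ins, manual_total))
```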
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Editorial acknowledgement
Clinical trial identification
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.