Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in the epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information on the prevalence of these mutations, including their subtypes. However, although comprehensive data are available in the Dutch Pathology Registry (Palga), manually extracting EGFR mutation status from narrative pathology reports is time-consuming. We therefore used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive, with the aim of generating a structured OMOP Common Data Model (CDM) database. Pathology reports of patients with metastatic, non-squamous NSCLC from 2019-2020 were then requested from the Palga registry, and the algorithm's output was compared with the results of manual extraction.
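To make the extraction task concrete, the sketch below flags a report as EGFR-positive using simple regular expressions and records any common subtypes it recognises. This is a deliberately simplified, rule-based stand-in written for illustration only. The abstract does not describe the architecture of the study's machine-learning NLP model. The patterns, the naive negation cues, the `classify_report` function and the example sentences are all hypothetical. The sketch also assumes English-language phrasing, whereas reports in the Dutch registry would need patterns for their own language.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified patterns for illustration; NOT the study's NLP model.
SUBTYPE_PATTERNS = {
    "exon 19 deletion": re.compile(r"exon\s*19\s*del", re.IGNORECASE),
    "p.(Leu858Arg)": re.compile(r"p\.\(?Leu858Arg\)?|\bL858R\b", re.IGNORECASE),
    "exon 20 insertion": re.compile(r"exon\s*20\s*ins", re.IGNORECASE),
}
EGFR = re.compile(r"\bEGFR\b", re.IGNORECASE)
# Naive negation cues (assumed): "no ... EGFR", "EGFR wild-type", "not detected".
NEGATION = re.compile(
    r"\b(?:no|without)\b(?:\s+\S+){0,3}?\s+EGFR\b"
    r"|\bEGFR\s+wild[- ]?type\b"
    r"|\bnot\s+detected\b",
    re.IGNORECASE,
)


@dataclass
class ReportResult:
    egfr_positive: bool = False
    subtypes: set = field(default_factory=set)


def classify_report(text: str) -> ReportResult:
    """Flag a report as EGFR-positive if any sentence mentions EGFR
    without a negation cue, and collect recognised subtypes."""
    result = ReportResult()
    # Split on a period followed by whitespace, so "p.(Leu858Arg)" stays intact,
    # or on semicolons/newlines.
    for sentence in re.split(r"\.\s+|[;\n]+", text):
        if not EGFR.search(sentence) or NEGATION.search(sentence):
            continue
        result.egfr_positive = True
        for name, pattern in SUBTYPE_PATTERNS.items():
            if pattern.search(sentence):
                result.subtypes.add(name)
    return result


if __name__ == "__main__":
    r = classify_report("EGFR exon 19 deletion detected. KRAS wild-type.")
    print(r.egfr_positive, r.subtypes)  # True {'exon 19 deletion'}
    r = classify_report("No EGFR mutation detected.")
    print(r.egfr_positive, r.subtypes)  # False set()
```

Report-level flags produced this way could then be compared with manual labels, as the study did with its own algorithm. A fixed rule set like this one tends to miss unusual phrasings, which is where a trained model is expected to help.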
Results
The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). All 36/875 (4.1%) reports that the algorithm did not identify were classified by the reader as variants of unknown significance (VUS). In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation (i.e., exon 19 deletion (41.4%; 362/875) or p.(Leu858Arg) mutation (31.7%; 277/875)). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automated data processing was 48 times faster than complete manual extraction.
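The percentages above follow directly from the reported counts, and the short Python check below recomputes them. It assumes that the 839 algorithm-flagged reports are a subset of the 875 manually identified ones. This is consistent with the 36 missed reports and with 95.9% equalling 839/875. The abstract does not state how the confidence interval or the 10.9% denominator were derived, so neither is reproduced here.

```python
# Counts as reported in the abstract's Results.
ALGORITHM_FLAGGED = 839   # reports flagged by the algorithm
MANUAL_POSITIVE = 875     # reports identified by manual analysis
EXON19_DEL = 362
L858R = 277
EXON20_INS = 71


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


missed = MANUAL_POSITIVE - ALGORITHM_FLAGGED  # assumes flagged reports are a subset
common = EXON19_DEL + L858R                   # exon 19 deletion + p.(Leu858Arg)

print(pct(ALGORITHM_FLAGGED, MANUAL_POSITIVE))  # 95.9
print(missed, pct(missed, MANUAL_POSITIVE))     # 36 4.1
print(common, pct(common, MANUAL_POSITIVE))     # 639 73.0
print(pct(EXON19_DEL, MANUAL_POSITIVE),
      pct(L858R, MANUAL_POSITIVE),
      pct(EXON20_INS, MANUAL_POSITIVE))         # 41.4 31.7 8.1
```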
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach provides rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.