Abstract 95P
Background
Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.
Methods
The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured database in the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM). Afterwards, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared with the results of manual extraction.
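As an illustration only, the comparison between algorithm output and manual extraction can be sketched as a set comparison over report identifiers. The function name, the dictionary keys, and the identifiers `r1` to `r4` are hypothetical and do not come from the study; this is a minimal sketch, not the authors' pipeline.

```python
def compare_extractions(algorithm_ids, manual_ids):
    """Compare reports flagged by an algorithm with reports flagged manually.

    Both arguments are iterables of report identifiers (hypothetical here).
    The manual extraction is treated as the reference standard.
    """
    algorithm_ids = set(algorithm_ids)
    manual_ids = set(manual_ids)

    matched = algorithm_ids & manual_ids   # flagged by both
    missed = manual_ids - algorithm_ids    # flagged manually only
    extra = algorithm_ids - manual_ids     # flagged by the algorithm only

    # Fraction of manually flagged reports that the algorithm also found.
    matched_fraction = len(matched) / len(manual_ids) if manual_ids else float("nan")

    return {
        "matched": matched,
        "missed": missed,
        "extra": extra,
        "matched_fraction": matched_fraction,
    }


# Toy example with made-up identifiers (not study data).
result = compare_extractions(["r1", "r2", "r3"], ["r1", "r2", "r3", "r4"])
print(result["matched_fraction"], sorted(result["missed"]))
```

Listing the missed reports explicitly makes it possible to review them by hand afterwards.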
Results
The algorithm identified 839 reports (10.9%) that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (839/875; 95% CI 92.7-99.2). The 36 reports (4.1%; 36/875) that the algorithm did not identify were all classified as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
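The proportions reported above follow directly from the stated counts. The short sketch below recomputes them with 875 as the denominator; the variable and key names are illustrative, and the confidence interval is not recomputed because the abstract does not state how it was derived.

```python
# Counts taken from the Results text; 875 is the number of reports with an
# EGFR alteration according to manual analysis.
MANUAL_TOTAL = 875

counts = {
    "identified_by_algorithm": 839,
    "missed_by_algorithm": 36,
    "exon19_deletion": 362,
    "leu858arg": 277,
    "exon20_insertion": 71,
}


def pct(numerator, denominator=MANUAL_TOTAL):
    """Return a percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


percentages = {name: pct(n) for name, n in counts.items()}

# "Common" mutations are the exon 19 deletions plus p.(Leu858Arg).
common_count = counts["exon19_deletion"] + counts["leu858arg"]
percentages["common_mutation"] = pct(common_count)

for name, value in percentages.items():
    print(f"{name}: {value}%")
```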
Conclusions
NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.
Legal entity responsible for the study
LynxCare Inc.
Funding
LynxCare Inc.
Disclosure
All authors have declared no conflicts of interest.