95P - Automatic data processing to identify EGFR mutations in pathology reports of patients with non-small cell lung cancer (NSCLC)

Date

04 Oct 2023

Session

Cocktail & Poster Display session

Presenters

Betzabel Cajiao Garcia

Citation

Annals of Oncology (2023) 8 (suppl_1_S5): 1-55. 10.1016/esmoop/esmoop101646

Authors

B. Cajiao Garcia1, B. Koopman1, V. de Jager1, C.L. Oeste2, E. Schuuring3, A.J. van der Wekken4, S. Willems1, L. van Kempen5

Author affiliations

  • 1 Department Of Pathology And Medical Biology, UMCG - University Medical Center Groningen, 9713 GZ - Groningen/NL
  • 2 LynxCare Inc., 3000 - Leuven/BE
  • 3 Department Of Pathology And Medical Biology, UMCG - University Medical Center Groningen, 9700 RB - Groningen/NL
  • 4 Department Of Pulmonary Diseases And Medical Biology, UMCG - University Medical Center Groningen, 9700 RB - Groningen/NL
  • 5 Department Of Pathology, UZA - University Hospital Antwerp, 2650 - Edegem/BE

Abstract 95P

Background

Non-small cell lung cancer (NSCLC) is the most common subtype of lung cancer. Driver mutations in epidermal growth factor receptor (EGFR), which occur in ∼10-15% of NSCLC, can be targeted by specific therapies. Real-world data can provide valuable information regarding the prevalence of these mutations, including their subtypes. However, despite comprehensive data availability in the Dutch Pathology Registry (Palga), manual extraction of EGFR mutation status from narrative pathology reports is time-consuming. Therefore, we used machine learning and natural language processing (NLP) to identify pathology reports that state the presence of an EGFR mutation.

Methods

The NLP algorithm was trained and validated on manually curated datasets of semi-structured pathology reports from the Palga archive to generate a structured OMOP CDM database. Afterwards, pathology reports of patients with metastatic, non-squamous NSCLC in 2019-2020 were requested from the Palga registry. The output of the algorithm was compared to results of the manual extraction.
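The comparison between algorithm output and manual extraction can be sketched as follows. This is only an illustration under stated assumptions, not the authors' method: the study used a trained machine learning and NLP model, whereas this sketch substitutes a simple keyword rule. The patterns, function names, and example report texts are all hypothetical.

```python
import re

# Hypothetical keyword rules standing in for the trained NLP model.
MUTATION = re.compile(r"\bEGFR\b.*\b(mutation|deletion|insertion)\b", re.IGNORECASE)
NEGATED = re.compile(r"\b(no|not|without)\b.*\bEGFR\b", re.IGNORECASE)


def flags_egfr_mutation(report: str) -> bool:
    """Return True if any sentence names an EGFR alteration without a preceding negation."""
    sentences = re.split(r"[.!?]\s+", report)
    return any(MUTATION.search(s) and not NEGATED.search(s) for s in sentences)


def fraction_found(predicted: list[bool], manual: list[bool]) -> float:
    """Fraction of manually positive reports that the rule also flagged."""
    positives = [p for p, m in zip(predicted, manual) if m]
    if not positives:
        raise ValueError("no manually positive reports to compare against")
    return sum(positives) / len(positives)


# Invented example reports and manual labels, for illustration only.
reports = [
    "Adenocarcinoma. EGFR exon 19 deletion detected.",
    "No EGFR mutation detected. KRAS wild type.",
    "EGFR exon 20 insertion found.",
    "Variant in EGFR of unknown significance.",
]
manual = [True, False, True, True]

predicted = [flags_egfr_mutation(r) for r in reports]
print(predicted)
print(fraction_found(predicted, manual))
```

A keyword rule like this misses any phrasing it was not written for, which is one reason a trained model is preferable on narrative reports.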

Results

The algorithm identified 839 (10.9%) reports that mention an EGFR alteration. Manual analysis identified 875 such reports, resulting in a data extraction accuracy of 95.9% (95% CI 92.7-99.2). The 36/875 (4.1%) reports that were not identified by the algorithm were all listed as variants of unknown significance (VUS) by the reader. In the EGFR-mutated patient group, 73.0% (639/875) had a common EGFR mutation: an exon 19 deletion (41.4%; 362/875) or a p.(Leu858Arg) mutation (31.7%; 277/875). Exon 20 insertions were detected in 8.1% (71/875) of patients. Automatic data processing was 48 times faster than complete manual extraction.
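The reported percentages can be rechecked from the counts given above. The short sketch below uses only those counts; the variable and function names are illustrative, and the confidence interval is not recomputed because the abstract does not state how it was derived.

```python
# Counts taken from the Results text above.
manual_positive = 875      # reports with an EGFR alteration per manual analysis
algorithm_positive = 839   # reports identified by the algorithm
exon19_deletion = 362
leu858arg = 277
exon20_insertion = 71


def pct(numerator: int, denominator: int) -> float:
    """Percentage rounded to one decimal place."""
    return round(100 * numerator / denominator, 1)


missed = manual_positive - algorithm_positive
common = exon19_deletion + leu858arg

summary = {
    "identified by algorithm": pct(algorithm_positive, manual_positive),
    "missed by algorithm": pct(missed, manual_positive),
    "common mutation": pct(common, manual_positive),
    "exon 19 deletion": pct(exon19_deletion, manual_positive),
    "p.(Leu858Arg)": pct(leu858arg, manual_positive),
    "exon 20 insertion": pct(exon20_insertion, manual_positive),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```

The recomputed values match the figures in the text: 95.9%, 4.1%, 73.0%, 41.4%, 31.7%, and 8.1%.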

Conclusions

NLP algorithms allow rapid data extraction from pathology reports, thereby offering a time-efficient and cost-effective alternative to manual data processing. In turn, this approach enables rapid insight into current biomarker testing rates and the prevalence of (actionable) mutations.

Editorial acknowledgement

Clinical trial identification

Legal entity responsible for the study

LynxCare Inc.

Funding

LynxCare Inc.

Disclosure

All authors have declared no conflicts of interest.
