Abstract 430P
Background
Real-World Data (RWD), sourced from electronic health records (EHRs), is largely unstructured text. Natural Language Processing (NLP) enables efficient extraction from clinical free text, offering scalable alternatives to manual methods. Advances in large language models (LLMs) enhance this process by structuring clinical data through zero-shot learning, making them valuable for analyzing oncologists’ letters and supporting clinical workflows. However, deploying LLMs in healthcare poses challenges, especially regarding data privacy and compliance with European Union laws that limit exporting personal health data. Running LLMs locally offers a privacy-preserving solution, but compliance and accuracy remain critical.
Methods
This retrospective real-world study analyzes a cohort of 186 patients with EGFR-mutated non-small cell lung cancer (NSCLC) from the APOLLO11 trial (NCT05550961), whose clinical features are reported in oncologists’ unstructured free-text letters. A manually annotated ground truth (GT) is available for comparison. The study evaluates the zero-shot capabilities of Llama 3.1 8B in extracting clinical information from medical letters. Using the LLM pipeline by Wiest et al., features were extracted by combining the free text with an English or Italian prompt and grammar-based methods.
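The extraction step described above can be illustrated with a minimal sketch. It is not the pipeline by Wiest et al.: the function names, the prompt wording, and the allowed answer values are assumptions made for illustration, and the grammar-based constraint is approximated here by validating the model's answer after generation rather than by constraining decoding. Only the feature names are taken from this abstract. No model is called; the sketch covers prompt assembly and answer validation.

```python
import json

# Hypothetical schema: the feature names appear in the abstract, but the
# allowed values are assumptions made for illustration only.
ALLOWED_VALUES = {
    "SMOKE": ["yes", "no", "unknown"],
    "BONE_METASTASIS": ["yes", "no", "unknown"],
    "BRAIN_METASTASIS": ["yes", "no", "unknown"],
}


def build_prompt(letter_text, schema=ALLOWED_VALUES):
    """Combine a zero-shot instruction with one free-text letter."""
    lines = [
        "Extract the following features from the oncologist's letter.",
        "Answer with a single JSON object and nothing else.",
    ]
    for feature, values in schema.items():
        lines.append(f'- "{feature}": one of {", ".join(values)}')
    lines.append("")
    lines.append("Letter:")
    lines.append(letter_text.strip())
    return "\n".join(lines)


def parse_response(raw, schema=ALLOWED_VALUES):
    """Keep only answers that match the schema; fall back to 'unknown'.

    This post-hoc check stands in for grammar-constrained decoding,
    which would prevent invalid answers from being generated at all.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        data = {}
    if not isinstance(data, dict):
        data = {}
    result = {}
    for feature, values in schema.items():
        value = data.get(feature)
        result[feature] = value if value in values else "unknown"
    return result
```

The prompt returned by `build_prompt` would be sent to a locally hosted model, and the raw answer passed to `parse_response`, so that every letter yields one value per feature that can be compared with the GT.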
Results
Different prompting strategies were applied. The refined prompt demonstrated superior performance, achieving an average accuracy of 0.76 across critical features such as “SMOKE” (0.7), “BONE_METASTASIS” (0.89) and “BRAIN_METASTASIS” (0.8). The results remained consistent when the number of tokens was varied, highlighting the robustness of the prompt. Additional testing with English prompts confirmed the pipeline’s adaptability across languages while maintaining high accuracy.
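As an illustration of how per-feature and average accuracy against the GT can be computed, the sketch below uses toy data. The abstract does not state how the average was obtained, so the unweighted mean across features is an assumption, as are the function names, the data layout, and the choice to count a missing prediction as an error.

```python
def feature_accuracy(predictions, ground_truth, features):
    """Fraction of patients whose extracted value matches the GT, per feature.

    Both inputs map a patient ID to a dict of feature values (assumed layout).
    A patient with no prediction is counted as incorrect.
    """
    scores = {}
    for feature in features:
        correct = sum(
            1
            for patient_id, truth in ground_truth.items()
            if predictions.get(patient_id, {}).get(feature) == truth[feature]
        )
        scores[feature] = correct / len(ground_truth)
    return scores


def mean_accuracy(scores):
    """Unweighted mean over features (an assumption, see the text above)."""
    return sum(scores.values()) / len(scores)


# Toy data for illustration only; these are not study results.
toy_gt = {
    "p1": {"SMOKE": "yes", "BONE_METASTASIS": "no"},
    "p2": {"SMOKE": "no", "BONE_METASTASIS": "yes"},
    "p3": {"SMOKE": "yes", "BONE_METASTASIS": "no"},
    "p4": {"SMOKE": "no", "BONE_METASTASIS": "no"},
}
toy_pred = {
    "p1": {"SMOKE": "yes", "BONE_METASTASIS": "no"},
    "p2": {"SMOKE": "yes", "BONE_METASTASIS": "yes"},
    "p3": {"SMOKE": "yes", "BONE_METASTASIS": "no"},
    "p4": {"SMOKE": "no", "BONE_METASTASIS": "no"},
}
toy_scores = feature_accuracy(toy_pred, toy_gt, ["SMOKE", "BONE_METASTASIS"])
```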
Conclusions
This study demonstrates the effectiveness of a privacy-preserving LLM pipeline for extracting clinical features from free-text medical records in an oncology setting. By refining prompt design, the model achieved high accuracy while adhering to European data privacy regulations. These findings support scalable, multilingual applications of LLMs in healthcare.
Clinical trial identification
NCT05550961.
Legal entity responsible for the study
Fondazione IRCCS Istituto Nazionale Tumori Milano.
Funding
Fondazione IRCCS Istituto Nazionale Tumori Milano.
Disclosure
L. Mazzeo: Financial Interests, Personal, Other, Lecture Fee: Novartis; Financial Interests, Personal, Other, Conference Grants: Sanofi, Daiichi Sankyo, LEO Pharma; Financial Interests, Personal, Other, Honoraria: MSD. V. Miskovic: Financial Interests, Personal, Other, Honoraria: Novartis. I. Wiest: Financial Interests, Personal, Other, Honoraria: AstraZeneca. M. Occhipinti: Financial Interests, Personal, Other, Honoraria, Consulting: AstraZeneca, BMS, MSD; Financial Interests, Personal, Other, Conference Grants: Eli Lilly. M. Brambilla: Financial Interests, Personal, Other, Travel Grant: Eli Lilly. T. Beninato: Financial Interests, Personal, Other, Conference Grants, Honoraria: MSD; Financial Interests, Personal, Other, Conference Grants: Sanofi, Pfizer, Eli Lilly. C. Proto: Financial Interests, Personal, Other, Conference Grants, Research Funding, Consulting: AstraZeneca, Roche, MSD, BMS; Financial Interests, Personal, Other, Research Funding, Consulting: Janssen; Financial Interests, Personal, Other, Research Funding: Pfizer, Celgene, Daiichi Sankyo. A. Pedrocchi: Financial Interests, Personal, Other, Honoraria: Novartis; Financial Interests, Personal, Stocks/Shares: Agade, AllyArm. G. Lo Russo: Financial Interests, Personal, Advisory Board: MSD, Novartis, AstraZeneca, BMS, Sanofi, Pfizer, Roche, Lilly, GSK, Daiichi Sankyo, Johnson & Johnson, Regeneron, Merck, Pierre Fabre; Financial Interests, Personal, Invited Speaker: Italfarmaco, Merck, BMS, Lilly, Sanofi; Financial Interests, Institutional, Other, Contribute For Meeting Organization: Janssen; Financial Interests, Institutional, Other, Contribute For Meeting Organization: Bayer; Financial Interests, Personal, Other, Travel Accommodation: Amgen, MSD; Financial Interests, Institutional, Other, Contribute to meeting organization: BeiGene; Financial Interests, Institutional, Invited Speaker: MSD, BMS, Roche, GSK, Celgene, Novartis, AstraZeneca, Amgen, Lilly. J.N.N. Kather: Financial Interests, Personal, Invited Speaker, Talk on 14 November 2022: Fresenius; Financial Interests, Personal, Advisory Board, Scientific Advisory Board since 2022: Owkin, DoMore Diagnostics, Panakeia, London, UK; Financial Interests, Personal, Invited Speaker, Talk on 4 July 2023: Bayer; Financial Interests, Personal, Invited Speaker, Talk on 1 July 2023: BMS; Financial Interests, Personal, Invited Speaker, Talk on 13 November 2024: Roche; Financial Interests, Personal, Invited Speaker, Invited talks on 21 October 2023 and 31 July 2024: Pfizer; Financial Interests, Personal, Other, Expert services to select activities of AstraZeneca, e.g. Advisory Board Participation, Invited Lectures at internal events and participation in technical discussion meetings, starting in March 2023: AstraZeneca; Financial Interests, Personal, Invited Speaker, Invited lecture on 25 July 2024: Daiichi Sankyo; Financial Interests, Personal, Other, Consultancy on 5 June 2024: Bioptimus; Financial Interests, Personal, Invited Speaker, Talk on 12 January 2024: Janssen; Financial Interests, Personal, Invited Speaker, Talk on 9 May 2023: Merck Sharp and Dohme; Financial Interests, Personal, Invited Speaker, Talk on 26 September 2024: Merck; Financial Interests, Personal, Other, Consultancy on 9 October 2023: Mindpeak; Financial Interests, Personal, Other, Consultancy since 2024: MultiplexDx; Financial Interests, Personal, Stocks/Shares, Shares and part-time activities in a company that provides artificial intelligence services for life science customers.: StratifAI GmbH; Financial Interests, Personal, Stocks/Shares, Advisory board membership and share ownership for Synagen GmbH (www.synagen.ai): Synagen GmbH; Financial Interests, Institutional, Invited Speaker, I am PI on a research project at University Hospital Heidelberg which was funded by GSK.: GSK. A. Prelaj: Financial Interests, Personal, Other, Training of personnel: AstraZeneca, Italfarma; Financial Interests, Personal, Invited Speaker, The Hive Project: Discussant: Roche; Financial Interests, Personal, Advisory Board, Advisory board in Lung Cancer project: BMS; Financial Interests, Personal, Other, Travel Grant: Janssen; Financial Interests, Personal, Advisory Board: Janssen, AstraZeneca; Financial Interests, Personal, Invited Speaker: MEDSIR, Novartis, Lilly; Financial Interests, Institutional, Invited Speaker: Bayer, BMS, AstraZeneca, Lilly, MSD, Spectrum, Roche. All other authors have declared no conflicts of interest.