Abstract 119P
Background
Fusion genes drive oncogenesis, and accurate assessment of their pathogenicity is crucial for therapeutic decisions. We previously developed a system that applies machine learning to known fusion genes to predict the pathogenicity of novel ones and uses a large language model (LLM) to explain its reasoning (Cancers, accepted). The system achieved an accuracy of 0.98, comparable to existing methods, and uniquely provides its explanations in natural language. However, these evaluations were performed on known fusion genes, so the accuracy and explanatory power of the system for truly unknown fusion genes remain unclear. Explanatory power is also difficult to evaluate and requires a multifaceted assessment.
Methods
This study focused on a particular set of fusion genes whose mechanisms can be predicted. We used BCR, a gene with an established role in tumorigenesis, as the 5' partner and selected 3' partners that contain a kinase domain and have not previously been reported as fusion partners of BCR. We provided literature on known BCR 5' fusion genes to an LLM and asked it to explain the mechanisms of the unreported BCR 5' fusion genes carrying a 3' kinase domain.
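The partner-selection rule described in the Methods (keep 3' genes that carry a kinase domain and are not already reported with BCR) can be illustrated with a minimal Python sketch. It assumes a simple per-gene record with a kinase-domain flag; the `Gene` class, the `select_candidate_partners` function and the placeholder symbols `GENE_A` to `GENE_C` are hypothetical and do not represent the authors' actual pipeline or data. ABL1 and JAK2 are used only because the abstract cites BCR::ABL1 and BCR::JAK2 as known fusions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical per-gene record; real annotations would come from a domain database."""
    symbol: str
    has_kinase_domain: bool


def select_candidate_partners(genes, reported_partners):
    """Return symbols of 3' genes that carry a kinase domain and are not yet reported with BCR."""
    return sorted(
        g.symbol
        for g in genes
        if g.has_kinase_domain and g.symbol not in reported_partners
    )


# Toy inputs: ABL1 and JAK2 are known BCR partners; GENE_A..GENE_C are placeholders.
reported_with_bcr = {"ABL1", "JAK2"}
genes = [
    Gene("ABL1", True),
    Gene("JAK2", True),
    Gene("GENE_A", True),
    Gene("GENE_B", False),  # no kinase domain, so it is excluded
    Gene("GENE_C", True),
]

candidates = select_candidate_partners(genes, reported_with_bcr)
print(candidates)  # ['GENE_A', 'GENE_C']
```

In this sketch the surviving candidates would then be scored by the pathogenicity predictor and passed, together with the BCR fusion literature, to the LLM for mechanistic explanation.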
Results
Among the BCR fusion genes with unreported kinase partners, 30 were predicted as "Pathogenic" with a high score. Among the 3' partner genes, receptor tyrosine kinases were the most frequent, followed by genes related to cell adhesion and the extracellular matrix. Drawing on the known mechanisms of BCR::ABL1 and BCR::JAK2, the LLM generated domain-based hypotheses for the pathogenic mechanisms of BCR fusion genes with unreported partners.
Conclusions
This study presents a novel approach for predicting and explaining the pathogenicity of unknown fusion genes and demonstrates its potential in certain cases. The strength of evidence for the inferred mechanisms varied: the results suggest that more reliable explanations can be obtained for some fusion genes, whereas others require further validation. Experimental verification is needed to confirm the hypotheses.
Clinical trial identification
Editorial acknowledgement
During the preparation of this work the author(s) used ChatGPT and Claude to suggest alternative phrasings and to improve clarity and readability. After using these tools, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the content of the publication.
Legal entity responsible for the study
Fujitsu Ltd.
Funding
Fujitsu Ltd.
Disclosure
K. Murakami, S. Tago, S. Takishita, H. Morikawa, R. Kojima, M. Fuji: Financial Interests, Personal, Full or part-time Employment: Fujitsu Ltd. K. Yokoyama, M. Ogawa, H. Fukushima, H. Takamori, Y. Nannya, S. Imoto: Financial Interests, Institutional, Funding: Fujitsu Ltd.