Cocktail and Poster Display session

2P - A machine learning-powered dashboard for the exploration of high-throughput transcriptomic datasets

Date

26 Feb 2024

Session

Cocktail and Poster Display session

Topics

Translational Research

Tumour Site

Presenters

Valentin Bernu

Citation

Annals of Oncology (2024) 9 (suppl_1): 1-2. 10.1016/esmoop/esmoop102255

Authors

V. Bernu¹, C. Lescure¹, H. Brull Corretger¹, P. Dhillon¹, E. Fox², C. Marijon², A. Nordor³, C. Petit¹, A. Behdenna⁴

Author affiliations

  • 1 Data Science, Epigene Labs, 75017 - Paris/FR
  • 2 Translational Research, Epigene Labs, 75017 - Paris/FR
  • 3 Leadership, Epigene Labs, 75017 - Paris/FR
  • 4 Computational Research, Epigene Labs, 75017 - Paris/FR

Abstract 2P

Background

NCBI's Gene Expression Omnibus (GEO) is a major repository for high-throughput transcriptomic datasets. It currently contains approximately 7,000,000 transcriptomic profiles spread across more than 200,000 datasets, of which around 50,000 are related to cancer. Secondary analysis of these datasets could yield new biological insights and inform the design of future clinical studies. However, the high heterogeneity of the data and the limited browsing features of the repository's website make such analyses difficult, particularly in oncology research.

Methods

We introduce a tagging approach for characterizing GEO datasets. It focuses on the clinical description (metadata) of the sample transcriptomic profiles in each dataset and detects multiple criteria (e.g., patient vs. cell line, donor type, overall survival information, cancer type). The approach combines natural language processing techniques (e.g., named entity recognition and normalization) with machine learning and rule-based classifiers developed in collaboration with clinical and molecular oncology experts.
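
To make the rule-based part of such a tagger concrete, the following is a minimal sketch, not the authors' implementation. The keyword patterns, the `tag_sample` function, and the metadata field names are all hypothetical; the abstract does not disclose the actual rules, and the named entity recognition and machine learning components are not shown.

```python
import re

# Hypothetical keyword rules for illustration only; the real rules are not
# given in the abstract and were developed with oncology experts.
CELL_LINE_PATTERN = re.compile(r"\b(cell line|hela|mcf-?7|a549)\b", re.IGNORECASE)
PATIENT_PATTERN = re.compile(r"\b(patient|biopsy|tumou?r tissue|resection)\b", re.IGNORECASE)
SURVIVAL_PATTERN = re.compile(
    r"\b(overall survival|os[_ ]?(months|days|time)|vital status)\b", re.IGNORECASE
)


def tag_sample(metadata: dict[str, str]) -> dict[str, object]:
    """Assign illustrative tags to one sample from its free-text metadata."""
    # Flatten all metadata fields into one searchable string.
    text = " ".join(f"{key}: {value}" for key, value in metadata.items())

    # Cell-line evidence takes precedence over patient evidence in this sketch.
    if CELL_LINE_PATTERN.search(text):
        origin = "cell line"
    elif PATIENT_PATTERN.search(text):
        origin = "patient"
    else:
        origin = "unknown"

    return {
        "origin": origin,
        "has_overall_survival": bool(SURVIVAL_PATTERN.search(text)),
    }
```

For example, `tag_sample({"source_name": "MCF-7 breast cancer cell line"})` tags the sample as a cell line, while a sample whose metadata mentions tumor tissue and an overall survival field is tagged as patient-derived with survival information available.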

Results

The tagging models demonstrate high performance, with Area Under the Receiver Operating Characteristic Curve (AUC) values exceeding 0.95, 0.80, and 0.90 for identifying patient-derived samples, donor type, and overall survival information, respectively. Our cancer type classifier achieves a weighted average F1 score exceeding 0.90 across 21 cancer histologies. Ultimately, a user-friendly dashboard offers insights into GEO's cancer content, its evolution, and breakdowns by various features such as technology, platform, and cancer type.
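
For readers less familiar with the two metrics reported above, the sketch below shows how AUC and a support-weighted average F1 score can be computed from labels and predictions. It is a dependency-free illustration on toy inputs; the function names are our own, and it is not the evaluation code used in the study.

```python
from collections import Counter


def roc_auc(labels, scores):
    """AUC as the probability that a random positive sample outscores a
    random negative one, with ties counted as one half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


def weighted_f1(true_labels, predicted_labels):
    """Per-class F1 averaged with weights equal to each class's support
    (its number of true samples)."""
    pairs = list(zip(true_labels, predicted_labels))
    support = Counter(true_labels)
    total = 0.0
    for cls, n_true in support.items():
        tp = sum(t == cls and p == cls for t, p in pairs)
        fp = sum(t != cls and p == cls for t, p in pairs)
        fn = n_true - tp
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        total += n_true * f1
    return total / len(pairs)
```

Weighting by support means that frequent cancer types contribute more to the average than rare ones, so a high weighted F1 does not by itself guarantee good performance on every histology.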

Conclusions

Our tagging approach improves the exploration and annotation of GEO datasets, thereby facilitating the secondary analysis of cancer-related data. Coupled with dataset aggregation, this solution is advantageous when data are scarce (e.g., in the context of rare cancers) and also scales to abundant data sources.

Clinical trial identification

Editorial acknowledgement

Legal entity responsible for the study

Epigene Labs.

Funding

Epigene Labs.

Disclosure

V. Bernu, C. Lescure, H. Brull Corretger, E. Fox, C. Marijon, C. Petit, A. Behdenna: Financial Interests, Personal, Stocks/Shares, Stock options: Epigene Labs. P. Dhillon: Financial Interests, Personal, Invited Speaker, Stock options: Epigene Labs. A. Nordor: Financial Interests, Personal, Stocks/Shares, Stock options: Epigene Labs; Financial Interests, Personal, Stocks/Shares, Founder stocks: Epigene Labs.
