Abstract 15P
Background
Breast cancer (BC) is the most common female malignancy worldwide and the leading cause of cancer-associated mortality among women. Personalized classification of histopathological subtypes is crucial for estimating a patient’s prognosis and selecting an appropriate treatment strategy. To date, the immunohistochemistry (IHC)-based classification scheme in use is still too broad, so individuals with the same IHC tumor subtype may not benefit from the same treatment regimen. Characterizing the molecular profiles unique to each BC subtype allows treatments to be tailored more specifically than with IHC status alone.
Methods
To address this problem, we developed the SubType Classifier, which combines whole genome sequencing (WGS) data with machine learning algorithms to support molecular-based breast cancer subtype classification. We used WGS data from 1039 BC patients drawn from three databases (ICGC, TCGA, HMF), divided into training (n=802) and hold-out (n=237) datasets. The training cohort was labeled using IHC-based subtype estimation, with HER2 status corrected by ERBB2 copy number variation (Wojtaszewska M. et al., 2021). The hold-out set was composed of samples whose estrogen and/or progesterone receptor status and HER2 status were confirmed by IHC results.
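The labeling step can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the `hr_positive` flag (estrogen and/or progesterone receptor positive), the override rule, and the amplification cutoff `ERBB2_AMP_THRESHOLD` are all assumptions made for illustration, since the abstract does not state the correction rule or any threshold.

```python
from typing import Optional

# Hypothetical cutoff for calling ERBB2 amplified. Illustrative only;
# the abstract does not report the value or rule actually used.
ERBB2_AMP_THRESHOLD = 6.0


def corrected_her2(ihc_her2_positive: bool,
                   erbb2_copy_number: Optional[float],
                   threshold: float = ERBB2_AMP_THRESHOLD) -> bool:
    """Return HER2 status after an assumed copy-number correction.

    Assumed rule: an ERBB2 copy number at or above the threshold
    overrides the IHC call to positive; otherwise the IHC call is kept.
    """
    if erbb2_copy_number is not None and erbb2_copy_number >= threshold:
        return True
    return ihc_her2_positive


def subtype_label(hr_positive: bool,
                  ihc_her2_positive: bool,
                  erbb2_copy_number: Optional[float] = None) -> str:
    """Map receptor status to one of the four subtype labels in the abstract."""
    her2 = corrected_her2(ihc_her2_positive, erbb2_copy_number)
    if hr_positive:
        return "ER+HER2+" if her2 else "ER+HER2-"
    return "ER-HER2+" if her2 else "TNBC"


if __name__ == "__main__":
    # A receptor-negative, IHC HER2-negative sample with amplified ERBB2
    # is relabeled under the assumed rule.
    print(subtype_label(False, False, 8.0))
```

Under these assumptions, a sample is only ever moved from HER2-negative to HER2-positive; a rule that also downgrades IHC-positive calls would need a second branch.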
Results
The CatBoost Classifier achieved the best performance on ∼1000 copy number variation (CNV)-based genomic features selected as the most biologically relevant. Validation on the hold-out dataset yielded an area under the receiver operating characteristic curve (AUROC) above 0.9. For three subtypes (TNBC, ER+HER2-, and ER+HER2+), precision ranged from 0.69 to 0.90 and recall from 0.67 to 0.93. However, classification of the ER-HER2+ class remains a challenge (precision 0.3, recall ∼0.2) due to the high power of ERBB2-correlated genes and the high similarity of this class to the ER+HER2+ class.
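For readers less familiar with the per-class metrics reported above, the following Python sketch shows how precision and recall are computed for each subtype from true and predicted labels. The function name is hypothetical and the labels in the usage example are invented toy data, not results from the study.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def per_class_precision_recall(
    y_true: Sequence[str], y_pred: Sequence[str]
) -> Dict[str, Tuple[float, float]]:
    """Return {class: (precision, recall)} for a multi-class prediction.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A class with no predictions (or no true samples) gets 0.0.
    """
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative

    metrics = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        predicted = tp[cls] + fp[cls]
        actual = tp[cls] + fn[cls]
        precision = tp[cls] / predicted if predicted else 0.0
        recall = tp[cls] / actual if actual else 0.0
        metrics[cls] = (precision, recall)
    return metrics


if __name__ == "__main__":
    # Toy labels for illustration only.
    truth = ["TNBC", "TNBC", "ER+HER2-", "ER+HER2-", "ER+HER2+", "ER-HER2+"]
    preds = ["TNBC", "ER+HER2-", "ER+HER2-", "ER+HER2-", "ER+HER2+", "ER+HER2+"]
    for cls, (p, r) in per_class_precision_recall(truth, preds).items():
        print(f"{cls}: precision={p:.2f} recall={r:.2f}")
```

In the toy data, the single ER-HER2+ sample is predicted as ER+HER2+, which lowers recall for ER-HER2+ and precision for ER+HER2+ at the same time; this is the kind of confusion between the two HER2-positive classes that the results describe.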
Conclusions
Results obtained with the SubType Classifier confirm that comprehensive genomic data support the canonical IHC-based breast cancer subtype classification and provide deeper insight into tumor biology.
Legal entity responsible for the study
MNM Diagnostics Sp. z o.o.
Funding
Has not received any funding.
Disclosure
M. Piernik, A. Wozna: Financial Interests, Personal, Stocks/Shares: MNM Diagnostics Sp. z o.o. P. Zawadzki: Financial Interests, Personal and Institutional, Ownership Interest: MNM Diagnostics Sp. z o.o. All other authors have declared no conflicts of interest.