New technologies, including next-generation sequencing, have contributed to the discovery of novel genes and genetic mechanisms involved in cancer. Oncogenes are of particular interest to biologists because they can serve as direct targets for small-molecule inhibitors. Recent studies have shown that tumor suppressors and oncogenes can be distinguished using rates of truncating mutations, mutation clustering, and copy number data. At the same time, the existing literature suggests that cancer-related genes are under more intense purifying selection. This led us to hypothesize that oncogenes and tumor suppressor genes are more closely related physically than noncancer-related genes. The aim of this study was to cluster protein-coding genes based on GO terms and to determine the resulting cluster structure and the positions of oncogenes within it.
A list of protein-coding genes was obtained with the biomaRt Bioconductor package. The clusterProfiler Bioconductor package was used to retrieve gene ontology (GO) annotations for these genes. The association of each gene with cancer was scored with the OncoScore Bioconductor package. GO annotations for biological processes, cellular components, and molecular functions were encoded in a binary format, which was used for the clustering analysis. SeqSphere software was used to generate minimum spanning trees (MST).
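A minimal Python sketch of the encoding and tree-building steps follows. It assumes that each gene is represented as a 0/1 vector over a fixed list of GO terms, that the distance between two genes is the Hamming distance (the number of terms present in one profile but not the other), and that the "10% cutoff" means a GO term is kept only if it annotates at least 10% of genes. These are assumptions: the abstract states neither the distance measure nor the exact meaning of the cutoff, and Prim's algorithm here is a generic stand-in, not SeqSphere's own MST implementation. The helper names and the toy genes are hypothetical.

```python
from collections import Counter

def select_terms(gene_to_terms, min_fraction=0.10):
    """Keep GO terms annotated to at least `min_fraction` of genes (assumed meaning of the 10% cutoff)."""
    counts = Counter(t for terms in gene_to_terms.values() for t in terms)
    n_genes = len(gene_to_terms)
    return sorted(t for t, c in counts.items() if c / n_genes >= min_fraction)

def binary_go_matrix(gene_to_terms, terms):
    """Encode each gene as a 0/1 vector over the ordered list of selected GO terms."""
    return {gene: [1 if t in go else 0 for t in terms] for gene, go in gene_to_terms.items()}

def hamming(a, b):
    """Number of GO terms present in one profile but not the other."""
    return sum(x != y for x, y in zip(a, b))

def prim_mst(profiles):
    """Prim's algorithm on the complete gene graph weighted by Hamming distance.

    Returns the tree as a list of (parent_gene, gene, distance) edges.
    """
    genes = sorted(profiles)
    if not genes:
        return []
    root = genes[0]
    # For every gene not yet in the tree: (cheapest distance to the tree, tree gene giving it)
    best = {g: (hamming(profiles[g], profiles[root]), root) for g in genes[1:]}
    edges = []
    while best:
        # Attach the outside gene with the cheapest link (ties broken by gene name)
        gene = min(best, key=lambda g: (best[g][0], g))
        dist, parent = best.pop(gene)
        edges.append((parent, gene, dist))
        # The new tree member may offer cheaper links for the remaining genes
        for other in best:
            d = hamming(profiles[other], profiles[gene])
            if d < best[other][0]:
                best[other] = (d, gene)
    return edges

# Toy annotations (hypothetical genes; GO IDs used only as labels)
gene_to_terms = {
    "GENE_A": {"GO:0006915", "GO:0005634"},
    "GENE_B": {"GO:0006915", "GO:0005634", "GO:0003677"},
    "GENE_C": {"GO:0005739"},
    "GENE_D": {"GO:0005739", "GO:0016491"},
}
terms = select_terms(gene_to_terms)
profiles = binary_go_matrix(gene_to_terms, terms)
print(prim_mst(profiles))
# [('GENE_A', 'GENE_B', 1), ('GENE_A', 'GENE_C', 3), ('GENE_C', 'GENE_D', 1)]
```

The plain-Python Prim loop needs on the order of n²/2 distance evaluations, roughly 1.2 × 10^8 for 15521 genes, so a real run would use vectorized or bit-packed distances; this version only shows the logic.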
Of 19295 protein-coding genes, 15521 (80.4%) had complete annotations for biological processes, cellular components, and molecular functions. Among these 15521 genes, 226 (1.5%) were highly associated with cancer (OncoScore of 75 or higher), 1441 had a medium association with cancer (OncoScore 50-74), and 5694 had OncoScores between 21 and 49. A further 7345 genes were classified as not associated with cancer based on OncoScore, and for 815 genes the OncoScore could not be determined. The MST built from 206 unique GO terms (10% cutoff) for the 15521 genes revealed a grape-like structure with many clusters. Genes highly associated with cancer (oncogenes with a high OncoScore) were distributed across different clusters and were located on the outer layer of the clusters.
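The OncoScore bins and proportions in the results can be re-derived from the reported counts with a few lines of Python. This is a minimal sketch that only bins scores that have already been computed; it does not call the OncoScore package. Treating scores below 21 as "not associated" and using half-open bins for non-integer scores are assumptions, since the abstract gives only integer ranges, and the function names are hypothetical.

```python
def oncoscore_category(score):
    """Bin an OncoScore value with the cut-offs used in the results.

    Assumption: scores below 21 count as 'not associated', and bins are
    half-open, so a non-integer score such as 74.5 falls in the lower bin.
    """
    if score is None:
        return "undetermined"
    if score >= 75:
        return "high"
    if score >= 50:
        return "medium"
    if score >= 21:
        return "low"
    return "not associated"

def percent(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)

# Counts reported in the results
reported = {"high": 226, "medium": 1441, "low": 5694,
            "not associated": 7345, "undetermined": 815}
annotated = sum(reported.values())
print(annotated)                   # 15521
print(percent(annotated, 19295))   # 80.4
for category, n in reported.items():
    print(category, n, percent(n, annotated))
```

Summing the five category counts recovers the 15521 fully annotated genes, which is a quick consistency check on the reported totals.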
Cluster analysis of protein-coding genes based on GO terms (biological process, molecular function, and cellular localization data) revealed a clustered (grape-like) structure. Oncogenes were located mostly outside the cluster centers.
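One way to make "outside the cluster center" measurable, offered only as a hypothetical proxy (the abstract does not say how peripheral position was assessed), is to compare how often high-OncoScore genes are leaves, that is degree-1 nodes, of the MST with how often other genes are. The sketch takes MST edges as (gene_u, gene_v, distance) tuples; the function names, gene names, and tree are toy assumptions.

```python
from collections import defaultdict

def node_degrees(edges):
    """Degree of every gene in an MST given as (gene_u, gene_v, distance) edges."""
    degree = defaultdict(int)
    for u, v, _ in edges:
        degree[u] += 1
        degree[v] += 1
    return degree

def leaf_fraction(edges, genes):
    """Share of `genes` that are leaves (degree-1 nodes) of the tree."""
    degree = node_degrees(edges)
    genes = list(genes)
    if not genes:
        return 0.0
    return sum(degree[g] == 1 for g in genes) / len(genes)

# Toy tree: G2 is a hub; G3 and G5 stand in for hypothetical high-OncoScore genes
edges = [("G1", "G2", 1), ("G2", "G3", 1), ("G2", "G4", 2), ("G4", "G5", 1)]
high_score = ["G3", "G5"]
other = ["G1", "G2", "G4"]
print(leaf_fraction(edges, high_score))  # 1.0
print(leaf_fraction(edges, other))       # 0.3333333333333333
```

Leaf status is a crude measure: degree reflects local tree topology rather than distance from a cluster center, so a distance-to-medoid measure within each cluster would be a reasonable alternative.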
Clinical trial identification
Legal entity responsible for the study
Dmitriy Babenko, Karaganda State Medical University.
Ministry of Education and Science of the Republic of Kazakhstan.
All authors have declared no conflicts of interest.