E-Poster Display

33P - BLAST-guided mappability knowledgebase facilitates accurate detection of somatic variants

Date

17 Sep 2020

Session

E-Poster Display

Topics

Basic Science

Tumour Site

Presenters

Shuang Wang

Citation

Annals of Oncology (2020) 31 (suppl_4): S245-S259. 10.1016/annonc/annonc265

Authors

S. Wang, C. Yan, F.Y. Yang

Author affiliations

  • Bioinformation, Genetron Health (Beijing) Technology, Co. Ltd, 102206 - Beijing/CN

Abstract 33P

Background

In NGS data analysis, when single-nucleotide polymorphisms (SNPs) lie within genomic regions of low mappability, misalignments can arise along with additional mismatches that may be called as somatic variants. Because SNPs differ between individuals, no current variant caller can algorithmically distinguish artifacts of this origin. A pre-indexed knowledgebase may help distinguish such artifacts from real somatic variants.

Methods

The goal is to construct a knowledgebase of genomic regions that are both poorly mappable and highly polymorphic. We generated a synthetic dataset by dividing the human reference genome into reads 300 bp long with a step of 75 bp, stored as a FASTA file. BLAST was then used to search for regions of similarity between these reads and the reference genome, and a preliminary inclusion criterion for similar regions was set. To validate artifacts of the hypothesized origin, we generated a second FASTA file by inserting SNPs of NA18595 from the 1000 Genomes Project (1KGP) into the original FASTA file. This second FASTA file was aligned to the reference genome so that the origin of mismatches could be explored.
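The read-generation and SNP-insertion steps can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names `tile_reference` and `insert_snps` are hypothetical, coordinates are assumed to be 0-based, trailing windows shorter than the read length are assumed to be dropped, and SNPs are applied to the sequence before tiling rather than to each read.

```python
from typing import Dict, Iterator, Tuple


def tile_reference(seq: str, read_len: int = 300, step: int = 75) -> Iterator[Tuple[int, str]]:
    """Yield (0-based start, read) for each full-length sliding window.

    Assumption: windows that would run past the end of the sequence are skipped.
    """
    for start in range(0, len(seq) - read_len + 1, step):
        yield start, seq[start:start + read_len]


def insert_snps(seq: str, snps: Dict[int, str]) -> str:
    """Return a copy of seq with the base at each 0-based position replaced.

    `snps` maps position -> alternate base (hypothetical input format).
    """
    bases = list(seq)
    for pos, alt in snps.items():
        bases[pos] = alt
    return "".join(bases)


if __name__ == "__main__":
    reference = "ACGT" * 120                 # toy 480 bp sequence, not real data
    mutated = insert_snps(reference, {10: "T"})
    for start, read in tile_reference(mutated):
        print(f">read_{start}_{start + len(read)}")
        print(read)
```

With 300 bp reads and a 75 bp step, each interior base is covered by four overlapping reads, so a single SNP is carried by several synthetic reads.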

Results

As expected, no mismatches were detected when the synthetic data were free of SNPs. After germline variants were inserted, a total of 91 mismatches were identified at exome scale. All artifacts arose from reads that harbored SNPs and were misaligned to genomic regions of similar sequence context. Of the 91 artifacts, 36% occurred at the SNP loci themselves and 58% occurred at loci adjacent to SNPs. 94% of the artifacts were covered by our artifact knowledgebase. In addition, 59% of potential artifacts in our knowledgebase were reported in COSMIC. Although this is only a rough estimate, since we selected only artifact sites adjacent to polymorphic sites of high population frequency (>5%), this high percentage implies that artifacts exist in public cancer mutation knowledgebases.
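The split of artifacts into those at SNP loci and those adjacent to SNPs can be sketched as a simple position lookup. The abstract does not define "adjacent", so the `window` parameter below (10 bp by default) is purely an illustrative assumption, as are the function name `classify_artifact` and the category labels.

```python
from bisect import bisect_left
from typing import List


def classify_artifact(pos: int, snp_positions: List[int], window: int = 10) -> str:
    """Classify a mismatch position relative to a sorted list of SNP positions.

    Returns "at_snp", "adjacent" (within `window` bp, an assumed cutoff),
    or "other".
    """
    i = bisect_left(snp_positions, pos)
    if i < len(snp_positions) and snp_positions[i] == pos:
        return "at_snp"
    # Only the nearest SNP on each side can fall inside the window.
    neighbours = snp_positions[max(i - 1, 0):i + 1]
    if any(abs(pos - s) <= window for s in neighbours):
        return "adjacent"
    return "other"


if __name__ == "__main__":
    snps = [100, 200]                        # toy positions, not real data
    for p in (100, 105, 150):
        print(p, classify_artifact(p, snps))
```

Tallying these labels over all mismatches on a chromosome would give proportions of the kind reported above, under whatever adjacency cutoff is chosen.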

Conclusions

Our analysis indicates that differences between the reference genome and an individual's genome can lead to misalignment, especially when such polymorphisms occur within poorly mappable regions. These misalignments may introduce false somatic variants. By constructing a BLAST-guided knowledgebase, we were able to reliably detect artifacts of this origin and achieve higher specificity in somatic variant detection.

Clinical trial identification

Editorial acknowledgement

Legal entity responsible for the study

Genetron Health (Beijing) Co. Ltd., 102206, Beijing, China.

Funding

Genetron Health (Beijing) Co. Ltd., 102206, Beijing, China.

Disclosure

All authors have declared no conflicts of interest.
