RESEARCH PRODUCT

tbg - a new file format for genomic data

Susanne Gerber, Markus Pfenninger, Tilman Schell, Philipp Schönnenbeck

subject

Computer science, Genetic variation, Nucleic acid sequence, Single-nucleotide polymorphism, Computational biology, Line (text file), Python (programming language), File format, Peptide sequence, Toolbox

description

Abstract

Motivation: Determining whether a single-nucleotide polymorphism (SNP), or a variant in general, changes the amino acid sequence of a protein-coding gene is often laborious and time-consuming. Here, we introduce the tbg file format for storing genomic data and tbg-tools, a user-friendly toolbox for faster analysis of SNPs. The file format stores information for each nucleotide in each gene, making it possible to predict which amino acid change a given nucleotide variant will cause. Our tool therefore has the potential to help make biological sense of the unprecedented amount of genome-wide genetic variation that researchers currently face.

Results: The new tab-separated file format for storing nucleotide data can be easily analyzed and used for a wide variety of biological research. Some of these analyses can also be automated with the additional analysis tools in tbg-tools.

Availability: tbg-tools is written in Python and can be installed from the command line. It is available at https://github.com/Croxa/tbg-tools.

Contact: pschoenn@students.uni-mainz.de
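To illustrate the kind of question the abstract describes (does a nucleotide variant change the encoded amino acid?), here is a minimal Python sketch. It does not use tbg-tools or the actual tbg column layout, neither of which is specified in this record. The function name `snp_effect`, the 0-based position convention, and the assumption of an in-frame, forward-strand coding sequence translated with the standard genetic code are all illustrative choices.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def snp_effect(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    Hypothetical helper, not part of tbg-tools.
    cds: in-frame coding sequence on the forward strand (assumption).
    pos: 0-based position of the substituted base within cds.
    alt: the alternate base (one of T, C, A, G).
    Returns (reference amino acid, alternate amino acid, effect label).
    """
    offset = pos % 3                      # position of the base inside its codon
    start = pos - offset                  # index of the first base of that codon
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        effect = "synonymous"
    elif alt_aa == "*":
        effect = "nonsense"               # substitution introduces a stop codon
    else:
        effect = "nonsynonymous"
    return ref_aa, alt_aa, effect


cds = "ATGGAAGTT"                         # codons ATG GAA GTT
print(snp_effect(cds, 5, "G"))            # GAA -> GAG
print(snp_effect(cds, 4, "T"))            # GAA -> GTA
print(snp_effect(cds, 3, "T"))            # GAA -> TAA
```

A per-nucleotide file format could in principle store the outcome of such a lookup for every position in advance, so that a variant call only needs a table lookup rather than a fresh translation; whether tbg does exactly this is not stated here.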

https://doi.org/10.1101/2021.03.15.435393