RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Pablo Lorenzano Menna, Martina Bevilacqua, Mariane Gonçalves Kulik, Alexander Miguel Monzon, Lisanna Paladin, José Luis López, Martin Gonzalez Buitron, Javier Rios, Marco Necci, Sara Errigo, Layla Hirsh, Ivan Mičetić, Juliet F. Nilsson, Andrey V. Kajava, María Silvina Fornasari, Antonio Lagares, Damiano Piovesan, Sebastian Fernandez-Alberti, Maia Diana Eliana Cabrera, Gustavo Parisi, María Laura Fabre, Miguel A. Andrade-Navarro, Silvio C. E. Tosatto

subject

Repetitive Sequences, Amino Acid; Tandem Repeat Sequences; Tandem repeat; protein tandem repeat structures; Databases, Protein; Protein Data Bank; Protein Data Bank (RCSB PDB); Proteins; Protein structure; classification; database; Annotation; Gene Ontology; Statistics as Topic; Reproducibility of Results; User-Computer Interface; Humans; HEK293 Cells; HeLa Cells; Computational biology; Biology; Genetics; Database Issue; Exact Sciences; Hierarchy (mathematics); Similarity (network science); Class (biology); AcademicSubjects/SCI00010; 03 medical and health sciences; 0303 health sciences; 030302 biochemistry & molecular biology; 030304 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]

description

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification, which combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.
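
The five-level hierarchy described above (Class > Topology > Fold > Clan > Family) can be pictured as a top-down path of labels. The sketch below is only an illustration of that idea in Python: the `Classification` type, the `parse` helper and the dotted code format such as `3.3.1` are assumptions made for this example and are not taken from the RepeatsDB data model or API.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Level names, top-down, as listed in the description.
LEVELS: Tuple[str, ...] = ("Class", "Topology", "Fold", "Clan", "Family")

# The top three levels are based solely on structural similarity;
# Clan and Family additionally require sequence similarity.
STRUCTURE_ONLY_DEPTH = 3


@dataclass(frozen=True)
class Classification:
    """Hypothetical path through the hierarchy; deeper levels may be unassigned."""

    labels: Tuple[str, ...]  # one label per assigned level, top-down

    def __post_init__(self) -> None:
        if not 1 <= len(self.labels) <= len(LEVELS):
            raise ValueError(f"expected 1 to {len(LEVELS)} labels, got {len(self.labels)}")

    @property
    def depth(self) -> str:
        """Name of the deepest level that has a label."""
        return LEVELS[len(self.labels) - 1]

    def as_dict(self) -> Dict[str, str]:
        """Map each assigned level name to its label."""
        return dict(zip(LEVELS, self.labels))

    def requires_sequence_similarity(self) -> bool:
        """True when the path reaches the Clan or Family level."""
        return len(self.labels) > STRUCTURE_ONLY_DEPTH

    def shares_level(self, other: "Classification", level: str) -> bool:
        """True when both paths are assigned, and identical, down to `level`."""
        n = LEVELS.index(level) + 1
        if len(self.labels) < n or len(other.labels) < n:
            return False
        return self.labels[:n] == other.labels[:n]


def parse(code: str) -> Classification:
    """Parse a dotted code such as '3.3.1' (the format is an assumption)."""
    return Classification(tuple(code.split(".")))


if __name__ == "__main__":
    full = parse("3.3.1.2.1")   # hypothetical code assigned down to Family
    partial = parse("3.3.1")    # hypothetical code assigned down to Fold
    print(full.as_dict())
    print(partial.depth, partial.requires_sequence_similarity())
    print(full.shares_level(partial, "Fold"))
```

Storing the path as a tuple keeps the ordering of the levels explicit, so comparing two entries at a given level reduces to comparing tuple prefixes.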

year	journal
2020-11-25	Nucleic Acids Research

10.1093/nar/gkaa1097 http://europepmc.org/articles/PMC7778985