RESEARCH PRODUCT

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

Pablo Lorenzano Menna, Martina Bevilacqua, Mariane Gonçalves Kulik, Alexander Miguel Monzon, Lisanna Paladin, José Luis López, Martin Gonzalez Buitron, Javier Rios, Marco Necci, Sara Errigo, Layla Hirsh, Ivan Mičetić, Juliet F. Nilsson, Andrey V. Kajava, María Silvina Fornasari, Antonio Lagares, Damiano Piovesan, Sebastian Fernandez-Alberti, Maia Diana Eliana Cabrera, Gustavo Parisi, María Laura Fabre, Miguel A. Andrade-Navarro, Silvio C. E. Tosatto

subject

Repetitive Sequences, Amino Acid; AcademicSubjects/SCI00010; Biology; Statistics as Topic; Protein Data Bank (RCSB PDB); Computational biology; Repetitive Sequences; Gene Ontology; HEK293 Cells; HeLa Cells; Humans; Proteins; Reproducibility of Results; User-Computer Interface; Databases, Protein; Tandem Repeat Sequences; Databases; 03 medical and health sciences; Annotation; Protein structure; Similarity (network science); Tandem repeat; Genetics; Database Issue; Exact Sciences; database; 030304 developmental biology; 0303 health sciences; Hierarchy (mathematics); Protein; 030302 biochemistry & molecular biology; computer.file_format; Protein Data Bank; Class (biology); Amino Acid; ComputingMethodologies_PATTERNRECOGNITION; classification; protein tandem repeat structures; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; computer

description

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and introduces an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification, which combines top levels based solely on structural similarity (Class > Topology > Fold) with two new levels (Clan > Family) that require sequence similarity and describe repeat motifs in collaboration with Pfam. Data growth has been addressed with improved mechanisms for browsing the classification hierarchy. A new UniProt-centric view unifies the increasingly frequent annotation of structures from identical or similar sequences. This update of RepeatsDB aligns with our commitment to develop a resource that extracts, organizes and distributes specialized information on tandem repeat protein structures.

10.1093/nar/gkaa1097
http://europepmc.org/articles/PMC7778985