AUTHOR

Andrey V. Kajava

showing 4 related works from this author

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

2020

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new lev…

Repetitive Sequences, Amino Acid; AcademicSubjects/SCI00010; Biology; Statistics as Topic; Protein Data Bank (RCSB PDB); Computational biology; Repetitive Sequences; Gene Ontology; HEK293 Cells; HeLa Cells; Humans; Proteins; Reproducibility of Results; User-Computer Interface; Databases, Protein; Tandem Repeat Sequences; Databases; 03 medical and health sciences; Annotation; Protein structure; Similarity (network science); Tandem repeat; Genetics; Database Issue; Exact Sciences; database; 030304 developmental biology; 0303 health sciences; Hierarchy (mathematics); Protein; 030302 biochemistry & molecular biology; computer.file_format; Protein Data Bank; Class (biology); Amino Acid; ComputingMethodologies_PATTERNRECOGNITION; classification; protein tandem repeat structures; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; computer
Nucleic Acids Research
researchProduct

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciences; Bioinformatics; [SDV] Life Sciences [q-bio]; Sequence assembly; Genomics; [SDV.BC] Life Sciences [q-bio]/Cellular Biology; Computational biology; Biology; Genome; 03 medical and health sciences; Annotation; 0302 clinical medicine; Tandem repeat; Genetics; Animals; Survey and Summary; Databases, Protein; Gene; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; 0303 health sciences; End user; 572: Biochemistry; DNA; Sequence Analysis, DNA; [SDV.BIBS] Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Workflow; ComputingMethodologies_PATTERNRECOGNITION; Gadus morhua; Tandem Repeat Sequences; Scientific Experimental Error; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; Databases, Nucleic Acid; 030217 neurology & neurosurgery
researchProduct

RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures

2017

RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high-quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by a…

0301 basic medicine; Repetitive Sequences, Amino Acid; [SDV.BC] Life Sciences [q-bio]/Cellular Biology; Biology; Bioinformatics; Search engine; Annotation; Structure-Activity Relationship; 03 medical and health sciences; 0302 clinical medicine; Tandem repeat; Genetics; Animals; Humans; Database Issue; Databases, Protein; ComputingMilieux_MISCELLANEOUS; Repeat unit; 030304 developmental biology; 0303 health sciences; Information retrieval; Proteins; computer.file_format; Protein Data Bank; Visualization; Schema (genetic algorithms); 030104 developmental biology; Data quality; Corrigendum; computer; Software; 030217 neurology & neurosurgery
Nucleic Acids Research
researchProduct

Disentangling the complexity of low complexity proteins

2020

There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichot…

Protein Conformation; Computer science; Review Article; Computational biology; Measure (mathematics); Evolution, Molecular; Low complexity; 03 medical and health sciences; Protein Domains; Amino Acid Sequence; structure; [SDV.BBM.BC] Life Sciences [q-bio]/Biochemistry, Molecular Biology/Biochemistry [q-bio.BM]; Databases, Protein; Molecular Biology; 030304 developmental biology; Structure (mathematical logic); 0303 health sciences; Sequence; [SCCO.NEUR] Cognitive science/Neuroscience; composition bias; 030302 biochemistry & molecular biology; Proteins; disorder; low complexity regions; Structure and function; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; Algorithms; Information Systems
Briefings in Bioinformatics
researchProduct