
AUTHOR

Pablo Mier

showing 33 related works from this author

Toward completion of the Earth’s proteome: an update a decade later

2017

Protein databases are steadily growing driven by the spread of new more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the incapability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on i…

Proteome; Operations research; Knowledge Bases; 0206 medical engineering; 02 engineering and technology; Computational biology; Biology; 03 medical and health sciences; Annotation; Protein sequencing; Sequence Analysis Protein; Three-domain system; Redundancy (engineering); Animals; Humans; Databases Protein; Molecular Biology; 030304 developmental biology; Sequence (medicine); 0303 health sciences; Computational Biology; Proteins; Protein superfamily; Proteome; UniProt; Software; 020602 bioinformatics; Information Systems; Briefings in Bioinformatics
researchProduct

Avoided motifs: short amino acid strings missing from protein datasets.

2020

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular context, specific short amino acid motifs are missing due to unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins…

0301 basic medicine; chemistry.chemical_classification; Protein function; Amino Acid Motifs; 030102 biochemistry & molecular biology; Clinical Biochemistry; Computational Biology; Proteins; Context (language use); Computational biology; Biology; Biochemistry; Amino acid; 03 medical and health sciences; 030104 developmental biology; Secretory protein; chemistry; Amino acid composition; Cytoplasm; Molecular Biology; Human proteins; Sequence Alignment; Biological chemistry; References
researchProduct
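
The idea behind the Avoided Motifs entry above (short amino acid strings that never occur in a dataset) can be illustrated with a minimal sketch. This is not the authors' pipeline: the function name `missing_motifs`, the restriction to the 20 standard amino acids, and the toy input are assumptions made only for illustration.

```python
from itertools import product

# Assumption: only the 20 standard amino acids are considered.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def missing_motifs(sequences, k):
    """Return the sorted k-mers over AMINO_ACIDS that occur in none of the sequences."""
    observed = set()
    for seq in sequences:
        seq = seq.upper()
        # Collect every overlapping substring of length k.
        for i in range(len(seq) - k + 1):
            observed.add(seq[i:i + k])
    all_kmers = ("".join(p) for p in product(AMINO_ACIDS, repeat=k))
    return sorted(m for m in all_kmers if m not in observed)


# Toy usage: a single short sequence leaves almost all dipeptides unobserved.
toy_missing = missing_motifs(["ACDA"], 2)
print(len(toy_missing))
```

On a real proteome the same loop would run with `k=3` or `k=4`; a k-mer absent from the result of a large dataset is only a candidate, since absence can also reflect dataset size or composition.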

Traitpedia: a collaborative effort to gather species traits

2018

Abstract Summary Traitpedia is a collaborative database aimed to collect binary traits in a tabular form for a growing number of species. Availability and implementation Traitpedia can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/traitpedia. Supplementary information Supplementary data are available at Bioinformatics online.

Statistics and Probability; 0303 health sciences; Information retrieval; Computer science; 030302 biochemistry & molecular biology; Databases and Ontologies; MEDLINE; Biochemistry; Phenotype; Applications Notes; Computer Science Applications; 03 medical and health sciences; Computational Mathematics; Phenotype; Computational Theory and Mathematics; Molecular Biology; Software; 030304 developmental biology; Global biodiversity; Bioinformatics
researchProduct

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

Abstract The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciences; Bioinformatics; [SDV]Life Sciences [q-bio]; Sequence assembly; Genomics; [SDV.BC]Life Sciences [q-bio]/Cellular Biology; Computational biology; Biology; Genome; 03 medical and health sciences; Annotation; 0302 clinical medicine; Tandem repeat; Genetics; Animals; Survey and Summary; Databases Protein; Gene; ComputingMilieux_MISCELLANEOUS; 030304 developmental biology; 0303 health sciences; End user; 572: Biochemie; DNA; Sequence Analysis DNA; Genomics; [SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]; Workflow; ComputingMethodologies_PATTERNRECOGNITION; Gadus morhua; Tandem Repeat Sequences; Scientific Experimental Error; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Databases Nucleic Acid; 030217 neurology & neurosurgery
researchProduct

FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases.

2016

The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automated derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or high threshold of sequence identity. We compressed 50…

0301 basic medicine; Protein structure database; Proteomics; Proteome; Sequence analysis; Computer science; computer.software_genre; Sensitivity and Specificity; Set (abstract data type); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Similarity (network science); Sequence Analysis Protein; Genetics; Cluster (physics); Animals; Cluster Analysis; Humans; Cluster analysis; Databases Protein; Molecular Biology; Sequence; Database; Function (mathematics); Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Modeling and Simulation; Data mining; computer; 030217 neurology & neurosurgery; Software; Journal of computational biology : a journal of computational molecular cell biology
researchProduct

Between Interactions and Aggregates: The PolyQ Balance

2021

Abstract Polyglutamine regions (polyQ) are highly abundant consecutive runs of glutamine residues. They have been generally studied in relation to the so-called polyQ-associated diseases, characterized by protein aggregation caused by the expansion of the polyglutamine tract via a CAG-slippage mechanism. However, more than 4800 human proteins contain a polyQ, and only 9 of these regions are known to be associated with disease. Computational sequence studies and experimental structure determinations are completing a more interesting picture in which polyQ emerge as a motif for modulation of protein-protein interactions. But long polyQ regions may lead to an excess of interactions, and produc…

AcademicSubjects/SCI01140; AcademicSubjects/SCI01130; aggregation; CAG-expansion diseases; Context (language use); Computational biology; Review; Polyglutamine tract; Biology; Protein aggregation; Protein–protein interaction; homorepeat; protein–protein interaction; Codon usage bias; Genetics; Humans; Peptides; Human proteins; polyglutamine; Ecology Evolution Behavior and Systematics; Function (biology); Sequence (medicine); Genome Biology and Evolution
researchProduct

The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context

2020

Graphical abstract

lcsh:Biotechnology; Glutamine; Biophysics; Context (language use); Computational biology; Biology; Biochemistry; polyQ; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; lcsh:TP248.13-248.65; Genetics; Human proteome project; ComputingMethodologies_COMPUTERGRAPHICS; 030304 developmental biology; Sequence (medicine); chemistry.chemical_classification; Sequence context; 0303 health sciences; Homorepeat; A protein; Computer Science Applications; Amino acid; chemistry; 030220 oncology & carcinogenesis; Codon usage bias; Proteome; Codon usage; Length distribution; Research Article; Biotechnology; Computational and Structural Biotechnology Journal
researchProduct

Disentangling the complexity of low complexity proteins

2020

Abstract There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichot…

Protein Conformation; Computer science; Review Article; Computational biology; Measure (mathematics); Evolution Molecular; Low complexity; 03 medical and health sciences; Protein Domains; Amino Acid Sequence; structure; [SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]; Databases Protein; Molecular Biology; 030304 developmental biology; Structure (mathematical logic); 0303 health sciences; Sequence; [SCCO.NEUR]Cognitive science/Neuroscience; composition bias; 030302 biochemistry & molecular biology; Proteins; disorder; low complexity regions; Structure and function; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Algorithms; Information Systems; Briefings in Bioinformatics
researchProduct

Flanking regions determine the structure of the poly-glutamine homo- repeat in huntingtin through mechanisms common among glutamine-rich human protei…

2020

The causative agent of Huntington's disease, the poly-Q homo-repeat in the N-terminal region of huntingtin (httex1), is flanked by a 17-residue-long fragment (N17) and a proline-rich region (PRR), which promote and inhibit the aggregation propensity of the protein, respectively, by poorly understood mechanisms. Based on experimental data obtained from site-specifically labeled NMR samples, we derived an ensemble model of httex1 that identified both flanking regions as opposing poly-Q secondary structure promoters. While N17 triggers helicity through a promiscuous hydrogen bond network involving the side chains of the first glutamines in the poly-Q tract, the PRR prom…

Repetitive Sequences Amino Acid; Huntingtin; Amino Acid Motifs; [SDV.BBM.BP] Life Sciences [q-bio]/Biochemistry Molecular Biology/Biophysics; 03 medical and health sciences; Huntington's disease; Structural Biology; Human proteome project; medicine; Humans; [SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]; Molecular Biology; Human proteins; Protein secondary structure; [SDV.BBM.BC] Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]; 030304 developmental biology; [INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]; Huntingtin Protein; 0303 health sciences; Chemistry; 030302 biochemistry & molecular biology; Promoter; medicine.disease; Cell biology; Intrinsically Disordered Proteins; Glutamine; [SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biophysics; Polyglutamic Acid; [INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]; Low Complexity Region
researchProduct

A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication.

2020

Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the t…

Protein family; Structural alignment; Biological data visualization; Exon; Computational biology; Biology; Evolution Molecular; 03 medical and health sciences; Exon; Protein structure; Tandem repeat; Structural Biology; Gene duplication; Animals; Humans; 030304 developmental biology; 0303 health sciences; 030302 biochemistry & molecular biology; Intron; Proteins; Exons; Protein superfamily; Classification; Introns; Biological data visualization; Classification; Exon; Protein evolution; Protein structure; Repeat protein; Tandem Repeat Sequences; Repeat protein; Protein structure; Protein evolution; Journal of structural biology
researchProduct

The latent geometry of the human protein interaction network

2017

Abstract Motivation A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results We embedded the hPIN to hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperboli…

0301 basic medicine; Statistics and Probability; Geometric analysis; Computer science; Hyperbolic geometry; Systems biology; Complex system; Context (language use); Geometry; Biochemistry; Protein–protein interaction; 03 medical and health sciences; Interaction network; Humans; Protein Interaction Maps; Representation (mathematics); Cluster analysis; Molecular Biology; Systems Biology; Hyperbolic space; Proteins; Function (mathematics); Original Papers; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Embedding; Signal transduction; Algorithms; Signal Transduction
researchProduct

The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin

2021

Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins …

Protein Conformation alpha-Helical; 0301 basic medicine; Network complexity; Huntingtin; intrinsically disordered regions; Amino Acid Motifs; Computational biology; Biology; protein interactions; Article; compositionally biased regions; Catalysis; Protein–protein interaction; lcsh:Chemistry; Evolution Molecular; Inorganic Chemistry; Low complexity; 03 medical and health sciences; Protein Domains; Protein Interaction Mapping; Animals; Humans; p300-CBP Transcription Factors; Amino Acid Sequence; Protein Interaction Maps; Huntingtin; Tissue distribution; Physical and Theoretical Chemistry; lcsh:QH301-705.5; Molecular Biology; Spectroscopy; Huntingtin Protein; 030102 biochemistry & molecular biology; Organic Chemistry; Nuclear Proteins; p120 GTPase Activating Protein; General Medicine; Multiple modes; Synapsins; low complexity regions; Computer Science Applications; homorepeats; Microscopy Electron; 030104 developmental biology; lcsh:Biology (General); lcsh:QD1-999; Sequence Alignment; Function (biology); Protein Binding; International Journal of Molecular Sciences
researchProduct

CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats

2020

© The Author(s) 2020.

Computer science; Gene prediction; Genomics; computer.software_genre; General Biochemistry Genetics and Molecular Biology; Homology (biology); 03 medical and health sciences; Annotation; 0302 clinical medicine; CRISPR; Clustered Regularly Interspaced Short Palindromic Repeats; Databases Protein; 030304 developmental biology; 0303 health sciences; Database; Palindrome; Proteins; Computational gene; Genomics; AcademicSubjects/SCI00960; Original Article; UniProt; General Agricultural and Biological Sciences; computer; 030217 neurology & neurosurgery; Information Systems
researchProduct

AnABlast: Re-searching for Protein-Coding Sequences in Genomic Regions

2019

AnABlast is a computational tool that highlights protein-coding regions within intergenic and intronic DNA sequences which escape detection by standard gene prediction algorithms. DNA sequences with small protein-coding genes or exons, complex intron-containing genes, or degenerated DNA fragments are efficiently targeted by AnABlast. Furthermore, this algorithm is particularly useful in detecting protein-coding sequences with nonsignificant homologs to sequences in databases. AnABlast can be executed online at http://www.bioinfocabd.upo.es/anablast/ .

Fossil DNA sequences; Protein coding; 0303 health sciences; Gene prediction; Coding DNA sequences; 030302 biochemistry & molecular biology; Computational biology; Biology; Gene finding; DNA sequencing; 03 medical and health sciences; Exon; chemistry.chemical_compound; Intergenic region; chemistry; Homologous chromosome; Small genes; Gene; In silico annotation tool; DNA; 030304 developmental biology
researchProduct

The 18S ribosomal RNA m6A methyltransferase Mettl5 is required for normal walking behavior in Drosophila

2020

RNA modifications have recently emerged as an important layer of gene regulation. N6-methyladenosine (m6A) is the most prominent modification on eukaryotic messenger RNA and has also been found on noncoding RNA, including ribosomal and small nuclear RNA. Recently, several m6A methyltransferases were identified, uncovering the specificity of m6A deposition by structurally distinct enzymes. In order to discover additional m6A enzymes, we performed an RNAi screen to deplete annotated orthologs of human methyltransferase-like proteins (METTLs) in Drosophila cells and identified CG9666, the ortholog of human METTL5. We show that CG9666 is required for specific deposition of m6A on 18S ribosomal …

Adenosine; Biochimie; m6A; Mettl5; Walking; Biology; Biochemistry; Ribosome; 18S ribosomal RNA; 03 medical and health sciences; 0302 clinical medicine; Gene expression; RNA Ribosomal 18S; Genetics; Animals; Humans; RNA methyltransferase; [SDV.BDD]Life Sciences [q-bio]/Development Biology; Molecular Biology; 030304 developmental biology; Behavior; 0303 health sciences; Messenger RNA; behavior; Biologie moléculaire; RNA; [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; Methyltransferases; m6A; Ribosomal RNA; Non-coding RNA; Ribosome; [SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biomolecules [q-bio.BM]; 3. Good health; Cell biology; ribosome; RNA Ribosomal; Drosophila; Biologie; 030217 neurology & neurosurgery; Small nuclear RNA; Reports; EMBO reports
researchProduct

Repeatability in protein sequences

2019

Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) rep…

Repetitive Sequences Amino Acid; Globular protein; Saccharomyces cerevisiae; Context (language use); Computational biology; Protein–protein interaction; Evolution Molecular; 03 medical and health sciences; Sequence Analysis Protein; Structural Biology; Humans; Arabidopsis thaliana; Amino Acid Sequence; Databases Protein; Protein secondary structure; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; biology; 030302 biochemistry & molecular biology; Proteins; biology.organism_classification; Amino acid; chemistry; Sequence Alignment; Algorithms; Function (biology); Journal of Structural Biology
researchProduct

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein

2021

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of t…

Proteomics; 0301 basic medicine; lcsh:QH426-470; 030106 microbiology; Biology; Article; compositionally biased regions; Evolution Molecular; Low complexity; 03 medical and health sciences; Bacterial Proteins; Sequence Analysis Protein; Genetics; Extracellular; Genetics (clinical); chemistry.chemical_classification; Bacteria; Virulence; Strain (chemistry); Computational Biology; biology.organism_classification; low complexity regions; Amino acid; homorepeats; lcsh:Genetics; 030104 developmental biology; chemistry; Evolutionary biology; bacterial strains; Proteome; orthology; Bacterial outer membrane; Bacteria; Function (biology); host–pathogen interactions; Genes
researchProduct

orthoFind Facilitates the Discovery of Homologous and Orthologous Proteins

2015

Finding homologous and orthologous protein sequences is often the first step in evolutionary studies, annotation projects, and experiments of functional complementation. Despite all currently available computational tools, there is a requirement for easy-to-use tools that provide functional information. Here, a new web application called orthoFind is presented, which allows a quick search for homologous and orthologous proteins given one or more query sequences, allowing a recurrent and exhaustive search against reference proteomes, and being able to include user databases. It addresses the protein multidomain problem, searching for homologs with the same domain architecture, and gives a si…

Architecture domain; Science; Brute-force search; Sequence alignment; Computational biology; Biology; Annotation; Databases Genetic; Homologous chromosome; Animals; Humans; Web application; Amino Acid Sequence; Genetics; Internet; Multidisciplinary; Sequence Homology Amino Acid; business.industry; Q; R; Proteins; Sequence homology; Proteome; Medicine; business; Sequence Alignment; Software; Research Article; PLOS ONE
researchProduct

dAPE: a web server to detect homorepeats and follow their evolution.

2016

Abstract Summary Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is no current consensus on the minimum number of residues needed to define a functional homorepeat, nor even if mismatches are allowed. Here we present dAPE, a web server that helps following the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and Implementation dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/∼munoz/polyx. Supplementary information Supplementary data are available at Bioinformatics online.

0301 basic medicine; Statistics and Probability; Repetitive Sequences Amino Acid; Web server; Internet; Computer science; computer.software_genre; Biochemistry; Applications Notes; Computer Science Applications; World Wide Web; Evolution Molecular; 03 medical and health sciences; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Animals; Humans; Data mining; Molecular Biology; computer; Sequence Alignment; Sequence Analysis; Software; Bioinformatics (Oxford, England)
researchProduct
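
The dAPE entry above mentions a tunable cutoff and the open question of whether mismatches are allowed in a homorepeat. One common way to express such a cutoff is "at least `min_count` copies of a residue within a window of `window` positions". The sketch below uses that convention as an assumption; it is not taken from the dAPE server, and the function name and default values are hypothetical.

```python
def find_homorepeats(seq, residue, window=10, min_count=8):
    """Return merged (start, end) spans, 0-based and end-exclusive, covered by
    windows of length `window` that contain at least `min_count` copies of `residue`."""
    regions = []
    for start in range(len(seq) - window + 1):
        if seq[start:start + window].count(residue) >= min_count:
            end = start + window
            # Merge with the previous span when the windows overlap or touch.
            if regions and start <= regions[-1][1]:
                regions[-1] = (regions[-1][0], end)
            else:
                regions.append((start, end))
    return regions


# A polyQ stretch with one mismatch passes a relaxed cutoff but not a strict one.
print(find_homorepeats("QQQQPQQQQQ", "Q", window=10, min_count=8))
print(find_homorepeats("QQQQPQQQQQ", "Q", window=10, min_count=10))
```

Setting `min_count` equal to `window` demands a perfect repeat, while lowering it tolerates impurities. Note that the reported spans are window spans, so they can include flanking residues.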

Evolutionary Study of Disorder in Protein Sequences

2020

Intrinsically disordered proteins (IDPs) contain regions lacking intrinsic globular structure (intrinsically disordered regions, IDRs). IDPs are present across the tree of life, with great variability of IDR type and frequency even between closely related taxa. To investigate the function of IDRs, we evaluated and compared the distribution of disorder content in 10,695 reference proteomes, confirming its high variability and finding certain correlation along the Euteleostomi (bony vertebrates) lineage to number of cell types. We used the comparison of orthologs to study the function of disorder related to increase in cell types, observing that multiple interacting subunits of protein comple…

intrinsically disordered regions; ortholog comparison; Lineage (evolution); High variability; lcsh:QR1-502; comparative genomics; Biology; Intrinsically disordered proteins; Biochemistry; Article; lcsh:Microbiology; Evolution Molecular; 03 medical and health sciences; Sequence Analysis Protein; Animals; Databases Protein; Molecular Biology; 030304 developmental biology; Comparative genomics; 0303 health sciences; 030302 biochemistry & molecular biology; Evolutionary biology; Vertebrates; Proteome; intrinsically disordered proteins; Function (biology); Biomolecules
researchProduct

Protein-protein interactions can be predicted using coiled coil co-evolution patterns

2016

Abstract Protein-protein interactions are sometimes mediated by coiled coil structures. The evolutionary conservation of interacting orthologs in different species, along with the presence or absence of coiled coils in them, may help in the prediction of interacting pairs. Here, we illustrate how the presence of coiled coils in a protein can be exploited as a potential indicator for its interaction with another protein with coiled coils. The prediction capability of our strategy improves when restricting our dataset to highly reliable, known protein-protein interactions. Our study of the co-evolution of coiled coils demonstrates that pairs of interacting proteins can be distinguished from no…

0301 basic medicine; Statistics and Probability; Computational biology; Correlated evolution; General Biochemistry Genetics and Molecular Biology; Protein Structure Secondary; Protein–protein interaction; Conserved sequence; Evolution Molecular; 03 medical and health sciences; Protein-protein interaction; Modelling and Simulation; Immunology and Microbiology(all); Coiled coil; Genetics; Coiled coil; Physics; Medicine(all); 030102 biochemistry & molecular biology; General Immunology and Microbiology; Agricultural and Biological Sciences(all); Models Genetic; Biochemistry Genetics and Molecular Biology(all); Applied Mathematics; A protein; Proteins; General Medicine; 030104 developmental biology; Modeling and Simulation; General Agricultural and Biological Sciences; Journal of Theoretical Biology
researchProduct

Assessing the low complexity of protein sequences via the low complexity triangle.

2020

Background Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins…

Proteome; Proteomes; Computer science; Protein Sequencing; Biochemistry; Database and Informatics Methods; Sequence Analysis Protein; Protein methods; Peptide sequence; chemistry.chemical_classification; 0303 health sciences; Sequence; Multidisciplinary; 030302 biochemistry & molecular biology; Q; R; Genomics; Amino acid; Tandem Repeats; Proteome; Amino Acid Analysis; Medicine; Sequence Analysis; Research Article; Repetitive Sequences Amino Acid; Bioinformatics; Sequence analysis; Science; Research and Analysis Methods; Genome Complexity; 03 medical and health sciences; Protein Domains; Amino Acid Sequence Analysis; Tandem repeat; Genetics; Humans; Fraction (mathematics); Repeated Sequences; Amino Acid Sequence; Molecular Biology Techniques; Sequencing Techniques; Representation (mathematics); Molecular Biology; 030304 developmental biology; Molecular Biology Assays and Analysis Techniques; business.industry; Biology and Life Sciences; Proteins; Computational Biology; Pattern recognition; chemistry; Globular Proteins; Artificial intelligence; business; PLoS ONE
researchProduct
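
One of the two local measures named in the low complexity triangle entry above, the fraction of the most frequent amino acid, is simple enough to sketch. The code below covers only that measure; the RES repeatability algorithm is not reproduced here. The function names and the default window length of 20 are assumptions for illustration, not values from the paper.

```python
from collections import Counter


def most_frequent_fraction(region):
    """Fraction of positions in `region` occupied by its most common residue."""
    if not region:
        raise ValueError("region must not be empty")
    counts = Counter(region)
    return max(counts.values()) / len(region)


def sliding_fractions(seq, window=20):
    """Most-frequent-residue fraction for every full window along `seq`."""
    return [
        most_frequent_fraction(seq[i:i + window])
        for i in range(len(seq) - window + 1)
    ]


# Toy usage: values close to 1.0 indicate a strongly biased composition.
print(most_frequent_fraction("QQQQA"))
print(sliding_fractions("AAAB", window=2))
```

A sequence shorter than the window yields an empty list, so callers may want to handle short proteins separately.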

PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins

2020

Abstract Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them as stand-alone applications and currently there is no web-based tool which allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToL…

Sequence analysis; AcademicSubjects/SCI00010; Protein domain; Computational biology; Biology; Domain (software engineering); Computer graphics; 03 medical and health sciences; Annotation; Protein Domains; Sequence Analysis Protein; Genetics; Computer Graphics; Humans; Amino Acids; 030304 developmental biology; 0303 health sciences; Intersection (set theory); 030302 biochemistry & molecular biology; Membrane Proteins; Proteins; Molecular Sequence Annotation; Visualization; Molecular Sequence Annotation; Web Server Issue; Software; Nucleic Acids Research
researchProduct

MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

2020

Multiple sequence alignments are usually phylogenetically driven. They are studied in the framework of evolution. But sometimes, it is interesting to study residue conservation at positions unconstrained by evolutionary rules. We present a supervised method to access a layer of information difficult to appreciate visually when many protein sequences are aligned. This new tool (MAGA; http://cbdm-01.zdv.uni-mainz.de/~munoz/maga/ ) locates positions in multiple sequence alignments differentially conserved in manually defined groups of sequences.

0303 health sciences; multiple sequence alignments; Sequence analysis; Computer science; 0206 medical engineering; Methods and Protocols; Sequence analysis; lcsh:Evolution; 02 engineering and technology; Computational biology; Computer Science Applications; 03 medical and health sciences; motif finding; computational biology; web services; Genetics; lcsh:QH359-425; 020602 bioinformatics; Ecology Evolution Behavior and Systematics; 030304 developmental biology; Evolutionary Bioinformatics
researchProduct

Automated selection of homologs to track the evolutionary history of proteins

2018

Background The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitates static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give …

0301 basic medicine; Proteome; Computer science; Computational biology; Web tool; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Homology (biology); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Protein sequencing; Structural Biology; Homologous chromosome; Humans; Databases Protein; Molecular Biology; lcsh:QH301-705.5; Organism; Protein function; Methodology Article; Applied Mathematics; Proteins; A protein; Computer Science Applications; Homology; Evolutionary path; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; lcsh:Biology (General); lcsh:R858-859.7; DNA microarray; Software; 030217 neurology & neurosurgery; BMC Bioinformatics
researchProduct

REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences

2020

Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We currently provide precomputed analyses f…

Repetitive Sequences Amino Acid; Web server; Proteome; Computer science; Computational biology; computer.software_genre; Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Tandem repeat; Structural Biology; Sequence comparison; Humans; Amino Acid Sequence; Molecular Biology; Conserved Sequence; 030304 developmental biology; Sequence (medicine); Comparative genomics; Internet; 0303 health sciences; Multiple sequence alignment; Bacteria; Proteins; Tandem Repeat Sequences; UniProt; Sequence Alignment; computer; 030217 neurology & neurosurgery; Journal of Molecular Biology
researchProduct

Proteome-wide comparison between the amino acid composition of domains and linkers

2018

Objective Amino acid composition is a sequence feature that has been extensively used to characterize proteomes of many species and protein families. Yet the analysis of amino acid composition of protein domains and the linkers connecting them has received less attention. Here, we perform both a comprehensive full-proteome amino acid composition analysis and a similar analysis focusing on domains and linkers, to uncover domain- or linker-specific differential amino acid usage patterns. Results The amino acid composition in the 38 proteomes studied showcases the greater variability found in archaeal and bacterial species compared to eukaryotes. When focusing on domains and linkers, we describe …

Proteomics; 570; Bacteria; Proteome; Amino acid composition; lcsh:R; lcsh:Medicine; Eukaryota; Archaea; 570 Life sciences; Research Note; lcsh:Biology (General); Sequence Analysis Protein; Catalytic Domain; Domains; Amino Acid Sequence; Linkers; lcsh:Science (General); lcsh:QH301-705.5; lcsh:Q1-390; BMC Research Notes
researchProduct
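
The domain versus linker comparison summarized in the abstract above can be sketched in Python as follows. This is only an illustration under stated assumptions: domain coordinates are given as 0-based half-open intervals, every residue outside a domain is treated as linker (the study may define linkers differently, for example by excluding termini), and the function names are hypothetical.

```python
from collections import Counter

def composition(sequence):
    """Return the relative frequency of each amino acid in a sequence."""
    counts = Counter(sequence)
    total = sum(counts.values())
    return {aa: n / total for aa, n in counts.items()} if total else {}

def split_domains_linkers(sequence, domains):
    """Split a protein into concatenated domain residues and non-domain residues.

    domains: list of (start, end) pairs, 0-based and half-open (assumed format).
    Residues outside all domains are treated as linker (an assumption).
    """
    in_domain = [False] * len(sequence)
    for start, end in domains:
        for i in range(start, min(end, len(sequence))):
            in_domain[i] = True
    domain_seq = "".join(aa for aa, flag in zip(sequence, in_domain) if flag)
    linker_seq = "".join(aa for aa, flag in zip(sequence, in_domain) if not flag)
    return domain_seq, linker_seq

# Toy protein with two annotated domains flanking one linker.
domain_seq, linker_seq = split_domains_linkers("AAGGSSAA", [(0, 2), (6, 8)])
domain_comp = composition(domain_seq)
linker_comp = composition(linker_seq)
```

Pooling the domain and linker residues of all proteins in a proteome before calling `composition` would give proteome-wide values of the kind the abstract refers to.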

SuppFile1.fasta.txt – Supplemental material for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

2020

Supplemental material, SuppFile1.fasta.txt for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments by Pablo Mier and Miguel A Andrade-Navarro in Evolutionary Bioinformatics

Cell Biology
researchProduct

Glutamine Codon Usage and polyQ Evolution in Primates Depend on the Q Stretch Length

2018

Abstract Amino acid usage in a proteome depends mostly on its taxonomy, as does codon usage in transcriptomes. Here, we explore the level of variation in the codon usage of a specific amino acid, glutamine, in relation to the number of consecutive glutamine residues. We show that CAG triplets are consistently more abundant in short glutamine homorepeats (polyQ, four to eight residues) than in shorter glutamine stretches (one to three residues), leading to the evolutionary growth of the repeat region in a CAG-dependent manner. The length of orthologous polyQ regions is mostly stable in primates, particularly the short ones. Interestingly, given a short polyQ the CAG usage is higher in…

Primates; congenital hereditary and neonatal diseases and abnormalities; codon usage; Proteome; Glutamine; homorepeat; Evolution Molecular; Animals; Humans; glutamine stretch; Codon; Peptides; polyQ-associated diseases; Research Article; Genome Biology and Evolution
researchProduct
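
The relationship between CAG usage and glutamine stretch length described in the abstract above can be explored with a short Python sketch. It is not the authors' pipeline: the input is assumed to be in-frame coding sequences, glutamine is encoded by CAG or CAA under the standard genetic code, and the function names are hypothetical.

```python
from collections import defaultdict

GLN_CODONS = {"CAG", "CAA"}  # the two glutamine codons in the standard genetic code

def glutamine_runs(cds):
    """Yield the codons of each maximal run of consecutive glutamine codons."""
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    run = []
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon in GLN_CODONS:
            run.append(codon)
        elif run:
            yield run
            run = []
    if run:
        yield run

def cag_fraction_by_length(cds_list):
    """Return {run length: fraction of CAG codons} over all glutamine runs."""
    totals = defaultdict(lambda: [0, 0])  # run length -> [CAG count, codon count]
    for cds in cds_list:
        for run in glutamine_runs(cds.upper()):
            totals[len(run)][0] += run.count("CAG")
            totals[len(run)][1] += len(run)
    return {length: cag / n for length, (cag, n) in sorted(totals.items())}

# Toy coding sequence: one run of four CAG codons and one isolated CAA codon.
fractions = cag_fraction_by_length(["ATGCAGCAGCAGCAGGCTCAATAA"])
```

Grouping the resulting lengths into the ranges used in the abstract (one to three residues versus four to eight) would allow the two classes of stretches to be compared directly.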

SuppFile2.fasta.txt – Supplemental material for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

2020

Supplemental material, SuppFile2.fasta.txt for MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments by Pablo Mier and Miguel A Andrade-Navarro in Evolutionary Bioinformatics

Cell Biology
researchProduct

Additional file 2: of Automated selection of homologs to track the evolutionary history of proteins

2018

Figure S1. Number of pairwise orthology relationships calculated with OrthoMCL, ProteinPathTracker and Reciprocal Best Hit Blast (RBHB) in 15 species, using the proteomes provided by OrthoMCL for the default species in the default path of ProteinPathTracker, and taking E. coli proteins as reference. a) All OrthoMCL pairs. b) Only the top-scoring 25% of OrthoMCL pairs. (PNG 388 kb)

researchProduct

Additional file 1: of Automated selection of homologs to track the evolutionary history of proteins

2018

List of complete reference proteomes used in the web tool, organised by evolutionary path. (XLSX 13 kb)

GeneralLiterature_REFERENCE (e.g. dictionaries, encyclopedias, glossaries)
researchProduct

MOESM1 of Proteome-wide comparison between the amino acid composition of domains and linkers

2018

Additional file 1. List of proteomes used for the analyses. Each proteome is described by the name of the species, abbreviation as used in the manuscript, UniProt organism ID, number of proteins, and percentage of amino acids from domains/linkers against the total amino acid composition of the proteome.

researchProduct