RESEARCH PRODUCT
Repeatability in protein sequences
Pablo Mier, Mohamed Kamel, Abdelkamel Tari, Miguel A. Andrade-Navarro
subject
Repetitive Sequences Amino Acid; Globular protein; Saccharomyces cerevisiae; Context (language use); Computational biology; Protein–protein interaction; Evolution Molecular; 03 medical and health sciences; Sequence Analysis Protein; Structural Biology; Humans; Arabidopsis thaliana; Amino Acid Sequence; Databases Protein; Protein secondary structure; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; biology; 030302 biochemistry & molecular biology; Proteins; biology.organism_classification; Amino acid; chemistry; Sequence Alignment; Algorithms; Function (biology)
description
Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) repeats are widespread and that many define regions with a function in protein interactions. For these reasons, we have developed an algorithm to quickly analyze local repeatability along protein sequences, that is, how close a protein fragment is to a perfect repeat. Using this algorithm, we found that the proteins of the yeast Saccharomyces cerevisiae are depleted in short repeats (approximate or not) of odd length, whereas human proteins are not; that the fish Danio rerio has many proteins with repeats of length two; and that the plant Arabidopsis thaliana has an unusually large number of repeats of length seven. Our method (REpeatability Scanner, RES, accessible at http://cbdm-01.zdv.uni-mainz.de/~munoz/res/) makes it possible to find regions with approximate short repeats in protein sequences and helps to characterize the variable use of LCRs and compositional bias in different organisms.
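The notion of local repeatability described above (how close a fragment is to a perfect repeat) can be illustrated with a minimal sketch. This is not the RES algorithm, whose scoring is not given in this record. It assumes a simple stand-in score: the fraction of positions in a fragment whose residue matches the residue one repeat unit further along. The function names `repeatability` and `scan`, and the `period` and `window` parameters, are hypothetical.

```python
def repeatability(fragment: str, period: int) -> float:
    """Assumed score: fraction of positions i with fragment[i] == fragment[i + period].

    A perfect repeat of unit length `period` scores 1.0.
    """
    if period <= 0 or period >= len(fragment):
        raise ValueError("period must be between 1 and len(fragment) - 1")
    pairs = len(fragment) - period
    matches = sum(fragment[i] == fragment[i + period] for i in range(pairs))
    return matches / pairs


def scan(sequence: str, period: int, window: int) -> list[tuple[int, float]]:
    """Slide a fixed-size window along the sequence; return (start, score) pairs."""
    return [
        (start, repeatability(sequence[start:start + window], period))
        for start in range(len(sequence) - window + 1)
    ]


if __name__ == "__main__":
    # Toy sequence with a perfect (QA)x4 repeat embedded at position 2.
    for start, score in scan("MKQAQAQAQAPL", period=2, window=8):
        print(start, round(score, 2))
```

Under this assumed score, a non-perfect repeat such as `QAQAQTQA` still scores above zero for period two, which is the kind of approximate repeat the description refers to.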
year | journal | country | edition | language |
---|---|---|---|---|
2019-04-03 | Journal of Structural Biology | | | |