0000000000105879

AUTHOR

Miguel A. Andrade-Navarro

showing 84 related works from this author

Evolutionary stability of topologically associating domains is associated with conserved gene regulation

2018

Abstract Background: The human genome is highly organized in the three-dimensional nucleus. Chromosomes fold locally into topologically associating domains (TADs) defined by increased intra-domain chromatin contacts. TADs contribute to gene regulation by restricting chromatin interactions of regulatory sequences, such as enhancers, with their target genes. Disruption of TADs can result in altered gene expression and is associated with genetic diseases and cancers. However, it is not clear to what extent TAD regions are conserved in evolution and whether disruption of TADs by evolutionary rearrangements can alter gene expression. Results: Here, we hypothesize that TADs represent essential functiona…

0301 basic medicine; Physiology; Evolution; Genome rearrangements; Gene Expression; Genomics; Plant Science; Computational biology; Biology; Genome; General Biochemistry Genetics and Molecular Biology; Evolution Molecular; 03 medical and health sciences; Mice; Structural Biology; Hi-C; Gene expression; Animals; Humans; Enhancer; lcsh:QH301-705.5; Gene; Selection; Ecology Evolution Behavior and Systematics; Regulation of gene expression; Genome; Topologically associating domains; Genome Human; Cell Biology; TAD; Chromatin Assembly and Disassembly; Chromatin; Gene regulation; 030104 developmental biology; lcsh:Biology (General); Gene Expression Regulation; Regulatory sequence; Human genome; General Agricultural and Biological Sciences; Structural variants; Chromatin interactions; 3D genome architecture; Developmental Biology; Biotechnology; Research Article; BMC Biology
researchProduct

Evaluating Cell Identity from Transcription Profiles

2018

Summary: Induced pluripotent stem cells (iPS) and direct lineage programming offer promising autologous and patient-specific sources of cells for personalized drug-testing and cell-based therapy. Before these engineered cells can be widely used, it is important to evaluate how well the engineered cell types resemble their intended target cell types. We have developed a method to generate CellScore, a cell identity score that can be used to evaluate the success of an engineered cell type in relation to both its initial and desired target cell type, which are used as references. Of 20 cell transitions tested, the most successful transitions were the iPS cells (CellScore > 0.9), while other t…

0303 health sciences; 03 medical and health sciences; Cell type; medicine.anatomical_structure; Transcription (biology); 030302 biochemistry & molecular biology; Cell; medicine; Biology; Induced pluripotent stem cell; Cell identity; 030304 developmental biology; Cell biology

Toward completion of the Earth’s proteome: an update a decade later

2017

Protein databases are steadily growing, driven by the spread of new, more efficient sequencing techniques. This growth is dominated by an increase in redundancy (homologous proteins with various degrees of sequence similarity) and by the inability to process and curate sequence entries as fast as they are created. To understand these trends and aid bioinformatic resources that might be compromised by the increasing size of the protein sequence databases, we have created a less-redundant protein data set. In parallel, we analyzed the evolution of protein sequence databases in terms of size and redundancy. While the SwissProt database has decelerated its growth mostly because of a focus on i…

Proteome; Operations research; Knowledge Bases; 0206 medical engineering; 02 engineering and technology; Computational biology; Biology; 03 medical and health sciences; Annotation; Protein sequencing; Sequence Analysis Protein; Three-domain system; Redundancy (engineering); Animals; Humans; Databases Protein; Molecular Biology; 030304 developmental biology; Sequence (medicine); 0303 health sciences; Computational Biology; Proteins; Protein superfamily; Proteome; UniProt; Software; 020602 bioinformatics; Information Systems; Briefings in Bioinformatics

Avoided motifs: short amino acid strings missing from protein datasets.

2020

Abstract According to the amino acid composition of natural proteins, it could be expected that all possible sequences of three or four amino acids will occur at least once in large protein datasets purely by chance. However, in some species or cellular contexts, specific short amino acid motifs are missing for unknown reasons. We describe these as Avoided Motifs, short amino acid combinations missing from biological sequences. Here we identify 209 human and 154 bacterial Avoided Motifs of length four amino acids, and discuss their possible functionality according to their presence in other species. Furthermore, we determine two Avoided Motifs of length three amino acids in human proteins…

0301 basic medicine; chemistry.chemical_classification; Protein function; Amino Acid Motifs; 030102 biochemistry & molecular biology; Clinical Biochemistry; Computational Biology; Proteins; Context (language use); Computational biology; Biology; Biochemistry; Amino acid; 03 medical and health sciences; 030104 developmental biology; Secretory protein; chemistry; Amino acid composition; Cytoplasm; Molecular Biology; Human proteins; Sequence Alignment; Biological chemistry; References
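
As a rough illustration of the idea in the Avoided Motifs abstract above, the sketch below enumerates every possible string of length k over the 20 standard amino acids and reports those that never occur in a set of sequences. The function name `avoided_motifs`, the toy sequences, and the plain set-difference approach are assumptions made for illustration; the abstract does not describe the authors' actual procedure, datasets, or any filtering they applied.

```python
from itertools import product

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def avoided_motifs(sequences, k):
    """Return the sorted k-mers over the standard amino acids that occur in none of the sequences."""
    seen = set()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k across the sequence and record each k-mer.
        for i in range(len(seq) - k + 1):
            seen.add(seq[i:i + k])
    # All 20**k possible k-mers; non-standard letters in `seen` are simply ignored.
    all_kmers = {"".join(p) for p in product(AMINO_ACIDS, repeat=k)}
    return sorted(all_kmers - seen)


# Hypothetical toy input, not real proteome data.
toy_sequences = ["MKTAYIAKQR", "ACDEFGHIKLMNPQRSTVWY"]
missing_pairs = avoided_motifs(toy_sequences, 2)
print(len(missing_pairs), missing_pairs[:5])
```

For k = 4 there are 20^4 = 160,000 candidate strings, so holding them all in a set stays cheap. On a real proteome, the number of missing motifs depends heavily on which sequences are included.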

RepeatsDB in 2021: improved data and extended classification for protein tandem repeat structures

2020

The RepeatsDB database (URL: https://repeatsdb.org/) provides annotations and classification for protein tandem repeat structures from the Protein Data Bank (PDB). Protein tandem repeats are ubiquitous in all branches of the tree of life. The accumulation of solved repeat structures provides new possibilities for classification and detection, but also increases the need for annotation. Here we present RepeatsDB 3.0, which addresses these challenges and presents an extended classification scheme. The major conceptual change compared to the previous version is the hierarchical classification combining top levels based solely on structural similarity (Class > Topology > Fold) with two new lev…

Repetitive Sequences Amino AcidAcademicSubjects/SCI00010BiologíaStatistics as TopicProtein Data Bank (RCSB PDB)Computational biologyBiologyRepetitive SequencesGene Ontology; HEK293 Cells; HeLa Cells; Humans; Proteins; Reproducibility of Results; Statistics as Topic; User-Computer Interface; Databases Protein; Repetitive Sequences Amino Acid; Tandem Repeat SequencesDatabases03 medical and health sciencesAnnotationUser-Computer InterfaceProtein structureSimilarity (network science)Tandem repeatGeneticsDatabase IssueHumansDatabases ProteinCiencias Exactasdatabase030304 developmental biology0303 health sciencesHierarchy (mathematics)Protein030302 biochemistry & molecular biologyProteinsReproducibility of Resultscomputer.file_formatProtein Data BankClass (biology)proteinsAmino AcidComputingMethodologies_PATTERNRECOGNITIONGene OntologyHEK293 CellsclassificationTandem Repeat Sequencesprotein tandem repeat structures[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]computerHeLa CellsNucleic Acids Research

Towards identifying drug side effects from social media using active learning and crowd sourcing.

2019

Motivation: Social media is a largely untapped source of information on side effects of drugs. Twitter in particular is widely used to report on everyday events and personal ailments. However, labeling this noisy data is a difficult problem because labeled training data is sparse and automatic labeling is error-prone. Crowd sourcing can help in such a scenario to obtain more reliable labels, but is expensive in comparison because workers have to be paid. To remedy this, semi-supervised active learning may reduce the amount of labeled data needed and focus the manual labeling process on important information. Results: We extracted data from Twitter using the public API. We subsequently use Ama…

0303 health sciences; Focus (computing); Information retrieval; Drug-Related Side Effects and Adverse Reactions; Process (engineering); business.industry; Active learning (machine learning); Computer science; Computational Biology; Crowdsourcing; 03 medical and health sciences; 0302 clinical medicine; Problem-based learning; Code (cryptography); Crowdsourcing; Humans; Social media; 030212 general & internal medicine; business; Baseline (configuration management); Social Media; 030304 developmental biology; Pacific Symposium on Biocomputing. Pacific Symposium on Biocomputing

Traitpedia: a collaborative effort to gather species traits

2018

Abstract Summary: Traitpedia is a collaborative database that aims to collect binary traits in tabular form for a growing number of species. Availability and implementation: Traitpedia can be accessed from http://cbdm-01.zdv.uni-mainz.de/~munoz/traitpedia. Supplementary information: Supplementary data are available at Bioinformatics online.

Statistics and Probability; 0303 health sciences; Information retrieval; Computer science; 030302 biochemistry & molecular biology; Databases and Ontologies; MEDLINE; Biochemistry; Phenotype; Applications Notes; Computer Science Applications; 03 medical and health sciences; Computational Mathematics; Phenotype; Computational Theory and Mathematics; Molecular Biology; Software; 030304 developmental biology; Global biodiversity; Bioinformatics

Computational identification of cell-specific variable regions in ChIP-seq data.

2019

ABSTRACT Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is used to identify genome-wide DNA regions bound by proteins. Several sources of variation can affect the reproducibility of a particular ChIP-seq assay, which can lead to a misinterpretation of where the protein under investigation binds to the genome in a particular cell type. Given one ChIP-seq experiment with replicates, binding sites not observed in all the replicates will usually be interpreted as noise and discarded. However, the recent discovery of high-occupancy target (HOT) regions suggests that there are regions where binding of multiple transcription factors can be identified. To investigate these regions,…

Cell typeAcademicSubjects/SCI00010Computational biologyPlasma protein bindingBiologyGenomeCell LineEvolution Molecular03 medical and health scienceschemistry.chemical_compoundMice0302 clinical medicineNarese/3Cell Line TumorGeneticsAnimalsHumansEpigeneticsBinding sitePromoter Regions GeneticTranscription factorEmbryonic Stem Cells030304 developmental biology0303 health sciencesPrincipal Component AnalysisBinding SitesNucleotidesGenetic VariationPromoterGenomicsChromatinchemistryCpG siteMCF-7 CellsChromatin Immunoprecipitation SequencingMethods OnlineR-Loop StructuresK562 CellsChromatin immunoprecipitation030217 neurology & neurosurgeryFunction (biology)DNATranscription FactorsNucleic acids research

RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures

2017

RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by a…

0301 basic medicineRepetitive Sequences Amino Acid[SDV.BC]Life Sciences [q-bio]/Cellular BiologyBiologyBioinformaticsSearch engineAnnotationStructure-Activity Relationship03 medical and health sciences0302 clinical medicineTandem repeatGeneticsAnimalsHumansDatabase IssueDatabases ProteinComputingMilieux_MISCELLANEOUSRepeat unit030304 developmental biology0303 health sciencesInformation retrievalProteinscomputer.file_formatProtein Data BankVisualizationSchema (genetic algorithms)030104 developmental biologyData qualityCorrigendumcomputerSoftware030217 neurology & neurosurgeryNucleic Acids Research

FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases.

2016

The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automatically derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or high threshold of sequence identity. We compressed 50…

0301 basic medicine; Protein structure database; Proteomics; Proteome; Sequence analysis; Computer science; computer.software_genre; Sensitivity and Specificity; Set (abstract data type); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Similarity (network science); Sequence Analysis Protein; Genetics; Cluster (physics); Animals; Cluster Analysis; Humans; Cluster analysis; Databases Protein; Molecular Biology; Sequence; Database; Function (mathematics); Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Modeling and Simulation; Data mining; computer; 030217 neurology & neurosurgery; Software; Journal of computational biology : a journal of computational molecular cell biology

Between Interactions and Aggregates: The PolyQ Balance

2021

Abstract Polyglutamine regions (polyQ) are highly abundant consecutive runs of glutamine residues. They have been generally studied in relation to the so-called polyQ-associated diseases, characterized by protein aggregation caused by the expansion of the polyglutamine tract via a CAG-slippage mechanism. However, more than 4800 human proteins contain a polyQ, and only 9 of these regions are known to be associated with disease. Computational sequence studies and experimental structure determinations are completing a more interesting picture in which polyQ emerge as a motif for modulation of protein-protein interactions. But long polyQ regions may lead to an excess of interactions, and produc…

AcademicSubjects/SCI01140; AcademicSubjects/SCI01130; aggregation; CAG-expansion diseases; Context (language use); Computational biology; Review; Polyglutamine tract; Biology; Protein aggregation; Protein–protein interaction; homorepeat; protein–protein interaction; Codon usage bias; Genetics; Humans; Peptides; Human proteins; polyglutamine; Ecology Evolution Behavior and Systematics; Function (biology); Sequence (medicine); Genome Biology and Evolution
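
The abstract above characterizes polyQ regions as consecutive runs of glutamine residues. A minimal sketch of locating such runs in a protein sequence might look like the following. The function name `find_polyq` and the default minimum length of four residues are hypothetical choices made for illustration; the abstract does not state a threshold, and this version matches only pure, uninterrupted runs of Q.

```python
import re


def find_polyq(sequence, min_len=4):
    """Return (start, end, length) for each run of at least min_len consecutive Q residues.

    Coordinates are 0-based with an exclusive end. min_len is an illustrative
    threshold, not a value taken from the source.
    """
    pattern = re.compile(r"Q{%d,}" % min_len)
    return [
        (m.start(), m.end(), m.end() - m.start())
        for m in pattern.finditer(sequence.upper())
    ]


# Hypothetical toy sequence: one run of five Q and one run of two Q.
print(find_polyq("MAQQQQQPLQQA"))
```

Changing `min_len`, or allowing a few non-Q residues inside a run, changes which regions are reported, so any count of polyQ-containing proteins depends on the definition used.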

Interpretable machine learning models for single-cell ChIP-seq imputation

2019

Abstract Motivation: Single-cell ChIP-seq (scChIP-seq) analysis is challenging due to data sparsity. High degree of data sparsity in biological high-throughput single-cell data is generally handled with imputation methods that complete the data, but specific methods for scChIP-seq are lacking. We present SIMPA, a scChIP-seq data imputation method leveraging predictive information within bulk data from ENCODE to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Results: Imputations using machine learning models trained for each single cell, each target, and each genomic region accurately preserve cell type clustering and improve pathway-related gene i…

Computer science; business.industry; Cell chip; Python (programming language); Machine learning; computer.software_genre; ENCODE; Identification (information); Simulated data; Feature (machine learning); Imputation (statistics); Artificial intelligence; Cluster analysis; business; computer; computer.programming_language

The Developmental Transcriptome for Lytechinus variegatus Exhibits Temporally Punctuated Gene Expression Changes

2019

Abstract Embryonic development is arguably the most complex process an organism undergoes during its lifetime, and understanding this complexity is best approached with a systems-level perspective. The sea urchin has become a highly valuable model organism for understanding developmental specification, morphogenesis, and evolution. As a non-chordate deuterostome, the sea urchin occupies an important evolutionary niche between protostomes and vertebrates. Lytechinus variegatus (Lv) is an Atlantic species that has been well studied, and which has provided important insights into signal transduction, patterning, and morphogenetic changes during embryonic and larval development. The Pacific specie…

ved/biology.organism_classification_rank.speciesGene regulatory networkMorphogenesisRNA-SeqTranscriptome03 medical and health sciences0302 clinical medicineLytechinusbiology.animalAnimalsGene Regulatory NetworksModel organismStrongylocentrotus purpuratusMolecular BiologySea urchin030304 developmental biologyLytechinus variegatus0303 health sciencesDeuterostomebiologyved/biologyurogenital systemGene Expression Regulation DevelopmentalCell Biologybiology.organism_classificationStrongylocentrotus purpuratusEvolutionary biologyembryonic structuresTranscriptome030217 neurology & neurosurgeryDevelopmental Biology

The importance of definitions in the study of polyQ regions: A tale of thresholds, impurities and sequence context

2020

Graphical abstract

lcsh:Biotechnology; Glutamine; Biophysics; Context (language use); Computational biology; Biology; Biochemistry; polyQ; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; lcsh:TP248.13-248.65; Genetics; Human proteome project; ComputingMethodologies_COMPUTERGRAPHICS; 030304 developmental biology; Sequence (medicine); chemistry.chemical_classification; Sequence context; 0303 health sciences; Homorepeat; A protein; Computer Science Applications; Amino acid; chemistry; 030220 oncology & carcinogenesis; Codon usage bias; Proteome; Codon usage; Length distribution; Research Article; Biotechnology; Computational and Structural Biotechnology Journal

MIPPIE: the mouse integrated protein–protein interaction reference

2020

Abstract Cells operate and react to environmental signals thanks to a complex network of protein–protein interactions (PPIs), the malfunction of which can severely disrupt cellular homeostasis. As a result, mapping and analyzing protein networks are key to advancing our understanding of biological processes and diseases. An invaluable part of these endeavors has been the house mouse (Mus musculus), the mammalian model organism par excellence, which has provided insights into human biology and disorders. The importance of investigating PPI networks in the context of mouse prompted us to develop the Mouse Integrated Protein–Protein Interaction rEference (MIPPIE). MIPPIE inherits a robust infr…

Computer scienceved/biology.organism_classification_rank.speciesprotein-protein interactionsCellular homeostasisContext (language use)Computational biologycomputer.software_genreGeneral Biochemistry Genetics and Molecular BiologyProtein–protein interaction03 medical and health sciencesMice0302 clinical medicineProtein Interaction MappingMus musculusAnimalsProtein Interaction MapsModel organismDatabases Proteinmousedatabase030304 developmental biology0303 health sciencesved/biologyComputational BiologyComplex networkprotein interaction networkOriginal ArticleWeb serviceUser interfaceGeneral Agricultural and Biological SciencesProtein networkcomputer030217 neurology & neurosurgerySoftwareInformation SystemsDatabase: The Journal of Biological Databases and Curation

Function and Evolution of Nematode RNAi Pathways

2019

Selfish genetic elements, like transposable elements or viruses, are a threat to genomic stability. A variety of processes, including small RNA-based RNA interference (RNAi)-like pathways, has evolved to counteract these elements. Amongst these, endogenous small interfering RNA and Piwi-interacting RNA (piRNA) pathways were implicated in silencing selfish genetic elements in a variety of organisms. Nematodes have several incredibly specialized, rapidly evolving endogenous RNAi-like pathways serving such purposes. Here, we review recent research regarding the RNAi-like pathways of Caenorhabditis elegans as well as those of other nematodes, to provide an evolutionary perspective. We argue tha…

0301 basic medicineSmall RNASmall interfering RNAPiwilcsh:QH426-470nematodePiwi-interacting RNAReviewComputational biologypiRNABiochemistry03 medical and health sciences0302 clinical medicineRNA interference21U RNAGenetics22G RNAGene silencing26G RNAsmall RNAMolecular BiologyCaenorhabditis elegansRdRPbiologyRNAArgonautebiology.organism_classificationArgonautelcsh:Genetics030104 developmental biologysiRNAC. elegans030217 neurology & neurosurgeryNon-Coding RNA

Liver-Kidney-on-Chip To Study Toxicity of Drug Metabolites

2017

Advances in organ-on-chip technologies for application in in vitro drug development provide an attractive alternative approach to replace ethically controversial animal testing and to establish a basis for accelerated drug development. In recent years, various chip-based tissue culture systems have been developed, which are mostly optimized for cultivation of one single cell type or organoid structure and lack the representation of multi-organ interactions. Here we present an optimized microfluidic chip design consisting of interconnected compartments, which provides the possibility to mimic the exchange between different organ-specific cell types and enables the study of interdependent cel…

0301 basic medicine; Kidney; Cell type; Biomedical Engineering; 02 engineering and technology; Computational biology; Biology; 021001 nanoscience & nanotechnology; Biomaterials; 03 medical and health sciences; Tissue culture; 030104 developmental biology; medicine.anatomical_structure; Drug development; Toxicity; Hepatic stellate cell; Organoid; medicine; 0210 nano-technology; Drug metabolism; Biomedical engineering; ACS Biomaterials Science & Engineering

Bioinformatics in theory and application - highlights of the 36th German Conference on Bioinformatics.

2021

German; Engineering; business.industry; Clinical Biochemistry; language; Computational Biology; business; Molecular Biology; Biochemistry; Data science; language.human_language; Biological chemistry; References

The Anti-amyloid Compound DO1 Decreases Plaque Pathology and Neuroinflammation-Related Expression Changes in 5xFAD Transgenic Mice

2018

Self-propagating amyloid-β (Aβ) aggregates or seeds possibly drive pathogenesis of Alzheimer's disease (AD). Small molecules targeting such structures might act therapeutically in vivo. Here, a fluorescence polarization assay was established that enables the detection of compound effects on both seeded and spontaneous Aβ42 aggregation. In a focused screen of anti-amyloid compounds, we identified Disperse Orange 1 (DO1) ([4-((4-nitrophenyl)diazenyl)-N-phenylaniline]), a small molecule that potently delays both seeded and non-seeded Aβ42 polymerization at substoichiometric concentrations. Mechanistic studies revealed that DO1 disrupts preformed fibrillar assemblies of synthetic Aβ42 peptides …

MaleGenetically modified mouse1303 BiochemistryAmyloid10017 Institute of AnatomyClinical BiochemistryMice TransgenicPlaque Amyloid610 Medicine & healthBiologyProtein aggregation1308 Clinical Biochemistry01 natural sciencesBiochemistryPolymerizationPathogenesisMiceProtein AggregatesStructure-Activity RelationshipAlzheimer DiseaseGene expressionDrug Discovery1312 Molecular BiologyAnimalsColoring AgentsMolecular BiologyNeuroinflammationInflammationPharmacologyAmyloid beta-PeptidesDose-Response Relationship DrugMolecular Structure010405 organic chemistry3002 Drug DiscoveryBrainSmall moleculeMolecular medicine0104 chemical sciencesCell biologyMice Inbred C57BL3004 Pharmacology10036 Medical Clinic1313 Molecular Medicine570 Life sciences; biologyMolecular MedicineFemaleAzo Compounds

Automated quality control of next generation sequencing data using machine learning

2019

Abstract Controlling quality of next generation sequencing (NGS) data files is a necessary but complex task. To address this problem, we statistically characterized common NGS quality features and developed a novel quality control procedure involving tree-based and deep learning classification algorithms. Predictive models, validated on internal data and external disease diagnostic datasets, are to some extent generalizable to data from unseen species. The derived statistical guidelines and predictive models represent a valuable resource for users of NGS data to better understand quality issues and perform automatic quality control. Our guidelines and software are available at the following …

business.industry; Computer science; media_common.quotation_subject; Deep learning; Machine learning; computer.software_genre; DNA sequencing; Statistical classification; Tree (data structure); Task (computing); Software; Resource (project management); Data file; Quality (business); Artificial intelligence; business; computer; media_common

TAF-ChIP: An ultra-low input approach for genome wide chromatin immunoprecipitation assay

2018

Chromatin immunoprecipitation (ChIP) followed by next generation sequencing is an invaluable and powerful technique to understand transcriptional regulation. However, ChIP is currently limited by the requirement for large amounts of starting material. This renders studying rare cell populations very challenging, or even impossible. Here, we present a tagmentation-assisted fragmentation ChIP (TAF-ChIP) and sequencing method to generate high-quality datasets from low cell numbers. The method relies on Tn5 transposon activity to fragment the chromatin that is immunoprecipitated, thus circumventing the need for sonication or MNase digestion to fragment. Furthermore, Tn5 adds the sequencing adapto…

Transposable element; Cell type; biology; Computer science; Immunoprecipitation; Cell; Genomics; Computational biology; ENCODE; Genome; DNA sequencing; Chromatin; medicine.anatomical_structure; Transcriptional regulation; biology.protein; medicine; H3K4me3; Epigenetics; Chromatin immunoprecipitation; Micrococcal nuclease

Flanking regions determine the structure of the poly-glutamine homo- repeat in huntingtin through mechanisms common among glutamine-rich human protei…

2020

The causative agent of Huntington's disease, the poly-Q homo-repeat in the N-terminal region of huntingtin (httex1), is flanked by a 17-residue-long fragment (N17) and a proline-rich region (PRR), which promote and inhibit the aggregation propensity of the protein, respectively, by poorly understood mechanisms. Based on experimental data obtained from site-specifically labeled NMR samples, we derived an ensemble model of httex1 that identified both flanking regions as opposing poly-Q secondary structure promoters. While N17 triggers helicity through a promiscuous hydrogen bond network involving the side chains of the first glutamines in the poly-Q tract, the PRR prom…

Repetitive Sequences Amino AcidHuntingtinAmino Acid Motifs[SDV.BBM.BP] Life Sciences [q-bio]/Biochemistry Molecular Biology/Biophysics03 medical and health sciencesHuntington's diseaseStructural BiologyHuman proteome projectmedicineHumans[SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]Molecular BiologyHuman proteinsProtein secondary structure[SDV.BBM.BC] Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]030304 developmental biology[INFO.INFO-BI] Computer Science [cs]/Bioinformatics [q-bio.QM]Huntingtin Protein0303 health sciencesChemistry030302 biochemistry & molecular biologyPromotermedicine.diseaseCell biologyIntrinsically Disordered ProteinsGlutamine[SDV.BBM.BP]Life Sciences [q-bio]/Biochemistry Molecular Biology/BiophysicsPolyglutamic Acid[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Low Complexity Region

A novel approach to investigate the evolution of structured tandem repeat protein families by exon duplication.

2020

Tandem Repeat Proteins (TRPs) are ubiquitous in cells and are enriched in eukaryotes. They contributed to the evolution of organism complexity, specializing for functions that require quick adaptability such as immunity-related functions. To investigate the hypothesis of repeat protein evolution through exon duplication and rearrangement, we designed a tool to analyze the relationships between exon/intron patterns and structural symmetries. The tool allows comparison of the structure fragments as defined by exon/intron boundaries from Ensembl against the structural element repetitions from RepeatsDB. The all-against-all pairwise structural alignment between fragments and comparison of the t…

Protein familyStructural alignmentBiological data visualizationExonComputational biologyBiologyEvolution Molecular03 medical and health sciencesExonProtein structureTandem repeatStructural BiologyGene duplicationAnimalsHumans030304 developmental biology0303 health sciences030302 biochemistry & molecular biologyIntronProteinsExonsProtein superfamilyClassificationIntronsBiological data visualization; Classification; Exon; Protein evolution; Protein structure; Repeat proteinTandem Repeat SequencesRepeat proteinProtein structureProtein evolutionJournal of structural biology

Evolution-guided evaluation of the inverted terminal repeats of the synthetic transposon Sleeping Beauty.

2018

Abstract Sleeping Beauty (SB) is a synthetic Tc1/mariner transposon that is widely used for genetic engineering in vertebrates, including humans. Its sequence was derived from a consensus of sequences found in fish species including the Atlantic salmon (Salmo salar). One of the functional components of SB, the transposase enzyme, has been subject to extensive mutagenesis yielding hyperactive protein variants for advanced applications. The second functional component, the transposon inverted terminal repeats (ITRs), has so far not been extensively modified, mainly due to a lack of natural sequence information. Importantly, as genome sequences become available, they can provide a rich source …

Recombination Genetic; lcsh:R; Salmo salar; Terminal Repeat Sequences; lcsh:Medicine; Computational Biology; Article; 570 Life sciences; DNA Transposable Elements; Animals; Humans; lcsh:Q; lcsh:Science; Genetic Engineering; Molecular Biology; 570 Biowissenschaften; Scientific reports

TAF-ChIP: an ultra-low input approach for genome-wide chromatin immunoprecipitation assay

2019

The authors present a novel method for obtaining chromatin profiles from low cell numbers without prior nuclei isolation. The method is successfully used to generate epigenetic profiles from 100 cells with a high signal-to-noise ratio.

Health Toxicology and MutagenesisPlant ScienceComputational biologySignal-To-Noise RatioBiochemistry Genetics and Molecular Biology (miscellaneous)GenomeDNA sequencingEpigenesis GeneticHistones03 medical and health sciences0302 clinical medicineTranscriptional regulationMethodsAnimalsHumansEpigenetics030304 developmental biologyWhole genome sequencing0303 health sciencesEcologybiologyWhole Genome SequencingChemistryHigh-Throughput Nucleotide SequencingChip11Histonebiology.proteinChromatin Immunoprecipitation SequencingDrosophilaK562 CellsChromatin immunoprecipitation030217 neurology & neurosurgerySoftwareLife Science Alliance

The latent geometry of the human protein interaction network

2017

Abstract Motivation: A series of recently introduced algorithms and models advocates for the existence of a hyperbolic geometry underlying the network representation of complex systems. Since the human protein interaction network (hPIN) has a complex architecture, we hypothesized that uncovering its latent geometry could ease challenging problems in systems biology, translating them into measuring distances between proteins. Results: We embedded the hPIN into hyperbolic space and found that the inferred coordinates of nodes capture biologically relevant features, like protein age, function and cellular localization. This means that the representation of the hPIN in the two-dimensional hyperboli…

0301 basic medicine; Statistics and Probability; Geometric analysis; Computer science; Hyperbolic geometry; Systems biology; Complex system; Context (language use); Geometry; Biochemistry; Protein–protein interaction; 03 medical and health sciences; Interaction network; Humans; Protein Interaction Maps; Representation (mathematics); Cluster analysis; Molecular Biology; Systems Biology; Hyperbolic space; Proteins; Function (mathematics); Original Papers; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Embedding; Signal transduction; Algorithms; Signal Transduction
researchProduct

Nuclear inclusions of pathogenic ataxin-1 induce oxidative stress and perturb the protein synthesis machinery

2020

Spinocerebellar ataxia type-1 (SCA1) is caused by an abnormally expanded polyglutamine (polyQ) tract in ataxin-1. These expansions are responsible for protein misfolding and self-assembly into intranuclear inclusion bodies (IIBs) that are somehow linked to neuronal death. However, owing to lack of a suitable cellular model, the downstream consequences of IIB formation are yet to be resolved. Here, we describe a nuclear protein aggregation model of pathogenic human ataxin-1 and characterize IIB effects. Using an inducible Sleeping Beauty transposon system, we overexpressed the ATXN1(Q82) gene in human mesenchymal stem cells that are resistant to the early cytotoxic effects caused by the expr…

0301 basic medicineSCA1 Spinocerebellar ataxia type-1Intranuclear Inclusion BodiesClinical BiochemistryMSC mesenchymal stem cellProtein aggregationBiochemistry0302 clinical medicineMutant proteinProtein biosynthesisDE differentially expressed genesNuclear proteinlcsh:QH301-705.5FTIR Fourier-transform infrared spectroscopyAtaxin-1lcsh:R5-920biologyChemistryNuclear ProteinspolyQ polyglutamineRibosomeCell biologySB Sleeping BeautyRibosome ; Polyglutamine ; Ataxin-1 ; Oxidative stress ; Transposon ; Sleeping beauty transposon ; Protein networkSpinocerebellar ataxiaProtein foldingCellular modelFunction and Dysfunction of the Nervous Systemlcsh:Medicine (General)Research PaperiPSC induced pluripotent stem cellAtaxin 1Nerve Tissue ProteinsPPI protein-protein interaction03 medical and health sciencesROS reactive oxygen speciesProtein networkSleeping beauty transposonGSEA Gene Set Enrichment AnalysismedicineHumansNPC neural progenitor cellOrganic Chemistrymedicine.diseaseAFM atomic force microscopyOxidative Stress030104 developmental biologylcsh:Biology (General)IIBs intranuclear inclusion bodiesMS mass spectrometryCardiovascular and Metabolic Diseasesbiology.proteinPolyglutamine030217 neurology & neurosurgery
researchProduct

Gene Set to Diseases (GS2D): disease enrichment analysis on human gene sets with literature data

2016

Large sets of candidate genes derived from high-throughput biological experiments can be characterized by functional enrichment analysis. The analysis consists of comparing the functions of one gene set against those of a background gene set. Functions related to a significant number of genes in the gene set are then expected to be relevant. Web tools offering disease enrichment analysis on gene sets are often based on gene-disease associations from manually curated or experimental data that are accurate but do not cover all diseases discussed in the literature. Using associations automatically derived from literature data could be a cost-effective method to improve the coverage of disease…

Candidate gene; business.industry; Big data; Experimental data; Genomics; Biology; computer.software_genre; Set (abstract data type); Workflow; Data mining; Toxicogenomics; business; computer; Gene; Genomics and Computational Biology
researchProduct

Expression and subcellular localization of USH1C/harmonin in the human retina provide insights into pathomechanisms and therapy

2021

Usher syndrome (USH) is the most common form of hereditary deafness-blindness in humans. USH is a complex genetic disorder, assigned to three clinical subtypes differing in onset, course, and severity, with USH1 being the most severe. Rodent USH1 models do not reflect the ocular phenotype observed in human patients to date; hence, little is known about the pathophysiology of USH1 in the human eye. One of the USH1 genes, USH1C, exhibits extensive alternative splicing and encodes numerous harmonin protein isoforms that function as scaffolds for organizing the USH interactome. RNA-seq analysis of human retinas uncovered harmonin_a1 as the most abundant transcript of USH1C. Bulk RNA-seq…

Scaffold protein; Gene isoform; Retina; biology; Usher syndrome; Cilium; medicine.disease; Phenotype; Cell biology; medicine.anatomical_structure; Rhodopsin; otorhinolaryngologic diseases; medicine; biology.protein; Muller glia
researchProduct

Text mining of biomedical literature: doing well, but we could be doing better.

2015

Information retrieval; business.industry; Computer science; MEDLINE; Computational Biology; Biomedical text mining; General Biochemistry Genetics and Molecular Biology; Text mining; Copyright; Data Mining; Humans; Periodicals as Topic; business; Molecular Biology; Introductory Journal Article; Methods (San Diego, Calif.)
researchProduct

The Role of Low Complexity Regions in Protein Interaction Modes: An Illustration in Huntingtin

2021

Low complexity regions (LCRs) are very frequent in protein sequences, generally having a lower propensity to form structured domains and tending to be much less evolutionarily conserved than globular domains. Their higher abundance in eukaryotes and in species with more cellular types agrees with a growing number of reports on their function in protein interactions regulated by post-translational modifications. LCRs facilitate the increase of regulatory and network complexity required with the emergence of organisms with more complex tissue distribution and development. Although the low conservation and structural flexibility of LCRs complicate their study, evolutionary studies of proteins …

Protein Conformation alpha-Helical0301 basic medicineNetwork complexityHuntingtinintrinsically disordered regionsAmino Acid MotifsComputational biologyBiologyprotein interactionsArticlecompositionally biased regionsCatalysisProtein–protein interactionlcsh:ChemistryEvolution MolecularInorganic ChemistryLow complexity03 medical and health sciencesProtein DomainsProtein Interaction MappingAnimalsHumansp300-CBP Transcription FactorsAmino Acid SequenceProtein Interaction MapsHuntingtinTissue distributionPhysical and Theoretical Chemistrylcsh:QH301-705.5Molecular BiologySpectroscopyHuntingtin Protein030102 biochemistry & molecular biologyOrganic ChemistryNuclear Proteinsp120 GTPase Activating ProteinGeneral MedicineMultiple modesSynapsinslow complexity regionsComputer Science ApplicationshomorepeatsMicroscopy Electron030104 developmental biologylcsh:Biology (General)lcsh:QD1-999Sequence AlignmentFunction (biology)Protein BindingInternational Journal of Molecular Sciences
researchProduct

Editorial: Protein Interaction Networks in Health and Disease

2016

The identification and annotation of protein-protein interactions (PPIs) is of great importance in systems biology. Big data produced from experimental or computational approaches not only allow the construction of large protein interaction maps but also expand our knowledge of how proteins build up molecular complexes to perform sophisticated tasks inside a cell. However, if we want to accurately understand the functionality of these complexes, we need to go beyond the simple identification of PPIs. We need to know when and where an interaction happens in the cell and also understand the flow of information through a protein interaction network. Another perspective of the research on PPI n…

0301 basic medicine; protein network; disease; Physiology; Systems biology; Cellular homeostasis; systems biology; Computational biology; protein function; Biology; Proteomics; computer.software_genre; protein interactions; Interactome; Protein–protein interaction; 03 medical and health sciences; 030104 developmental biology; Human interactome; Interaction network; Genetics; Molecular Medicine; Data mining; computer; Genetics (clinical); Biological network; Frontiers in Genetics
researchProduct

CRISPR sequences are sometimes erroneously translated and can contaminate public databases with spurious proteins containing spaced repeats

2020

Computer scienceGene predictionGenomicscomputer.software_genreGeneral Biochemistry Genetics and Molecular BiologyHomology (biology)03 medical and health sciencesAnnotation0302 clinical medicineCRISPRClustered Regularly Interspaced Short Palindromic RepeatsDatabases Protein030304 developmental biology0303 health sciencesDatabasePalindromeProteinsComputational geneGenomicsAcademicSubjects/SCI00960Original ArticleUniProtGeneral Agricultural and Biological Sciencescomputer030217 neurology & neurosurgeryInformation Systems
researchProduct

Myeloid leukemia with transdifferentiation plasticity developing from T-cell progenitors

2016

Unfavorable patient survival coincides with lineage plasticity observed in human acute leukemias. These cases are assumed to arise from hematopoietic stem cells, which have stable multipotent differentiation potential. However, here we report that plasticity in leukemia can result from unstable lineage identity states inherited from differentiating progenitor cells. Using mice with enhanced c-Myc expression, we show, at the single-cell level, that T-lymphoid progenitors retain broad malignant lineage potential with a high capacity to differentiate into myeloid leukemia. These T-cell-derived myeloid blasts retain expression of a defined set of T-cell transcription factors, creating a lymphoi…

0301 basic medicine; Myeloid; Bone Marrow Cells; Biology; General Biochemistry Genetics and Molecular Biology; 03 medical and health sciences; hemic and lymphatic diseases; medicine; Cell Lineage; Progenitor cell; Molecular Biology; General Immunology and Microbiology; General Neuroscience; Transdifferentiation; Myeloid leukemia; Cell Differentiation; Articles; medicine.disease; Hematopoietic Stem Cells; Haematopoiesis; Leukemia; 030104 developmental biology; medicine.anatomical_structure; Immunology; Cancer research; Lymphoid Progenitor Cells; Stem cell
researchProduct

AnABlast: Re-searching for Protein-Coding Sequences in Genomic Regions

2019

AnABlast is a computational tool that highlights protein-coding regions within intergenic and intronic DNA sequences which escape detection by standard gene prediction algorithms. DNA sequences with small protein-coding genes or exons, complex intron-containing genes, or degenerated DNA fragments are efficiently targeted by AnABlast. Furthermore, this algorithm is particularly useful in detecting protein-coding sequences with nonsignificant homologs to sequences in databases. AnABlast can be executed online at http://www.bioinfocabd.upo.es/anablast/.

Fossil DNA sequences; Protein coding; 0303 health sciences; Gene prediction; Coding DNA sequences; 030302 biochemistry & molecular biology; Computational biology; Biology; Gene finding; DNA sequencing; 03 medical and health sciences; Exon; chemistry.chemical_compound; Intergenic region; chemistry; Homologous chromosome; Small genes; Gene; In silico annotation tool; DNA; 030304 developmental biology
researchProduct

A reliable and unbiased human protein network with the disparity filter

2017

The living cell operates thanks to an intricate network of protein interactions. Proteins activate, transport, degrade, stabilise and participate in the production of other proteins. As a result, a reliable and systematically generated protein wiring diagram is crucial for a deeper understanding of cellular functions. Unfortunately, current human protein networks are noisy and incomplete. Also, they suffer from both study and technical biases: heavily studied proteins (e.g. those of pharmaceutical interest) are known to be involved in more interactions than proteins described in only a few publications. Here, we use the experimental evidence supporting the interaction between protei…

ComputingMethodologies_PATTERNRECOGNITION; Human interactome; Filter (video); Cellular functions; Human proteome project; Living cell; Computational biology; Biology; Bioinformatics; Protein network; Protein–protein interaction
researchProduct

Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data

2017

The epigenetic landscape of cells plays a key role in the establishment of cell-type specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…

0301 basic medicineScienceComputational biologyRegulatory Sequences Nucleic AcidBiologycomputer.software_genreArticleEpigenesis Genetic03 medical and health sciencesDatabases GeneticHumansEpigeneticsComputational modelDeoxyribonucleasesMultidisciplinarySequence Analysis RNAGene Expression ProfilingDecision tree learningQRSequence Analysis DNAChromatinChromatinGene expression profilingIdentification (information)030104 developmental biologyGene Expression RegulationMedicineData miningPrecision and recallPeak callingcomputerAlgorithmsScientific reports
researchProduct

CellMap visualizes protein-protein interactions and subcellular localization

2018

Many tools visualize protein-protein interaction (PPI) networks. The tool introduced here, CellMap, adds one crucial novelty by visualizing PPI networks in the context of subcellular localization, i.e. the location in the cell or cellular component in which a PPI happens. Users can upload images of cells and define areas of interest against which PPIs for selected proteins are displayed (by default on a cartoon of a cell). Annotations of localization are provided by the user or through our in-house database. The visualizer and server are written in JavaScript, making CellMap easy to customize and to extend by researchers and developers.

0301 basic medicineBioinformaticssubcellular locationContext (language use)BiologyJavaScriptGeneral Biochemistry Genetics and Molecular BiologyChemical Biology of the CellProtein–protein interactionprotein-protein interaction03 medical and health sciencesUploadHuman–computer interactionGeneral Pharmacology Toxicology and Pharmaceuticscomputer.programming_languagebiological visualization030102 biochemistry & molecular biologyGeneral Immunology and MicrobiologySoftware Tool ArticleNoveltyArticlesGeneral MedicineSubcellular localizationddc:ComputingMethodologies_PATTERNRECOGNITION030104 developmental biologyNeurosciencecomputerF1000Research
researchProduct

Zc3h13/Flacc is required for adenosine methylation by bridging the mRNA-binding factor Rbm15/Spenito to the m6A machinery component Wtap/Fl(2)d

2018

N6-methyladenosine (m6A) is the most abundant mRNA modification in eukaryotes, playing crucial roles in multiple biological processes. m6A is catalyzed by the activity of methyltransferase-like 3 (Mettl3), which depends on additional proteins whose precise functions remain poorly understood. Here we identified Zc3h13 (zinc finger CCCH domain-containing protein 13)/Flacc [Fl(2)d-associated complex component] as a novel interactor of m6A methyltransferase complex components in Drosophila and mice. Like other components of this complex, Flacc controls m6A levels and is involved in sex determination in Drosophila. We demonstrate that Flacc promotes m6A deposition by bridging Fl(2)d to the mRNA-…

0301 basic medicine; Zinc finger; Methyltransferase complex; MRNA modification; RNA-binding protein; Methylation; Biology; DNA-binding protein; Cell biology; 03 medical and health sciences; 030104 developmental biology; FLACC scale; Genetics; Drosophila Protein; Developmental Biology; Genes & Development
researchProduct

Drivers of topoisomerase II poisoning mimic and complement cytotoxicity in AML cells

2019

Recently approved cancer drugs remain out of reach for most patients due to prohibitive costs, and only a few produce clinically meaningful benefits. An untapped alternative is to enhance the efficacy and safety of existing cancer drugs. We hypothesized that the response to topoisomerase II poisons, a very successful group of cancer drugs, can be improved by considering treatment-associated transcript levels. To this end, we analyzed transcriptomes from Acute Myeloid Leukemia (AML) cell lines treated with the topoisomerase II poison etoposide. Using complementary criteria of co-regulation within networks and of essentiality for cell survival, we identified and functionally confirmed 11 druggabl…

biology; Combination therapy; business.industry; Topoisomerase; Myeloid leukemia; topoisomerase II poisons; combination therapy; Cell killing; Oncology; gene expression; cancer essentiality; biology.protein; medicine; Cancer research; DNA damage; Cytotoxic T cell; Cytotoxicity; business; Etoposide; PI3K/AKT/mTOR pathway; Research Paper; medicine.drug; Oncotarget
researchProduct

Comprehensive translational control of tyrosine kinase expression by upstream open reading frames

2016

Post-transcriptional control has emerged as a major regulatory event in gene expression and often occurs at the level of translation initiation. Although overexpression or constitutive activation of tyrosine kinases (TKs) through gene amplification, translocation or mutation are well-characterized oncogenic events, current knowledge about translational mechanisms of TK activation is scarce. Here, we report the presence of translational cis-regulatory upstream open reading frames (uORFs) in the majority of transcript leader sequences of human TK mRNAs. Genetic ablation of uORF initiation codons in TK transcripts resulted in enhanced translation of the associated downstream main protein-codin…

0301 basic medicine; Cancer Research; Five prime untranslated region; Kozak consensus sequence; Short Communication; Biology; medicine.disease_cause; Proto-Oncogene Mas; Gene Expression Regulation Enzymologic; 03 medical and health sciences; Open Reading Frames; Eukaryotic translation; Upstream open reading frame; Genetics; medicine; Humans; Gene Regulatory Networks; Molecular Biology; Genetics; Mutation; Gene Expression Profiling; Translation (biology); Protein-Tyrosine Kinases; Open reading frame; 030104 developmental biology; HEK293 Cells; Protein Biosynthesis; Human genome; HeLa Cells
researchProduct

The 18S ribosomal RNA m6A methyltransferase Mettl5 is required for normal walking behavior in Drosophila

2020

RNA modifications have recently emerged as an important layer of gene regulation. N6-methyladenosine (m6A) is the most prominent modification on eukaryotic messenger RNA and has also been found on noncoding RNA, including ribosomal and small nuclear RNA. Recently, several m6A methyltransferases were identified, uncovering the specificity of m6A deposition by structurally distinct enzymes. In order to discover additional m6A enzymes, we performed an RNAi screen to deplete annotated orthologs of human methyltransferase-like proteins (METTLs) in Drosophila cells and identified CG9666, the ortholog of human METTL5. We show that CG9666 is required for specific deposition of m6A on 18S ribosomal …

AdenosineBiochimiem 6 AMettl5WalkingBiologyBiochemistryRibosome18S ribosomal RNA03 medical and health sciences0302 clinical medicineGene expressionRNA Ribosomal 18SGeneticsAnimalsHumansRNA methyltransferase[SDV.BDD]Life Sciences [q-bio]/Development BiologyMolecular Biology030304 developmental biologyBehavior0303 health sciencesMessenger RNAbehaviorBiologie moléculaireRNA[SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biologyMethyltransferasesm6ARibosomal RNANon-coding RNARibosome[SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biomolecules [q-bio.BM]3. Good healthCell biologyribosomeRNA RibosomalDrosophilaBiologie030217 neurology & neurosurgerySmall nuclear RNAReportsEMBO reports
researchProduct

Comparison of inter- and intraspecies variation in humans and fruit flies

2015

Variation is essential to species survival and adaptation during evolution. This variation is conferred by the imperfection of biochemical processes, such as mutations and alterations in DNA sequences, and can also be seen within genomes through processes such as the generation of antibodies. Recent sequencing projects have produced multiple versions of the genomes of humans and fruit flies (Drosophila melanogaster). These give us a chance to study how individual gene sequences vary within and between species. Here we arranged human and fly genes in orthologous pairs and compared such within-species variability with their degree of conservation between flies and humans. We observed …

Cancer Research; lcsh:QH426-470; Evolution; Population; Population; Variation; Biochemistry; Genome; DNA sequencing; Genetics; education; Gene; Drosophila; Genetics; education.field_of_study; Human genome; biology; Regular Article; biology.organism_classification; lcsh:Genetics; Molecular Medicine; Drosophila; Human genome; Drosophila melanogaster; Adaptation; Biotechnology; Genomics Data
researchProduct

Repeatability in protein sequences

2019

Low complexity regions (LCRs) in protein sequences have special properties that are very different from those of globular proteins. The rules that define secondary structure elements do not apply when the distribution of amino acids becomes biased. While there is a tendency towards structural disorder in LCRs, various examples, and particularly homorepeats of single amino acids, suggest that very short repeats could adopt structures that are very difficult to predict. These structures are possibly variable and dependent on the context of intra- or inter-molecular interactions. In general, short repeats in LCRs can induce structure. This could explain the observation that very short (non-perfect) rep…

Repetitive Sequences Amino Acid; Globular protein; Saccharomyces cerevisiae; Context (language use); Computational biology; Protein–protein interaction; Evolution Molecular; 03 medical and health sciences; Sequence Analysis Protein; Structural Biology; Humans; Arabidopsis thaliana; Amino Acid Sequence; Databases Protein; Protein secondary structure; 030304 developmental biology; chemistry.chemical_classification; 0303 health sciences; biology; 030302 biochemistry & molecular biology; Proteins; biology.organism_classification; Amino acid; chemistry; Sequence Alignment; Algorithms; Function (biology); Journal of Structural Biology
researchProduct

Defining Human Tyrosine Kinase Phosphorylation Networks Using Yeast as an In Vivo Model Substrate.

2017

Systematic assessment of tyrosine kinase-substrate relationships is fundamental to a better understanding of cellular signaling and its profound alterations in human diseases such as cancer. In human cells, such assessments are confounded by complex signaling networks, feedback loops, conditional activity, and intra-kinase redundancy. Here we address this challenge by exploiting the yeast proteome as an in vivo model substrate. We individually expressed 16 human non-receptor tyrosine kinases (NRTKs) in Saccharomyces cerevisiae and identified 3,279 kinase-substrate relationships involving 1,351 yeast phosphotyrosine (pY) sites. Based on the yeast data without prior information, we generated …

0301 basic medicine; Cell signaling; Histology; Saccharomyces cerevisiae Proteins; Saccharomyces cerevisiae; Amino Acid Motifs; Saccharomyces cerevisiae; Interactome; Receptor tyrosine kinase; Article; Pathology and Forensic Medicine; 03 medical and health sciences; chemistry.chemical_compound; Humans; Protein Interaction Maps; Phosphorylation; biology; Tyrosine phosphorylation; Cell Biology; Protein-Tyrosine Kinases; biology.organism_classification; Yeast; Cell biology; 030104 developmental biology; chemistry; biology.protein; Phosphorylation; Tyrosine kinase; Sequence Alignment; Cell systems
researchProduct

Identification of transcribed protein coding sequence remnants within lincRNAs

2018

Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can t…

0301 basic medicine; Transposable element; Sequence analysis; Pseudogene; Retrotransposon; Computational biology; Biology; Open Reading Frames; 03 medical and health sciences; 0302 clinical medicine; Intergenic region; Sequence Analysis Protein; Genetics; Humans; Amino Acid Sequence; Gene; Regulation of gene expression; Base Sequence; Sequence Analysis RNA; Computational Biology; 030104 developmental biology; Gene Expression Regulation; DNA Intergenic; RNA Long Noncoding; Sequence Alignment; Algorithms; 030217 neurology & neurosurgery; Biogenesis; Nucleic Acids Research
researchProduct

The Conservation of Low Complexity Regions in Bacterial Proteins Depends on the Pathogenicity of the Strain and Subcellular Location of the Protein

2021

Low complexity regions (LCRs) in proteins are characterized by amino acid frequencies that differ from the average. These regions evolve faster and tend to be less conserved between homologs than globular domains. They are not common in bacteria, as compared to their prevalence in eukaryotes. Studying their conservation could help provide hypotheses about their function. To obtain the appropriate evolutionary focus for this rapidly evolving feature, here we study the conservation of LCRs in bacterial strains and compare their high variability to the closeness of the strains. For this, we selected 20 taxonomically diverse bacterial species and obtained the completely sequenced proteomes of t…

Proteomics; 0301 basic medicine; lcsh:QH426-470; 030106 microbiology; Biology; Article; compositionally biased regions; Evolution Molecular; Low complexity; 03 medical and health sciences; Bacterial Proteins; Sequence Analysis Protein; Genetics; Extracellular; Genetics (clinical); chemistry.chemical_classification; Bacteria; Virulence; Strain (chemistry); Computational Biology; biology.organism_classification; low complexity regions; Amino acid; homorepeats; lcsh:Genetics; 030104 developmental biology; chemistry; Evolutionary biology; bacterial strains; Proteome; orthology; Bacterial outer membrane; Bacteria; Function (biology); host–pathogen interactions; Genes
researchProduct

RNA Sequencing of Human Peripheral Blood Cells Indicates Upregulation of Immune-Related Genes in Huntington's Disease

2020

Huntington's disease (HD) is an autosomal dominantly inherited neurodegenerative disorder caused by a trinucleotide repeat expansion in the Huntingtin gene. As disease-modifying therapies for HD are being developed, peripheral blood cells may be used to indicate disease progression and to monitor treatment response. In order to investigate whether gene expression changes can be found in the blood of individuals with HD that distinguish them from healthy controls, we performed transcriptome analysis by next-generation sequencing (RNA-seq). We detected a gene expression signature consistent with dysregulation of immune-related functions and inflammatory response in peripheral blood from HD ca…

inflammation; Huntington's disease; RNA-Seq; differential gene expression; disease markers; lcsh:Neurology. Diseases of the nervous system; lcsh:RC346-429; Frontiers in Neurology
researchProduct

m6A modulates neuronal functions and sex determination in Drosophila

2016

N6-methyladenosine RNA (m6A) is a prevalent messenger RNA modification in vertebrates. Although its functions in the regulation of post-transcriptional gene expression are beginning to be unveiled, the precise roles of m6A during development of complex organisms remain unclear. Here we carry out a comprehensive molecular and physiological characterization of the individual components of the methyltransferase complex, as well as of the YTH domain-containing nuclear reader protein in Drosophila melanogaster. We identify the member of the split ends protein family, Spenito, as a novel bona fide subunit of the methyltransferase complex. We further demonstrate important roles of this complex in …

0301 basic medicine; Genetics; Messenger RNA; Multidisciplinary; biology; Protein family; Methyltransferase complex; Effector; RNA-binding protein; biology.organism_classification; Cell biology; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Nuclear protein; Drosophila melanogaster; 030217 neurology & neurosurgery; Drosophila Protein; Nature
researchProduct

Disease–Genes Must Guide Data Source Integration in the Gene Prioritization Process

2019

One of the main issues in detecting the genes involved in the etiology of genetic human diseases is the integration of different types of available functional relationships between genes. Numerous approaches exploited the complementary evidence coded in heterogeneous sources of data to prioritize disease-genes, such as functional profiles or expression quantitative trait loci, but none of them to our knowledge posed the scarcity of known disease-genes as a feature of their integration methodology. Nevertheless, in contexts where data are unbalanced, that is, where one class is largely under-represented, imbalance-unaware approaches may suffer a strong decrease in performance. We claim that …

0301 basic medicine; Class (computer programming); Boosting (machine learning); Computer science; Process (engineering); media_common.quotation_subject; Computational biology; Scarcity; 03 medical and health sciences; ComputingMethodologies_PATTERNRECOGNITION; 030104 developmental biology; Expression quantitative trait loci; Key (cryptography); Feature (machine learning); Gene prioritization; media_common
researchProduct

7C: Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs.

2019

Background: Knowledge of the three-dimensional structure of the genome is necessary to understand how gene expression is regulated. Recent experimental techniques such as Hi-C or ChIA-PET measure long-range chromatin interactions genome-wide but are experimentally elaborate and have limited resolution, and such data are only available for a limited number of cell types and tissues. Results: While ChIP-seq was not designed to detect chromatin interactions, the formaldehyde treatment in the ChIP-seq protocol cross-links proteins with each other and with DNA. Consequently, also regions that are not directly bound by the targeted TF but interact with the binding site via chromatin looping are…

CCCTC-Binding Factorlcsh:QH426-470Protein Conformationlcsh:Biotechnologygenetic processesComputational biologyBiologyGenomeChromosomesBioconductorChromosome conformation capture03 medical and health sciences0302 clinical medicine6CHi-Clcsh:TP248.13-248.65GeneticsTranscription factorsHumansnatural sciencesNucleotide Motifs4CChIA-PET030304 developmental biologyChromatin loops0303 health sciencesThree-dimensional genome architectureChromatinChromatinChIP-seq7Clcsh:Genetics5CCTCFChromatin Immunoprecipitation SequencingHuman genomeDNA microarrayChIA-PET3CPrediction030217 neurology & neurosurgeryChromatin interactionsBiotechnologyHeLa CellsResearch ArticleBMC genomics
researchProduct

The distributions of protein coding genes within chromatin domains in relation to human disease.

2019

Background: Our understanding of nuclear chromatin structure has increased greatly in recent years, mainly as a consequence of advances in chromatin conformation capture methods like Hi-C. The unprecedented resolution of genome-wide interaction maps reveals functional consequences that extend beyond the initial notion of an efficient DNA packaging mechanism: gene regulation, DNA repair, chromosomal translocations and evolutionary rearrangements seem to be only the tip of the iceberg. One key concept emerging from this research is that of topologically associating domains (TADs), whose functional role in gene regulation and association with disease are not fully untangled. Resul…

lcsh:QH426-470Computational biologyBiologyChromatin structureCell LineChromosome conformation captureOpen Reading FramesGene expressionDatabases GeneticGeneticsEnhancersHumansDiseaseEnhancerMolecular BiologyGeneRegulation of gene expressionHousekeeping genesTopologically associating domainsResearchHuman diseasesTADGenes associated with diseaseHuman geneticsChromatinChromatinHousekeeping geneGene regulationlcsh:GeneticsEnhancer Elements GeneticTranscription Initiation SiteChromatin interactionsEpigeneticschromatin
researchProduct

Lost Strings in Genomes: What Sense Do They Make?

2017

We studied the sets of avoided strings observed over a family of genomes. The length of the minimal avoided string rarely exceeds 9 nucleotides, regardless of the phylogeny of the genome under consideration. The lists of avoided strings observed over sets of (related) genomes were analyzed. A very low correlation was found between phylogeny and the set of those strings.

0301 basic medicine; Genetics; animal structures; genetic structures; information science; String (physics); Genome; Combinatorics; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Phylogenetics; cardiovascular system; Low correlation; 030217 neurology & neurosurgery; Selection (genetic algorithm); Mathematics
researchProduct
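
The notion of a minimal avoided string in the abstract above can be illustrated with a small sketch. It is not the authors' method: it assumes a plain DNA alphabet, scans only the given strand, and enumerates all k-mers by brute force. The function names and the `max_k` limit are hypothetical choices made for this illustration.

```python
from itertools import product


def absent_kmers(genome: str, k: int) -> set[str]:
    """Return every DNA string of length k that never occurs in `genome`."""
    genome = genome.upper()
    # All k-mers that do occur (single strand only; an assumption of this sketch).
    seen = {genome[i:i + k] for i in range(len(genome) - k + 1)}
    # All 4**k possible k-mers over the DNA alphabet.
    all_kmers = {"".join(p) for p in product("ACGT", repeat=k)}
    return all_kmers - seen


def minimal_avoided_length(genome: str, max_k: int = 10):
    """Smallest k (up to max_k) with at least one absent k-mer, else None."""
    for k in range(1, max_k + 1):
        if absent_kmers(genome, k):
            return k
    return None
```

Brute-force enumeration grows as 4^k, so this is practical only for the short lengths the abstract mentions (around 9 nucleotides); a real analysis would likely also consider the reverse complement.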

orthoFind Facilitates the Discovery of Homologous and Orthologous Proteins

2015

Finding homologous and orthologous protein sequences is often the first step in evolutionary studies, annotation projects, and experiments of functional complementation. Despite all currently available computational tools, there is a requirement for easy-to-use tools that provide functional information. Here, a new web application called orthoFind is presented, which allows a quick search for homologous and orthologous proteins given one or more query sequences, allowing a recurrent and exhaustive search against reference proteomes, and being able to include user databases. It addresses the protein multidomain problem, searching for homologs with the same domain architecture, and gives a si…

Architecture domainScienceBrute-force searchSequence alignmentComputational biologyBiologyAnnotationDatabases GeneticHomologous chromosomeAnimalsHumansWeb applicationAmino Acid SequenceGeneticsInternetMultidisciplinarySequence Homology Amino Acidbusiness.industryQRProteinsSequence homologyProteomeMedicinebusinessSequence AlignmentSoftwareResearch ArticlePLOS ONE
researchProduct

Missing value imputation in proximity extension assay-based targeted proteomics data

2020

Targeted proteomics utilizing antibody-based proximity extension assays provides sensitive and highly specific quantifications of plasma protein levels. Multivariate analysis of this data is hampered by frequent missing values (random or left censored), calling for imputation approaches. While appropriate missing-value imputation methods exist, benchmarks of their performance in targeted proteomics data are lacking. Here, we assessed the performance of two methods for imputation of values missing completely at random, the previously top-benchmarked ‘missForest’ and the recently published ‘GSimp’ method. Evaluation was accomplished by comparing imputed with remeasured relative concentrations…

ProteomicsMaleMultivariate analysisProtein ExpressionBiochemistryProtein expressionDatabase and Informatics MethodsLimit of DetectionStatisticsMedicine and Health SciencesBiochemical SimulationsImputation (statistics)Immune ResponseMathematicsMultidisciplinaryProteomic DatabasesQREukaryotaBlood ProteinsVenous ThromboembolismPlantsMiddle AgedLegumesTargeted proteomicssymbolsEngineering and TechnologyMedicineFemaleAlgorithmsResearch ArticleQuality ControlAdultScienceImmunologyResearch and Analysis Methodssymbols.namesakeSigns and SymptomsBiasIndustrial EngineeringProtein Concentration AssaysGene Expression and Vector TechniquesMissing value imputationHumansMolecular Biology TechniquesMolecular BiologyAgedInflammationMolecular Biology Assays and Analysis TechniquesInterleukin-6OrganismsPeasBiology and Life SciencesComputational BiologyMissing dataPearson product-moment correlation coefficientBiological DatabasesMultivariate AnalysisClinical MedicineVenous thromboembolismPLOS ONE
researchProduct

dAPE: a web server to detect homorepeats and follow their evolution.

2016

Abstract Summary Homorepeats are low complexity regions consisting of repetitions of a single amino acid residue. There is no current consensus on the minimum number of residues needed to define a functional homorepeat, nor on whether mismatches are allowed. Here we present dAPE, a web server that helps to follow the evolution of homorepeats based on orthology information, using a sensitive but tunable cutoff to help in the identification of emerging homorepeats. Availability and Implementation dAPE can be accessed from http://cbdm-01.zdv.uni-mainz.de/∼munoz/polyx. Supplementary information Supplementary data are available at Bioinformatics online.

0301 basic medicine; Statistics and Probability; Repetitive Sequences Amino Acid; Web server; Internet; Computer science; computer.software_genre; Biochemistry; Applications Notes; Computer Science Applications; World Wide Web; Evolution Molecular; 03 medical and health sciences; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Animals; Humans; Data mining; Molecular Biology; computer; Sequence Alignment; Sequence Analysis; Software; Bioinformatics (Oxford, England)
researchProduct
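
The idea of a tunable homorepeat cutoff that tolerates mismatches, as described in the dAPE abstract above, can be sketched as a sliding-window scan. This is only an illustration and not dAPE's actual algorithm: the function name, the window size and the minimum count are hypothetical parameters chosen for the example.

```python
from collections import Counter


def find_homorepeats(seq: str, window: int = 10, min_count: int = 8):
    """Report windows in which one residue occurs at least `min_count` times.

    Returns a list of (start, residue, count) tuples with 0-based starts.
    Setting min_count == window forbids mismatches; lowering it allows them.
    """
    seq = seq.upper()
    hits = []
    for start in range(len(seq) - window + 1):
        win = seq[start:start + window]
        # Most frequent residue in this window and how often it appears.
        residue, count = Counter(win).most_common(1)[0]
        if count >= min_count:
            hits.append((start, residue, count))
    return hits
```

For example, `find_homorepeats("MK" + "Q" * 10 + "LP", window=10, min_count=10)` reports only the perfect polyQ window starting at position 2, while the default `min_count=8` also reports the overlapping windows that contain one or two mismatches.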

Detection of condition-specific marker genes from RNA-seq data with MGFR

2019

The identification of condition-specific genes is key to advancing our understanding of cell fate decisions and disease development. Differential gene expression analysis (DGEA) has been the standard tool for this task. However, the number of samples that modern transcriptomic technologies allow us to study makes DGEA a daunting task. On the other hand, experiments with low numbers of replicates lack the statistical power to detect differentially expressed genes. We have previously developed MGFM, a tool for marker gene detection from microarrays that is particularly useful in the latter case. Here, we have adapted the algorithm behind MGFM to detect markers in RNA-seq data. MGFR groups s…

Bioinformaticslcsh:MedicineRNA-SeqComputational biologyMarker genesCell fate determinationBiologyMarker geneGeneral Biochemistry Genetics and Molecular BiologyTranscriptomeBioconductor03 medical and health sciences0302 clinical medicineGene expressionSingle cellRNA-SeqTranscriptomicsGene030304 developmental biology0303 health sciencesGeneral Neurosciencelcsh:RCell-type specificityGenomicsGeneral MedicineTissue specificity030220 oncology & carcinogenesisGene expressionR-packageDNA microarrayGeneral Agricultural and Biological SciencesPeerJ
researchProduct

Dynamics of a Protein Interaction Network Associated to the Aggregation of polyQ-Expanded Ataxin-1

2020

Background: Several experimental models of polyglutamine (polyQ) diseases have been previously developed that are useful for studying disease progression in the primarily affected central nervous system. However, there is a missing link between cellular and animal models that would indicate the molecular defects that occur in neurons and are responsible for the disease phenotype in vivo. Methods: Here, we used a computational approach to identify dysregulated pathways shared by an in vitro and an in vivo model of ATXN1(Q82) protein aggregation, the mutant protein that causes the neurodegenerative polyQ disease spinocerebellar ataxia type-1 (SCA1). Results: A set of common dysregulated pathwa…

0301 basic medicinelcsh:QH426-470Ataxin 1Mice TransgenicNerve Tissue ProteinsProtein aggregationBlood–brain barrierblood-brain-barrierArticledrugspolyQ03 medical and health sciences0302 clinical medicineataxin-1Interaction networkIn vivoMutant proteinCerebellumGeneticsmedicineAnimalsGene Regulatory NetworksProtein Interaction MapsGenetics (clinical)NeuronsbiologypathwayGene Expression Profilingmedicine.diseaselcsh:Genetics030104 developmental biologymedicine.anatomical_structureGene Expression Regulationnetworkbiology.proteinSpinocerebellar ataxiaPeptidesNeuroscience030217 neurology & neurosurgeryFunction (biology)Genes
researchProduct

A Methodology to Study Pseudogenized lincRNAs

2021

Long intergenic noncoding RNAs (lincRNAs) are known to be expressed in a tissue-specific manner and to be able to regulate functional protein-coding genes: some can even act as competing endogenous RNAs (ceRNAs), because microRNAs can bind to them instead of the corresponding mRNA binding sites. Some lincRNAs contain remnants of protein-coding sequences, and it has been hypothesized that they might arise after a pseudogenization process. However, a major limitation in the study of this phenomenon is the lack of proper computational tools designed to align and analyze protein-coding and noncoding sequences. To overcome this limitation, we published a method that finds the remnants of protein-coding…

0301 basic medicine; Competing endogenous RNA; Pseudogene; Sequence alignment; Computational biology; Biology; 03 medical and health sciences; 030104 developmental biology; 0302 clinical medicine; Intergenic region; microRNA; Single point; Gene; 030217 neurology & neurosurgery; Sequence (medicine)
researchProduct

HIPPIE v2.0: Enhancing meaningfulness and reliability of protein-protein interaction networks

2016

The increasing number of experimentally detected interactions between proteins makes it difficult for researchers to extract the interactions relevant for specific biological processes or diseases. This makes it necessary to accompany the large-scale detection of protein-protein interactions (PPIs) with strategies and tools to generate meaningful PPI subnetworks. To this end, we generated the Human Integrated Protein-Protein Interaction rEference or HIPPIE (http://cbdm.uni-mainz.de/hippie/). HIPPIE is a one-stop resource for the generation and interpretation of PPI networks relevant to a specific research question. We provide means to generate highly reliable, context-specific PPI networks …

0301 basic medicineHippieReliability (computer networking)BiologyWeb BrowserBioinformaticsProtein protein interaction networkComputational biology03 medical and health sciences0302 clinical medicineResource (project management)GeneticsHumansDatabase IssueGraph algorithmsProtein Interaction MapsDatabases ProteinResearch questionGraphical user interfacebusiness.industryReproducibility of ResultsData science030104 developmental biologyComputingMethodologies_PATTERNRECOGNITIONProtein interaction mappingbusiness030217 neurology & neurosurgeryProtein Interaction MapSoftware
researchProduct

Visualizing Human Protein‐Protein Interactions and Subcellular Localizations on Cell Images Through CellMap

2020

Visualizing protein data remains a challenging and stimulating task. Useful and intuitive visualization tools may help advance biomolecular and medical research; unintuitive tools may bar important breakthroughs. This protocol describes two use cases for the CellMap (http://cellmap.protein.properties) web tool. The tool allows researchers to visualize human protein-protein interaction data constrained by protein subcellular localizations. In the simplest form, proteins are visualized on cell images that also show protein-protein interactions (PPIs) through lines (edges) connecting the proteins across the compartments. At a glance, this simultaneously highlights spatial constraints that prot…

0303 health sciencesgenetic structuresComputer scienceCells030305 genetics & heredityProteinsA proteinComputational biologyBiochemistryWeb toolProtein subcellular localization predictionVisualizationProtein–protein interaction03 medical and health sciencesImaging Three-DimensionalStructural BiologyProtein Interaction MappingHumansProtocol (object-oriented programming)SoftwareSubcellular Fractions030304 developmental biologyCurrent Protocols in Bioinformatics
researchProduct

MGFM: a novel tool for detection of tissue and cell specific marker genes from microarray gene expression data

2015

Background Identification of marker genes associated with a specific tissue/cell type is a fundamental challenge in genetic and cell research. Marker genes are of great importance for determining cell identity, and for understanding tissue specific gene function and the molecular mechanisms underlying complex diseases. Results We have developed a new bioinformatics tool called MGFM (Marker Gene Finder in Microarray data) to predict marker genes from microarray gene expression data. Marker genes are identified through the grouping of samples of the same type with similar marker gene expression levels. We verified our approach using two microarray data sets from the NCBI’s Gene Expression Omn…

Genetic MarkersCancer ResearchMicroarraysBiologyMarker genesWeb BrowserProteomicsMarker geneBioconductorGeneticsGeneGenetic Association StudiesGeneticsMicroarray analysis techniquesMethodology ArticleGene Expression ProfilingComputational BiologyReproducibility of Results3. Good healthGene expression profilingSamplesGene OntologyGenetic markerOrgan SpecificityDNA microarrayBiotechnologyBMC Genomics
researchProduct

Evolutionary Study of Disorder in Protein Sequences

2020

Intrinsically disordered proteins (IDPs) contain regions lacking intrinsic globular structure (intrinsically disordered regions, IDRs). IDPs are present across the tree of life, with great variability of IDR type and frequency even between closely related taxa. To investigate the function of IDRs, we evaluated and compared the distribution of disorder content in 10,695 reference proteomes, confirming its high variability and finding some correlation with the number of cell types along the Euteleostomi (bony vertebrates) lineage. We used the comparison of orthologs to study the function of disorder related to the increase in cell types, observing that multiple interacting subunits of protein comple…

intrinsically disordered regionsortholog comparisonLineage (evolution)High variabilitylcsh:QR1-502comparative genomicsBiologyIntrinsically disordered proteinsBiochemistryArticlelcsh:MicrobiologyEvolution Molecular03 medical and health sciencesSequence Analysis ProteinAnimalsDatabases ProteinMolecular Biology030304 developmental biologyComparative genomics0303 health sciences030302 biochemistry & molecular biologyEvolutionary biologyVertebratesProteomeintrinsically disordered proteinsFunction (biology)Biomolecules
researchProduct

A targeted proteomics investigation of the obesity paradox in venous thromboembolism

2021

Abstract The obesity paradox, the controversial finding that obesity promotes disease development but protects against sequelae in patients, has been observed in venous thromboembolism (VTE). The aim of this investigation was to identify a body mass–related proteomic signature in VTE patients and to evaluate whether this signature mediates the obesity paradox in VTE patients. Data from the Genotyping and Molecular Phenotyping in Venous ThromboEmbolism Project, a prospective cohort study of 693 VTE patients, were analyzed. A combined end point of recurrent VTE or all-cause death was used. Relative quantification of 444 proteins was performed using high-throughput targeted proteomics technolo…

0301 basic medicineOncologyProteomicsmedicine.medical_specialtyDisease030204 cardiovascular system & hematologyThrombosis and Hemostasis03 medical and health sciences0302 clinical medicineRisk FactorsInternal medicinemedicineHumansLectins C-Typecardiovascular diseasesObesityProspective StudiesReceptors ImmunologicProspective cohort studyGenotypingMembrane Glycoproteinsbusiness.industryLeptinHazard ratioHematologyVenous Thromboembolismmedicine.diseaseObesityConfidence interval030104 developmental biologyMatrix Metalloproteinase 2businessObesity paradox
researchProduct

DiseaseLinc: Disease Enrichment Analysis of Sets of Differentially Expressed LincRNAs

2021

Long intergenic non-coding RNAs (lincRNAs) are long RNAs that do not encode proteins. Functional evidence is lacking for most of them. Their biogenesis is not well known, but it is thought that many lincRNAs originate from genomic duplication of coding material, resulting in pseudogenes, gene copies that lose their original function and can accumulate mutations. While most pseudogenes eventually stop producing a transcript and become erased by mutations, many of these pseudogene-based lincRNAs keep similarity to the parental gene from which they originated, possibly for functional reasons. For example, they can act as decoys for miRNAs targeting the parental gene. Enrichment analysis of fun…

PseudogeneBreast NeoplasmsKaplan-Meier EstimateComputational biologyDiseaseBiologyweb toolENCODEArticleenrichment analysisdiseasesUser-Computer InterfaceIntergenic regionmicroRNAHumansDiseaselcsh:QH301-705.5GeneInternetGene Expression ProfilinglincRNAsGeneral MedicinePrognosisGene Expression Regulation Neoplasticlcsh:Biology (General)FemaleRNA Long NoncodingFunction (biology)BiogenesisCells
researchProduct

Protein expression profiling suggests relevance of noncanonical pathways in isolated pulmonary embolism

2019

Abstract Patients with isolated pulmonary embolism (PE) have a distinct clinical profile from those with deep vein thrombosis (DVT)-associated PE, with more pulmonary conditions and atherosclerosis. These findings suggest a distinct molecular pathophysiology and the potential involvement of alternative pathways in isolated PE. To test this hypothesis, data from 532 individuals from the Genotyping and Molecular Phenotyping of Venous ThromboEmbolism Project, a multicenter prospective cohort study with extensive biobanking, were analyzed. Targeted, high-throughput proteomics, machine learning, and bioinformatic methods were applied to contrast the acute-phase plasma proteomes of isolated PE pa…

MaleProteomeDatasets as TopicComorbidity030204 cardiovascular system & hematologyProteomicsBioinformaticsBiochemistryThrombosis and HemostasisMachine LearningPathogenesis0302 clinical medicineProtein-Arginine Deiminase Type 2Prospective StudiesProtein Interaction MapsProspective cohort study0303 health scienceseducation.field_of_studyVenous ThromboembolismHematologyMiddle AgedThrombosisPhenotypePulmonary embolismProteomeN-AcetylgalactosaminyltransferasesFemaleAdultQuantitative Trait LociImmunologyPopulationInterferon-gamma03 medical and health sciencesInterleukin-15 Receptor alpha SubunitmedicineHumansGlial Cell Line-Derived Neurotrophic FactoreducationAged030304 developmental biologybusiness.industryPulmonary SurfactantsCell BiologyAtherosclerosismedicine.diseaseOxidative StressGene Expression RegulationPulmonary EmbolismTranscriptomebusinessAcute-Phase ProteinsFollow-Up StudiesBlood
researchProduct

LipiDisease: associate lipids to diseases using literature mining

2021

Abstract Summary Lipids play an essential role in cellular assembly and signaling. Dysregulation of these functions has been linked with many complications, including obesity, diabetes, metabolic disorders and cancer. Investigating lipid profiles in such conditions can provide insights into cellular functions and possible interventions. Hence, the field of lipidomics has been expanding in recent years. Even though the role of individual lipids in diseases has been investigated, there is no resource to perform disease enrichment analysis considering the cumulative association of a lipid set. To address this, we have implemented the LipiDisease web server. The tool analyzes millions of recor…

Statistics and Probability; Supplementary data; Web server; AcademicSubjects/SCI01060; Computer science; Cellular functions; Computational biology; Disease; computer.software_genre; Applications Notes; Biochemistry; Field (computer science); Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Lipidomics; Data and Text Mining; Molecular Biology; computer; Bioinformatics
researchProduct

Protein-protein interactions can be predicted using coiled coil co-evolution patterns

2016

Abstract Protein-protein interactions are sometimes mediated by coiled coil structures. The evolutionary conservation of interacting orthologs in different species, along with the presence or absence of coiled coils in them, may help in the prediction of interacting pairs. Here, we illustrate how the presence of coiled coils in a protein can be exploited as a potential indicator for its interaction with another protein with coiled coils. The prediction capability of our strategy improves when restricting our dataset to highly reliable, known protein-protein interactions. Our study of the co-evolution of coiled coils demonstrates that pairs of interacting proteins can be distinguished from no…

0301 basic medicineStatistics and ProbabilityComputational biologyCorrelated evolutionGeneral Biochemistry Genetics and Molecular BiologyProtein Structure SecondaryProtein–protein interactionConserved sequenceEvolution Molecular03 medical and health sciencesProtein-protein interactionModelling and SimulationImmunology and Microbiology(all)Coiled coilGeneticsCoiled coilPhysicsMedicine(all)030102 biochemistry & molecular biologyGeneral Immunology and MicrobiologyAgricultural and Biological Sciences(all)Models GeneticBiochemistry Genetics and Molecular Biology(all)Applied MathematicsA proteinProteinsGeneral Medicine030104 developmental biologyModeling and SimulationGeneral Agricultural and Biological SciencesJournal of Theoretical Biology
researchProduct

Assessing the low complexity of protein sequences via the low complexity triangle.

2020

Background Proteins with low complexity regions (LCRs) have atypical sequence and structural features. Their amino acid composition varies from the expected, determined proteome-wise, and they do not follow the rules of structural folding that prevail in globular regions. One way to characterize these regions is by assessing the repeatability of a sequence, that is, calculating the local propensity of a region to be part of a repeat. Results We combine two local measures of low complexity, repeatability (using the RES algorithm) and fraction of the most frequent amino acid, to evaluate different proteomes, datasets of protein regions with specific features, and individual cases of proteins…

ProteomeProteomesComputer scienceProtein SequencingBiochemistryDatabase and Informatics MethodsSequence Analysis ProteinProtein methodsPeptide sequencechemistry.chemical_classification0303 health sciencesSequenceMultidisciplinary030302 biochemistry & molecular biologyQRGenomicsAmino acidTandem RepeatsProteomeAmino Acid AnalysisMedicineSequence AnalysisResearch ArticleRepetitive Sequences Amino AcidBioinformaticsSequence analysisScienceResearch and Analysis MethodsGenome Complexity03 medical and health sciencesProtein DomainsAmino Acid Sequence AnalysisTandem repeatGeneticsHumansFraction (mathematics)Repeated SequencesAmino Acid SequenceMolecular Biology TechniquesSequencing TechniquesRepresentation (mathematics)Molecular Biology030304 developmental biologyMolecular Biology Assays and Analysis Techniquesbusiness.industryBiology and Life SciencesProteinsComputational BiologyPattern recognitionchemistryGlobular ProteinsArtificial intelligencebusinessPLoS ONE
researchProduct

PlaToLoCo: the first web meta-server for visualization and annotation of low complexity regions in proteins

2020

Abstract Low complexity regions (LCRs) in protein sequences are characterized by a less diverse amino acid composition compared to typically observed sequence diversity. Recent studies have shown that LCRs may co-occur with intrinsically disordered regions, are highly conserved in many organisms, and often play important roles in protein functions and in diseases. In previous decades, several methods have been developed to identify regions with LCRs or amino acid bias, but most of them are stand-alone applications, and there is currently no web-based tool that allows users to explore LCRs in protein sequences with additional functional annotations. We aim to fill this gap by providing PlaToL…

Sequence analysisAcademicSubjects/SCI00010Protein domainComputational biologyBiologyDomain (software engineering)Computer graphics03 medical and health sciencesAnnotationProtein DomainsSequence Analysis ProteinGeneticsComputer GraphicsHumansAmino Acids030304 developmental biology0303 health sciencesIntersection (set theory)030302 biochemistry & molecular biologyMembrane ProteinsProteinsMolecular Sequence AnnotationVisualizationMolecular Sequence AnnotationWeb Server IssueSoftwareNucleic Acids Research
researchProduct

Computational Prediction of Position Effects of Apparently Balanced Human Chromosomal Rearrangements.

2017

Interpretation of variants of uncertain significance, especially chromosomal rearrangements in non-coding regions of the human genome, remains one of the biggest challenges in modern molecular diagnosis. To improve our understanding and interpretation of such variants, we used high-resolution three-dimensional chromosomal structural data and transcriptional regulatory information to predict position effects and their association with pathogenic phenotypes in 17 subjects with apparently balanced chromosomal abnormalities. We found that the rearrangements predict disruption of long-range chromatin interactions between several enhancers and genes whose annotated clinical features are strongly …

0301 basic medicineCandidate genediagnosis030105 genetics & heredityMedical and Health SciencescytogeneticsTranslocation Geneticchromosomal translocationChromosome Breakpointschromatin conformationbalanced chromosomal rearrangement2.1 Biological and endogenous factorsChromosomes HumanGenetics(clinical)AetiologyGenetics (clinical)In Situ HybridizationIn Situ Hybridization Fluorescencelong-range effectGeneticsGenetics & HeredityGene RearrangementGenomeChromosome MappingBiological SciencesChromatinPosition effectPhenotypeMedical geneticsHPOHumandistal effectmedicine.medical_specialtyChromosome engineeringchromosomal rearrangement/dk/atira/pure/subjectarea/asjc/1300/1311KaryotypeTranslocationChromosomal rearrangementBiologyChromosomesFluorescenceArticleChromosomal Position Effects03 medical and health sciencesGeneticClinical ResearchmedicineGeneticsHumansGenetic Predisposition to DiseaseGeneGenome HumanHuman GenomeGenetic Variation/dk/atira/pure/subjectarea/asjc/2700/2716030104 developmental biologyGene Expression RegulationHuman genomeclinical geneticsAmerican journal of human genetics
researchProduct

MAGA: A Supervised Method to Detect Motifs From Annotated Groups in Alignments

2020

Multiple sequence alignments are usually phylogenetically driven. They are studied in the framework of evolution. But sometimes, it is interesting to study residue conservation at positions unconstrained by evolutionary rules. We present a supervised method to access a layer of information difficult to appreciate visually when many protein sequences are aligned. This new tool (MAGA; http://cbdm-01.zdv.uni-mainz.de/~munoz/maga/ ) locates positions in multiple sequence alignments differentially conserved in manually defined groups of sequences.

0303 health sciences; multiple sequence alignments; Sequence analysis; Computer science; 0206 medical engineering; Methods and Protocols; Sequence analysis; lcsh:Evolution; 02 engineering and technology; Computational biology; Computer Science Applications; 03 medical and health sciences; motif finding; computational biology; web services; Genetics; lcsh:QH359-425; 020602 bioinformatics; Ecology Evolution Behavior and Systematics; 030304 developmental biology; Evolutionary Bioinformatics
researchProduct

Assessment of computational methods for the analysis of single-cell ATAC-seq data

2019

Abstract Background Recent innovations in single-cell Assay for Transposase Accessible Chromatin using sequencing (scATAC-seq) enable profiling of the epigenetic landscape of thousands of individual cells. scATAC-seq data analysis presents unique methodological challenges. scATAC-seq experiments sample DNA, which, due to low copy numbers (diploid in humans), leads to inherent data sparsity (1–10% of peaks detected per cell) compared to transcriptomic (scRNA-seq) data (10–45% of expressed genes detected per cell). Such challenges in data generation emphasize the need for informative features to assess cell heterogeneity at the chromatin level. Results We present a benchmarking framework that …

Epigenomicslcsh:QH426-470Test data generationComputer scienceCellATAC-seqComputational biologyBiologyClusteringTranscriptomeMice03 medical and health scienceschemistry.chemical_compound0302 clinical medicinemedicineAnimalsHumansProfiling (information science)scATAC-seqnatural sciencesEpigeneticsFeature matrixCluster analysislcsh:QH301-705.5GeneTransposaseVisualization030304 developmental biologySparse matrix0303 health sciencesFeaturizationDimensionality reductionResearchComputational BiologySequence Analysis DNADimensionality reductionChromatinBenchmarkinglcsh:Geneticsmedicine.anatomical_structurelcsh:Biology (General)chemistryRegulatory genomicsSingle-Cell AnalysisPeak calling030217 neurology & neurosurgeryDNA
researchProduct

Evaluation of in vivo and in vitro models of toxicity by comparison of toxicogenomics data with the literature.

2017

Toxicity affecting humans is studied by observing the effects of chemical substances in animal organisms (in vivo) or in animal and human cultivated cell lines (in vitro). Toxicogenomics studies collect gene expression profiles and histopathology assessment data for hundreds of drugs and pollutants in standardized experimental designs using different model systems. These data are an invaluable source for analyzing genome-wide drug response in biological systems. However, a remaining problem is how to evaluate the suitability of heterogeneous in vitro and in vivo systems to model the many different aspects of human toxicity. We propose here that a given model system (cell type or animal o…

0301 basic medicineCandidate geneCell typeDrug Evaluation PreclinicalBiologyBioinformaticsToxicogeneticsGeneral Biochemistry Genetics and Molecular BiologyIn vitroRats03 medical and health sciences030104 developmental biologyIn vivoToxicityHepatocytesAnimalsHumansToxicogenomicsTranscriptomeMolecular BiologyGeneFunction (biology)Cells CulturedMethods (San Diego, Calif.)
researchProduct

Automated selection of homologs to track the evolutionary history of proteins

2018

Background The selection of distant homologs of a query protein under study is a common and useful application of protein sequence databases. Such sets of homologs are often applied to investigate the function of a protein and the degree to which experimental results can be transferred from one organism to another. In particular, a variety of databases facilitates static browsing for orthologs. However, these resources have limited power when identifying orthologs between taxonomically distant species. In addition, in some situations, for a given query protein, it is advantageous to compare the sets of orthologs from different specific organisms: this recursive step-wise search might give …

0301 basic medicineProteomeComputer scienceComputational biologyWeb toollcsh:Computer applications to medicine. Medical informaticsBiochemistryHomology (biology)Evolution Molecular03 medical and health sciences0302 clinical medicineProtein sequencingStructural BiologyHomologous chromosomeHumansDatabases ProteinMolecular Biologylcsh:QH301-705.5OrganismProtein functionMethodology ArticleApplied MathematicsProteinsA proteinComputer Science ApplicationsHomologyEvolutionary path030104 developmental biologyComputingMethodologies_PATTERNRECOGNITIONlcsh:Biology (General)Proteomelcsh:R858-859.7DNA microarraySoftware030217 neurology & neurosurgeryBMC Bioinformatics
researchProduct

Statistical guidelines for quality control of next-generation sequencing techniques.

2021

Condition-specific statistical guidelines and accurate classification trees for quality control of functional genomics NGS files (RNA-seq, ChIP-seq and DNase-seq) have been generated using thousands of reference files from the ENCODE project and made available to the community.

Quality ControlComputer scienceHealth Toxicology and Mutagenesismedia_common.quotation_subjectControl (management)genetic processes26Plant ScienceBiochemistry Genetics and Molecular Biology (miscellaneous)HumansQuality (business)Statistical analysisRelevance (information retrieval)natural sciencesResearch Articlesmedia_commonEcologyScope (project management)Genome HumanComputational BiologyHigh-Throughput Nucleotide Sequencing15Sequence Analysis DNA11Data scienceComputingMethodologies_PATTERNRECOGNITIONTheoryofComputation_MATHEMATICALLOGICANDFORMALLANGUAGESSoftwareResearch ArticleLife science alliance
researchProduct

REP2: A Web Server to Detect Common Tandem Repeats in Protein Sequences

2020

Ensembles of tandem repeats (TRs) in protein sequences expand rapidly to form domains well suited for interactions with proteins. For this reason, they are relatively frequent. Some TRs have known structures and therefore it is advantageous to predict their presence in a protein sequence. However, since most TRs diverge quickly, their detection by classical sequence comparison algorithms is not very accurate. Previously, we developed a method and a web server that used curated profiles and thresholds for the detection of 11 common TRs. Here we present a new web server (REP2) that allows the analysis of TRs in both individual and aligned sequences. We currently provide precomputed analyses f…

Repetitive Sequences Amino AcidWeb serverProteomeComputer scienceComputational biologycomputer.software_genreEvolution Molecular03 medical and health sciences0302 clinical medicineTandem repeatStructural BiologySequence comparisonHumansAmino Acid SequenceMolecular BiologyConserved Sequence030304 developmental biologySequence (medicine)Comparative genomicsInternet0303 health sciencesMultiple sequence alignmentBacteriaProteinsTandem Repeat SequencesProteomeUniProtSequence Alignmentcomputer030217 neurology & neurosurgeryJournal of Molecular Biology
researchProduct

Co-regulation of paralog genes in the three-dimensional chromatin architecture.

2016

Paralog genes arise from gene duplication events during evolution, which often lead to similar proteins that cooperate in common pathways and in protein complexes. Consequently, paralogs show correlated gene expression, although the mechanisms of co-regulation remain unclear. In eukaryotes, genes are regulated in part by distal enhancer elements through looping interactions with gene promoters. These looping interactions can be measured by genome-wide chromatin conformation capture (Hi-C) experiments, which revealed self-interacting regions called topologically associating domains (TADs). We hypothesize that paralogs share common regulatory mechanisms to enable coordinated expression acco…

0301 basic medicineanimal structuresComputational biologyBiologyGenomeChromosome conformation capture03 medical and health sciencesMice0302 clinical medicineDogsGene DuplicationGene duplicationGeneticsAnimalsCluster AnalysisHumansPromoter Regions GeneticGeneChIA-PETGenomic organizationGeneticsRegulation of gene expressionGenomefungiGene regulation Chromatin and EpigeneticsComputational BiologyChromatin Assembly and DisassemblyBiological EvolutionChromatinChromatin030104 developmental biologyEnhancer Elements GeneticGene Expression Regulation030217 neurology & neurosurgeryNucleic acids research
researchProduct

Proteome-wide comparison between the amino acid composition of domains and linkers

2018

Objective: Amino acid composition is a sequence feature that has been extensively used to characterize proteomes of many species and protein families. Yet the analysis of amino acid composition of protein domains and the linkers connecting them has received less attention. Here, we perform both a comprehensive full-proteome amino acid composition analysis and a similar analysis focusing on domains and linkers, to uncover domain- or linker-specific differential amino acid usage patterns. Results: The amino acid composition in the 38 proteomes studied showcases the greater variability found in archaea and bacteria species compared to eukaryotes. When focusing on domains and linkers, we describe …

Proteomics570BacteriaProteomeAmino acid compositionlcsh:Rlcsh:MedicineEukaryotaArchaea570 Life sciencesResearch Notelcsh:Biology (General)Sequence Analysis ProteinCatalytic DomainDomainsAmino Acid SequenceLinkerslcsh:Science (General)lcsh:QH301-705.5570 Biowissenschaftenlcsh:Q1-390BMC Research Notes
researchProduct

Tandem repeats lead to sequence assembly errors and impose multi-level challenges for genome and protein databases

2019

Abstract The widespread occurrence of repetitive stretches of DNA in genomes of organisms across the tree of life imposes fundamental challenges for sequencing, genome assembly, and automated annotation of genes and proteins. This multi-level problem can lead to errors in genome and protein databases that are often not recognized or acknowledged. As a consequence, end users working with sequences with repetitive regions are faced with ‘ready-to-use’ deposited data whose trustworthiness is difficult to determine, let alone to quantify. Here, we provide a review of the problems associated with tandem repeat sequences that originate from different stages during the sequencing-assembly-annotatio…

FOS: Computer and information sciencesBioinformatics[SDV]Life Sciences [q-bio]Sequence assemblyGenomics[SDV.BC]Life Sciences [q-bio]/Cellular BiologyComputational biologyBiologyGenome03 medical and health sciencesAnnotation0302 clinical medicineTandem repeatGeneticsAnimalsSurvey and SummaryDatabases ProteinGeneComputingMilieux_MISCELLANEOUS030304 developmental biology0303 health sciencesEnd user572: BiochemieDNASequence Analysis DNAGenomics[SDV.BIBS]Life Sciences [q-bio]/Quantitative Methods [q-bio.QM]WorkflowComputingMethodologies_PATTERNRECOGNITIONGadus morhuaTandem Repeat SequencesScientific Experimental Error[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]Databases Nucleic Acid030217 neurology & neurosurgery
researchProduct

Single-cell ChIP-seq imputation with SIMPA by leveraging bulk ENCODE data

2019

Abstract Single-cell ChIP-seq analysis is challenging due to data sparsity. We present SIMPA (https://github.com/salbrec/SIMPA), a single-cell ChIP-seq data imputation method leveraging predictive information within bulk ENCODE data to impute missing protein-DNA interacting regions of target histone marks or transcription factors. Machine learning models trained for each single cell, each target, and each genomic region enable drastic improvement in cell type clustering and gene identification.

researchProduct

Disentangling the complexity of low complexity proteins

2020

Abstract There are multiple definitions for low complexity regions (LCRs) in protein sequences, with all of them broadly considering LCRs as regions with fewer amino acid types compared to an average composition. Following this view, LCRs can also be defined as regions showing composition bias. In this critical review, we focus on the definition of sequence complexity of LCRs and their connection with structure. We present statistics and methodological approaches that measure low complexity (LC) and related sequence properties. Composition bias is often associated with LC and disorder, but repeats, while compositionally biased, might also induce ordered structures. We illustrate this dichot…

Protein ConformationComputer scienceReview ArticleComputational biologyMeasure (mathematics)Evolution MolecularLow complexity03 medical and health sciencesProtein DomainsAmino Acid Sequencestructure[SDV.BBM.BC]Life Sciences [q-bio]/Biochemistry Molecular Biology/Biochemistry [q-bio.BM]Databases ProteinMolecular Biology030304 developmental biologyStructure (mathematical logic)0303 health sciencesSequence[SCCO.NEUR]Cognitive science/Neurosciencecomposition bias030302 biochemistry & molecular biologyProteinsdisorderlow complexity regionsStructure and function[INFO.INFO-BI]Computer Science [cs]/Bioinformatics [q-bio.QM]AlgorithmsInformation SystemsBriefings in Bioinformatics
researchProduct

Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs

2018

Background: Transcription factors (TFs) bind to gene promoters or distal regulatory elements that interact with the promoter via chromatin looping. While the TF binding sites themselves are detected genome-wide by ChIP-seq experiments, it is difficult to associate them with regulated genes without information on chromatin looping. Recent experimental techniques such as Hi-C or ChIA-PET measure long-range interactions genome-wide but are experimentally elaborate and have limited resolution. Here, we present Computational Chromosome Conformation Capture by Correlation of ChIP-seq at CTCF motifs (7C). Results: While ChIP-seq was not designed to detect contacts, the formaldehyde treatment in the ChI…

PhysicsChromosome conformation captureCTCFgenetic processesnatural sciencesHuman genomePromoterComputational biologyBinding siteSequence motifTranscription factorChromatin
researchProduct

Protein Interaction Networks in Health and Disease

2016

protein networkdiseaseEditorialPhysiologysystems biologyprotein functionprotein interactionsFrontiers in Genetics
researchProduct