Search results for "ALIGNMENT"

showing 10 items of 627 documents

Efficient Algorithms for Sequence Analysis with Entropic Profiles

2017

Entropy, being closely related to repetitiveness and compressibility, is a widely used information-related measure to assess the degree of predictability of a sequence. Entropic profiles are based on information theory principles and can be used to study the under- and over-representation of subwords, while also providing information about the scale of conserved DNA regions. Here, we focus on the algorithmic aspects related to entropic profiles. In particular, we propose linear-time algorithms for their computation that rely on suffix-based data structures, more specifically on the truncated suffix tree (TST) and on the enhanced suffix array (ESA). We performed an extensive experimental campaign …

0301 basic medicine; Compressed suffix array; Theoretical computer science; Entropy; Suffix tree; 0206 medical engineering; Generalized suffix tree; 02 engineering and technology; String searching algorithm; Information theory; law.invention; 03 medical and health sciences; law; Genetics; Animals; Humans; Mathematics; Applied Mathematics; Suffix array; Computational Biology; DNA; Sequence Analysis DNA; Data structure; 030104 developmental biology; Suffix; Alignment free Entropy Sequence analysis Sequence comparison; Algorithms; 020602 bioinformatics; Biotechnology; IEEE/ACM Transactions on Computational Biology and Bioinformatics
researchProduct

A new parallel pipeline for DNA methylation analysis of long reads datasets

2017

Background: DNA methylation is an important mechanism of epigenetic regulation in development and disease. New-generation sequencers allow genome-wide measurements of the methylation status by reading short stretches of the DNA sequence (Methyl-seq). Several software tools for methylation analysis have been proposed over recent years. However, the current trend is that new sequencers, and those expected in the near future, yield sequences of increasing length, making these software tools inefficient and obsolete. Results: In this paper, we propose new software based on a strategy for methylation analysis of Methyl-seq sequencing data that requires much shorter execution times while…

0301 basic medicine; Computer science; Parallel pipeline; ADN; 02 engineering and technology; computer.software_genre; Biochemistry; Sensitivity and Specificity; DNA sequencing; Epigenesis Genetic; 03 medical and health sciences; chemistry.chemical_compound; Structural Biology; RNA analysis; Informàtica; Databases Genetic; 0202 electrical engineering electronic engineering information engineering; Humans; Epigenetics; Molecular Biology; 020203 distributed computing; DNA methylation; Genome Human; Applied Mathematics; Parallel pipeline; Methylation; Sequence Analysis DNA; Supercomputer; Computer Science Applications; Genòmica; 030104 developmental biology; chemistry; Gene Expression Regulation; DNA methylation; Mutation; Data mining; High performance computing; DNA microarray; computer; Sequence Alignment; DNA; Software
researchProduct

Diversification of spatiotemporal expression and copy number variation of the echinoid hbox12/pmar1/micro1 multigene family

2017

Changes occurring during evolution in the cis-regulatory landscapes of individual members of multigene families might impart diversification in their spatiotemporal expression and function. The archetypal member of the echinoid hbox12/pmar1/micro1 family is hbox12-a, a homeobox-containing gene expressed exclusively by dorsal blastomeres, where it governs the dorsal/ventral gene regulatory network during embryogenesis of the sea urchin Paracentrotus lividus. Here we describe the inventory of the hbox12/pmar1/micro1 genes in P. lividus, highlighting that gene copy number variation occurs across individual sea urchins of the same species. We show that the various hbox12/pmar1/micro1 genes grou…

0301 basic medicine; Evolutionary Genetics; Embryology; Gene regulatory network; lcsh:Medicine; Gene Expression; Medicine (all); Biochemistry Genetics and Molecular Biology (all); Agricultural and Biological Sciences (all); Database and Informatics Methods; Gene duplication; Gene Regulatory Networks; Copy-number variation; lcsh:Science; Sea urchin; Phylogeny; Multidisciplinary; biology; Phylogenetic tree; Medicine (all); Genes Homeobox; Gene Expression Regulation Developmental; Animal Models; Genomics; Experimental Organism Systems; Multigene Family; Sequence Analysis; Research Article; Echinoderms; DNA Copy Number Variations; Bioinformatics; DNA transcription; Zoology; Settore BIO/11 - Biologia Molecolare; Research and Analysis Methods; Paracentrotus lividus; 03 medical and health sciences; Sequence Motif Analysis; biology.animal; Genetics; Gene family; Animals; Gene; Evolutionary Biology; Biochemistry Genetics and Molecular Biology (all); lcsh:R; Embryos; Organisms; Biology and Life Sciences; Computational Biology; biology.organism_classification; Genome Analysis; Genomic Libraries; Invertebrates; 030104 developmental biology; Agricultural and Biological Sciences (all); Evolutionary biology; Sea Urchins; lcsh:Q; Sequence Alignment; Developmental Biology
researchProduct

The colored longest common prefix array computed via sequential scans

2018

Due to the increased availability of large datasets of biological sequences, the tools for sequence comparison are now relying on efficient alignment-free approaches to a greater extent. Most of the alignment-free approaches require the computation of statistics of the sequences in the dataset. Such computations become impractical in internal memory when very large collections of long sequences are considered. In this paper, we present a new conceptual data structure, the colored longest common prefix array (cLCP), that makes it possible to efficiently tackle several problems with an alignment-free approach. In fact, we show that such a data structure can be computed via sequential scans in semi-exter…

0301 basic medicine; FOS: Computer and information sciences; Alignment-free methods; Burrows–Wheeler transform; Computer science; Computation; Average common substring; 0206 medical engineering; Matching statistics; Scale (descriptive set theory); 02 engineering and technology; Theoretical Computer Science; 03 medical and health sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Burrows-wheeler transform; String (computer science); Computer Science (all); LCP array; Matching statistic; Data structure; Substring; 030104 developmental biology; Alignment-free methods; Average common substring; Burrows-wheeler transform; Longest common prefix; Matching statistics; Theoretical Computer Science; Computer Science (all); Pairwise comparison; Longest common prefix; Algorithm; 020602 bioinformatics; Alignment-free method
researchProduct

Alignment-free sequence comparison using absent words

2018

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realised by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this paper, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not oc…

0301 basic medicine; FOS: Computer and information sciences; Formal Languages and Automata Theory (cs.FL); Computer Science - Formal Languages and Automata Theory; Sequence alignment; Information System; 0102 computer and information sciences; Circular word; Absent words; 01 natural sciences; Upper and lower bounds; Sequence comparison; Theoretical Computer Science; Combinatorics; 03 medical and health sciences; Computer Science - Data Structures and Algorithms; Data Structures and Algorithms (cs.DS); Absent word; Circular words; Mathematics; Sequence; Settore INF/01 - Informatica; Process (computing); q-gram; Computer Science Applications; 1707 Computer Vision and Pattern Recognition; q-grams; Composition (combinatorics); Computer Science Applications; 030104 developmental biology; Computational Theory and Mathematics; Forbidden words; 010201 computation theory & mathematics; Focus (optics); Forbidden word; Word (computer architecture); Information Systems; Integer (computer science)
researchProduct

Integrative analysis of structural variations using short-reads and linked-reads yields highly specific and sensitive predictions.

2020

Genetic diseases are driven by aberrations of the human genome. Identification of such aberrations, including structural variations (SVs), is key to our understanding. Conventional short-reads whole genome sequencing (cWGS) can identify SVs to base-pair resolution, but utilizes only short-range information and suffers from a high false discovery rate (FDR). Linked-reads sequencing (10XWGS) utilizes long-range information by linkage of short-reads originating from the same large DNA molecule. This can mitigate alignment-based artefacts, especially in repetitive regions, and should enable better prediction of SVs. However, an unbiased evaluation of this technology is not available. In this study, w…

0301 basic medicine; False discovery rate; Computer science; Artificial Gene Amplification and Extension; Polymerase Chain Reaction; Database and Informatics Methods; Sequencing techniques; 0302 clinical medicine; Breast Tumors; Basic Cancer Research; Medicine and Health Sciences; DNA sequencing; Biology (General); Ecology; High-Throughput Nucleotide Sequencing; Genomics; DNA Neoplasm; 3. Good health; Identification (information); Oncology; Computational Theory and Mathematics; Modeling and Simulation; MCF-7 Cells; Female; Sequence Analysis; Research Article; Bioinformatics; QH301-705.5; Breast Neoplasms; Genomics; Computational biology; Research and Analysis Methods; Human Genomics; 03 medical and health sciences; Cellular and Molecular Neuroscience; Cancer Genomics; Genomic Medicine; Breast Cancer; Genetics; DNA Barcoding Taxonomic; Humans; Molecular Biology Techniques; Molecular Biology; Ecology Evolution Behavior and Systematics; Whole genome sequencing; Linkage (software); Whole Genome Sequencing; Genome Human; Dideoxy DNA sequencing; Genetic Diseases Inborn; Cancers and Neoplasms; Biology and Life Sciences; Computational Biology; Statistical model; Sequence Analysis DNA; Repetitive Regions; Logistic Models; 030104 developmental biology; Genomic Structural Variation; Human genome; Sequence Alignment; 030217 neurology & neurosurgery; PLoS Computational Biology
researchProduct

Complexity of gap junctions between horizontal cells of the carp retina.

2016

In the vertebrate retina, horizontal cells (HCs) reveal homologous coupling by gap junctions (gj), which are thought to consist of different connexins (Cx). However, recent studies in mouse, rabbit and zebrafish retina indicate that individual HCs express more than one connexin. To provide further insights into the composition of gj connecting HCs and to determine whether HCs express multiple connexins, we examined the molecular identity and distribution of gj between HCs of the carp retina. We have cloned four carp connexins designated Cx49.5, Cx55.5, Cx52.6 and Cx53.8 with a close relationship to connexins previously reported in HCs of mouse, rabbit and zebrafish, respectively. Using in s…

0301 basic medicine; Fish Proteins; Carps; Immunoelectron microscopy; Blotting Western; Connexin; In situ hybridization; Retinal Horizontal Cells; behavioral disciplines and activities; Polymerase Chain Reaction; Connexins; 03 medical and health sciences; Mice; 0302 clinical medicine; Cell Line Tumor; medicine; Animals; Protein Isoforms; Electrical synapse; Amino Acid Sequence; Carp; Microscopy Immunoelectron; Zebrafish; In Situ Hybridization; Retina; biology; General Neuroscience; Gap junction; Gap Junctions; Anatomy; Dendrites; biology.organism_classification; Immunohistochemistry; Axons; Cell biology; 030104 developmental biology; medicine.anatomical_structure; embryonic structures; sense organs; Sequence Alignment; 030217 neurology & neurosurgery; Neuroscience
researchProduct

Identification of a classic nuclear localization signal at the N terminus that regulates the subcellular localization of Rbfox2 isoforms during diffe…

2016

Nuclear localization of the alternative splicing factor Rbfox2 is achieved by a C-terminal nuclear localization signal (NLS) which can be excluded from some Rbfox2 isoforms by alternative splicing. While this predicts nuclear and cytoplasmic localization, Rbfox2 is exclusively nuclear in some cell types. Here, we identify a second NLS in the N terminus of Rbfox2 isoform 1A that is not included in Rbfox2 isoform 1F. Rbfox2 1A isoforms lacking the C-terminal NLS are nuclear, whereas equivalent 1F isoforms are cytoplasmic. A shift in Rbfox2 expression toward cytoplasmic 1F isoforms occurs during epithelial to mesenchymal transition (EMT) and could be important in regulating the activity and fu…

0301 basic medicine; Gene isoform; Cytoplasm; Epithelial-Mesenchymal Transition; Nuclear Localization Signals; Biophysics; Biochemistry; Cell Line; Transforming Growth Factor beta1; 03 medical and health sciences; Mice; Mammary Glands Animal; Protein Domains; Structural Biology; Cell Line Tumor; Genetics; NLS; Animals; Protein Isoforms; Amino Acid Sequence; Molecular Biology; Cell Nucleus; Chemistry; Alternative splicing; Cell Differentiation; Epithelial Cells; Mouse Embryonic Stem Cells; Cell Biology; Subcellular localization; Molecular biology; Cell biology; Alternative Splicing; 030104 developmental biology; P19 cell; Cytoplasm; RNA splicing; RNA Splicing Factors; Sequence Alignment; Nuclear localization sequence; Signal Transduction; FEBS letters
researchProduct

Genetic Diversity of O-Antigens in Hafnia alvei and the Development of a Suspension Array for Serotype Detection.

2016

Hafnia alvei is a facultative and rod-shaped gram-negative bacterium that belongs to the Enterobacteriaceae family. Although it has been more than 50 years since the genus was identified, very little is known about variations among Hafnia species. Diversity in O-antigens (O-polysaccharide, OPS) is thought to be a major factor in bacterial adaptation to different hosts and situations and variability in the environment. Antigenic variation is also an important factor in pathogenicity that has been used to define clones within a number of species. The genes that are required to synthesize OPS are always clustered within the bacterial chromosome. A serotyping scheme including 39 O-serotypes has…

0301 basic medicine; Glycobiology; lcsh:Medicine; Artificial Gene Amplification and Extension; Genome; Polymerase Chain Reaction; Biochemistry; Database and Informatics Methods; Nucleic Acids; Gene cluster; lcsh:Science; Phylogeny; Genetics; Multidisciplinary; Chromosome Biology; Polysaccharides Bacterial; O Antigens; Enzymes; Multigene Family; Sequence Analysis; Research Article; DNA Bacterial; 030106 microbiology; Sequence Databases; Biology; Research and Analysis Methods; Sensitivity and Specificity; Chromosomes; Bacterial genetics; 03 medical and health sciences; Transferases; Sequence Motif Analysis; Polysaccharides; Genetic variation; Antigenic variation; Genetics; Serotyping; Molecular Biology Techniques; Sequencing Techniques; Operons; Gene; Molecular Biology; Genetic diversity; Circular bacterial chromosome; lcsh:R; Genetic Variation; Reproducibility of Results; Biology and Life Sciences; Proteins; Hafnia alvei; Cell Biology; DNA; Biosynthetic Pathways; 030104 developmental biology; Biological Databases; Enzymology; lcsh:Q; Sequence Alignment; Genome Bacterial; PLoS ONE
researchProduct

Linear-time sequence comparison using minimal absent words & applications

2016

Sequence comparison is a prerequisite to virtually all comparative genomic analyses. It is often realized by sequence alignment techniques, which are computationally expensive. This has led to increased research into alignment-free techniques, which are based on measures referring to the composition of sequences in terms of their constituent patterns. These measures, such as q-gram distance, are usually computed in time linear with respect to the length of the sequences. In this article, we focus on the complementary idea: how two sequences can be efficiently compared based on information that does not occur in the sequences. A word is an absent word of some sequence if it does not occur in…

0301 basic medicine; Latin Americans; Computer Science (all); Library science; 0102 computer and information sciences; Circular word; Algorithms on string; 01 natural sciences; Alignment-free comparison; Sequence comparison; Theoretical Computer Science; 03 medical and health sciences; 030104 developmental biology; 010201 computation theory & mathematics; Informatics; Political science; Absent word; Forbidden word
researchProduct