Search results for "Sequence alignment"

showing 10 items of 447 documents

Assessment of the probabilities for evolutionary structural changes in protein folds.

2007

Abstract Motivation: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. Results: We have tried …

Statistics and Probability; Models Molecular; Protein Folding; Protein domain; Structural alignment; Biology; Biochemistry; Set (abstract data type); Evolution Molecular; Protein structure; Similarity (network science); Sequence Analysis Protein; Computer Simulation; Molecular Biology; Protein secondary structure; Conserved Sequence; Sequence; Models Genetic; Sequence Homology Amino Acid; Proteins; Structural Classification of Proteins database; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Models Chemical; Data Interpretation Statistical; sense organs; Algorithm; Sequence Alignment; Bioinformatics (Oxford, England)

CARE: context-aware sequencing read error correction.

2020

Abstract Motivation Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results We present CARE—an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections which enables fast computation of high-quality multiple alignments. Sequencing errors ar…
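
The minhashing idea mentioned in this abstract can be illustrated with a minimal sketch. It is not CARE's implementation: the function names, the k-mer length, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions chosen for illustration. The fraction of matching signature positions estimates the Jaccard similarity of the two reads' k-mer sets.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=8, num_hashes=16):
    """Hypothetical helper: one minimum hash value per salted hash function."""
    kmer_set = kmers(seq, k)
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    signature = []
    for seed in range(num_hashes):
        # A different salt per seed simulates an independent hash function.
        salt = seed.to_bytes(8, "little")
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(km.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for km in kmer_set
        ))
    return signature


def estimated_jaccard(sig_a, sig_b):
    """Fraction of signature positions on which the two reads agree."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


read_a = "ACGTACGGTCAGTTACGGATCCAGT"
read_b = "ACGTACGGTCAGTTACGGATCCTGA"
print(estimated_jaccard(minhash_signature(read_a), minhash_signature(read_b)))
```

Reads whose signatures agree in many positions are likely to overlap, so signatures can serve as keys for finding candidate reads to align without comparing every pair directly.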

Statistics and Probability; Multiple sequence alignment; Computer science; Sequence assembly; High-Throughput Nucleotide Sequencing; Context (language use); Sequence Analysis DNA; computer.software_genre; Biochemistry; Genome; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Humans; Human genome; Data mining; Error detection and correction; Molecular Biology; computer; Sequence Alignment; Algorithms; Software; Bioinformatics (Oxford, England)

Long read alignment based on maximal exact match seeds

2012

Abstract Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 an…
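
To make the notion of maximal exact match (MEM) seeds concrete, here is a brute-force sketch. It is not how CUSHAW2 finds seeds: the function name and the `min_len` threshold are hypothetical, and the quadratic scan below stands in for whatever index a production aligner would use. A match is reported only if it cannot be extended to the left or to the right.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return (read_pos, ref_pos, length) for every maximal exact match.

    Brute-force O(len(read) * len(ref)) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip starts that could be extended to the left:
            # they belong to a longer match reported elsewhere.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right for as long as the characters agree.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(maximal_exact_matches("ACGTTGCA", "TTACGTAGCA"))
# → [(0, 2, 4), (5, 7, 3)]
```

In a seed-and-extend aligner, each reported seed would then be extended with a gapped alignment around its read and reference positions.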

Statistics and Probability; Sequencing and Sequence Analysis; Theoretical computer science; Genomics; Biology; Biochemistry; Software; Humans; Molecular Biology; Alignment-free sequence analysis; Exact match; Supplementary data; Genome Human; business.industry; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Genomics; Sequence Analysis DNA; Original Papers; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Computer engineering; Scalability; business; Sequence Alignment; Algorithms; Software; Bioinformatics

kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.

2018

Abstract Motivation K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability. Results We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays—one for k-mer representation and the other for frequency encoding. Exper…
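
The general idea of storing k-mer counts in a Bloom filter-like structure can be sketched as follows. This is a plain counting filter queried by taking the minimum counter, not kmcEx's coupled-bit arrays, whose layout is not given in the excerpt above. The class name, table size, number of hash functions, and the salted BLAKE2b hashing are assumptions for illustration.

```python
import hashlib


class CountingKmerFilter:
    """Hypothetical counting filter: approximate k-mer counts in a fixed table."""

    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.counters = [0] * size

    def _positions(self, kmer):
        # One table position per salted hash function.
        for seed in range(self.num_hashes):
            salt = seed.to_bytes(8, "little")
            digest = hashlib.blake2b(
                kmer.encode(), digest_size=8, salt=salt
            ).digest()
            yield int.from_bytes(digest, "big") % self.size

    def add(self, kmer, count=1):
        # set() avoids incrementing a slot twice if two hashes collide.
        for pos in set(self._positions(kmer)):
            self.counters[pos] += count

    def count(self, kmer):
        # The minimum over the k-mer's slots never underestimates its count,
        # but collisions with other k-mers can inflate it.
        return min(self.counters[pos] for pos in self._positions(kmer))


table = CountingKmerFilter()
for kmer in ["ACGT", "ACGT", "ACGT", "TTGA"]:
    table.add(kmer)
print(table.count("ACGT"))  # at least 3
```

The table uses a fixed amount of memory regardless of how many distinct k-mers are inserted, at the cost of occasional overestimates.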

Statistics and Probability; Source code; Computer science; media_common.quotation_subject; 0206 medical engineering; Hash function; 02 engineering and technology; Biochemistry; 03 medical and health sciences; Encoding (memory); Molecular Biology; Time complexity; 030304 developmental biology; Block (data storage); media_common; 0303 health sciences; Sequence Analysis DNA; Data structure; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Error detection and correction; Algorithm; Sequence Alignment; 020602 bioinformatics; Algorithms; Software; Bioinformatics (Oxford, England)

Attracted or repelled?--a matter of two neurons, one pheromone binding protein, and a chiral center.

1998

Abstract Two species of scarab beetles, the Osaka beetle (Anomala osakana) and the Japanese beetle (Popillia japonica), utilize the opposite enantiomers of japonilure, (Z)-5-(1-decenyl)oxacyclopentan-2-one, as their sex pheromones. Each species produces only one of the enantiomers that functions as its own sex pheromone and as a very strong behavioral antagonist for the other species. Using an integrated approach we tested whether the discrimination of these two opposite signals is due to selective filtering by pheromone binding proteins or whether it originates in the specificity of ligand–receptor interactions. We found that the antennae of each of these two scarab species contain only a …

Stereochemistry; Protein Conformation; Molecular Sequence Data; Biophysics; Biochemistry; Pheromones; Popillia; Botany; medicine; Animals; Pheromone binding; Amino Acid Sequence; Cloning Molecular; Molecular Biology; Sensillum; Neurons; Olfactory receptor; Binding Sites; biology; Stereoisomerism; Cell Biology; biology.organism_classification; Chemoreceptor Cells; Coleoptera; medicine.anatomical_structure; Sex pheromone; Pheromone; Enantiomer; Pheromone binding protein; Sequence Alignment; Signal Transduction; Biochemical and biophysical research communications

Engineering of chicken avidin: a progressive series of reduced charge mutants.

1998

Avidin, a positively charged egg-white glycoprotein, is a widely used tool in biotechnological applications because of its ability to bind biotin strongly. The high pI of avidin (approximately 10.5), however, is a hindrance in certain applications due to non-specific (charge-related) binding. Here we report a construction of a series of avidin charge mutants with pIs ranging from 9.4 to 4.7. Rational design of the avidin mutants was based on known crystallographic data together with comparative sequence alignment of avidin, streptavidin and a set of avidin-related genes which occur in the chicken genome. All charge mutants retained the ability to bind biotin tightly according to optical bio…

Streptavidin; DNA Complementary; Hot Temperature; Mutant; Biophysics; Biotin; Sequence alignment; Biology; Spodoptera; Protein Engineering; Biochemistry; chemistry.chemical_compound; stomatognathic system; Biotin; Structural Biology; Genetics; Animals; Molecular Biology; Charge mutant; Avidin-biotin technology; Rational design; Cell Biology; Protein engineering; respiratory system; Avidin; DNA-Binding Proteins; chemistry; Biochemistry; Biotinylation; biology.protein; Mutagenesis Site-Directed; Chickens; Avidin; FEBS letters

Evolutionary relationships among the members of an ancient class of non-LTR retrotransposons found in the nematode Caenorhabditis elegans.

1998

We took advantage of the massive amount of sequence information generated by the Caenorhabditis elegans genome project to perform a comprehensive analysis of a group of over 100 related sequences that has allowed us to describe two new C. elegans non-LTR retrotransposons. We named them Sam and Frodo. We also determined that several highly divergent subfamilies of both elements exist in C. elegans. It is likely that several master copies have been active at the same time in C. elegans, although only a few copies of both Sam and Frodo have characteristics that are compatible with them being active today. We discuss whether it is more appropriate under these circumstances to define only 2 elem…

Subfamily; Gene Transfer Horizontal; Retroelements; Molecular Sequence Data; Gene Dosage; Retrotransposon; Class (philosophy); Biology; Genome; Evolution Molecular; Monophyly; Open Reading Frames; Genetics; Animals; Amino Acid Sequence; Caenorhabditis elegans; Caenorhabditis elegans Proteins; Molecular Biology; Ecology Evolution Behavior and Systematics; Caenorhabditis elegans; Phylogeny; Sequence (medicine); Genetics; Genome; Computational Biology; RNA-Directed DNA Polymerase; Genome project; DNA Helminth; biology.organism_classification; Endonucleases; Long Interspersed Nucleotide Elements; Evolutionary biology; Multigene Family; Nucleic Acid Conformation; Sequence Alignment; Molecular biology and evolution

Phylogenetic analysis of the thiolase family. Implications for the evolutionary origin of peroxisomes

1992

The thiolase family is a widespread group of proteins present in prokaryotes and three cellular compartments of eukaryotes. This fact makes this family interesting in order to study the evolutionary process of eukaryotes. Using the sequence of peroxisomal thiolase from Saccharomyces cerevisiae recently obtained by us and the other known thiolase sequences, a phylogenetic analysis has been carried out. It shows that all these proteins derived from a primitive enzyme, present in the common ancestor of eubacteria and eukaryotes, which evolved into different specialized thiolases confined to various cell compartments. The evolutionary tree obtained is compatible with the endosymbiotic theory fo…

Symbiogenesis; Molecular Sequence Data; Sequence alignment; Saccharomyces cerevisiae; Biology; Microbodies; Homology (biology); Phylogenetics; Molecular evolution; Genetics; Amino Acid Sequence; Acetyl-CoA C-Acetyltransferase; Symbiosis; Thiolase; Molecular Biology; Gene; Phylogeny; Ecology Evolution Behavior and Systematics; Genetics; Phylogenetic tree; Thiolase; Peroxisome evolution; Biological Evolution; Evolutionary biology; Bootstrap analysis; Sequence Alignment

Molecules and morphology reveal cryptic variation among digeneans infecting sympatric mullets in the Mediterranean.

2009

SUMMARY We applied a combined molecular and morphological approach to resolve the taxonomic status of Saccocoelium spp. parasitizing sympatric mullets (Mugilidae) in the Mediterranean. Eight morphotypes of Saccocoelium were distinguished by means of multivariate statistical analyses: 2 of Saccocoelium obesum ex Liza spp.; 4 of S. tensum ex Liza spp.; and 2 (S. cephali and Saccocoelium sp.) ex Mugil cephalus. Sequences of the 28S and ITS2 rRNA gene regions were obtained for a total of 21 isolates of these morphotypes. Combining sequence data analysis with a detailed morphological and multivariate morphometric study of the specimens allowed the demonstration of cryptic diversity thus rejecting…

Sympatry; Species complex; Molecular Sequence Data; Zoology; Trematode Infections; Fish Diseases; Species Specificity; Genetic variation; DNA Ribosomal Spacer; RNA Ribosomal 28S; Mediterranean Sea; Animals; Ribosomal DNA; Phylogeny; Genetic diversity; biology; Mugil; Genetic Variation; Sequence Analysis DNA; DNA Helminth; biology.organism_classification; Smegmamorpha; Genetic divergence; Infectious Diseases; Sympatric speciation; Animal Science and Zoology; Parasitology; Trematoda; Sequence Alignment; Parasitology

One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads.

2020

Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species…

Systematic error; Single Nucleotide Polymorphisms; Pathology and Laboratory Medicine; Genome; Klebsiella Pneumoniae; Database and Informatics Methods; Data sequences; Klebsiella; Medicine and Health Sciences; Biology (General); Clade; Phylogeny; Data Management; Ecology; Phylogenetic tree; Bacterial Genomics; Microbial Genetics; Chromosome Mapping; High-Throughput Nucleotide Sequencing; Phylogenetic Analysis; Genomics; Bacterial Pathogens; Phylogenetics; Legionella Pneumophila; Computational Theory and Mathematics; Medical Microbiology; Modeling and Simulation; Pathogens; Sequence Analysis; Research Article; Computer and Information Sciences; Bioinformatics; QH301-705.5; Legionella; Sequence alignment; Single-nucleotide polymorphism; Genomics; Computational biology; Microbial Genomics; Biology; Research and Analysis Methods; Polymorphism Single Nucleotide; Microbiology; Cellular and Molecular Neuroscience; Phylogenetics; Genetics; SNP; Bacterial Genetics; Evolutionary Systematics; Molecular Biology; Microbial Pathogens; Ecology Evolution Behavior and Systematics; Taxonomy; Evolutionary Biology; Bacteria; Organisms; Biology and Life Sciences; Bacteriology; Sequence Alignment; Genome Bacterial; Reference genome; PLoS Computational Biology