Search results for "Sequence Alignment"
showing 10 items of 447 documents
Assessment of the probabilities for evolutionary structural changes in protein folds.
2007
Abstract Motivation: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. Results: We have tried …
CARE: context-aware sequencing read error correction.
2020
Abstract Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections which enables fast computation of high-quality multiple alignments. Sequencing errors ar…
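Note: the abstract above names minhashing as the similarity-search step but the excerpt does not show how it is applied. The following is a minimal, generic sketch of minhashing over k-mer sets, not CARE's implementation. The function names, the choice of `k`, the number of hash functions, and the use of salted BLAKE2b as the hash family are all assumptions made for illustration.

```python
import hashlib


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def minhash_signature(seq, k=4, num_hashes=64):
    """Build a minhash signature: one minimum hash value per salted hash function.

    Assumption: salted BLAKE2b stands in for a family of independent hash
    functions; a real tool would likely use something faster.
    """
    if len(seq) < k:
        raise ValueError("sequence is shorter than k")
    kmer_set = kmers(seq, k)
    signature = []
    for seed in range(num_hashes):
        salt = seed.to_bytes(8, "little")  # BLAKE2b accepts salts up to 16 bytes
        signature.append(min(
            int.from_bytes(
                hashlib.blake2b(kmer.encode(), digest_size=8, salt=salt).digest(),
                "big",
            )
            for kmer in kmer_set
        ))
    return signature


def estimate_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


read_a = "ACGTACGTTGCAACGT"
read_b = "ACGTACGTTGCAACGA"  # differs from read_a in the last base only
print(estimate_jaccard(minhash_signature(read_a), minhash_signature(read_b)))
```

Reads whose signatures agree in many positions are likely to share many k-mers, so they can be shortlisted as alignment candidates without comparing every pair of reads directly.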
Long read alignment based on maximal exact match seeds
2012
Abstract Motivation: The explosive growth of next-generation sequencing datasets poses a challenge to the mapping of reads to reference genomes in terms of alignment quality and execution speed. With the continuing progress of high-throughput sequencing technologies, read length is constantly increasing and many existing aligners are becoming inefficient as generated reads grow larger. Results: We present CUSHAW2, a parallelized, accurate, and memory-efficient long read aligner. Our aligner is based on the seed-and-extend approach and uses maximal exact matches as seeds to find gapped alignments. We have evaluated and compared CUSHAW2 to the three other long read aligners BWA-SW, Bowtie2 an…
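Note: the abstract above describes using maximal exact matches (MEMs) as seeds in a seed-and-extend aligner. The sketch below only illustrates what a MEM is, using a naive quadratic scan. It is not CUSHAW2's method, which the excerpt does not detail; the function name and the `min_len` parameter are hypothetical.

```python
def maximal_exact_matches(read, ref, min_len=3):
    """Return (read_pos, ref_pos, length) for every maximal exact match.

    A match is maximal when it cannot be extended to the left or to the
    right without hitting a mismatch or the end of either string.
    Naive O(len(read) * len(ref)) scan, for illustration only.
    """
    mems = []
    for i in range(len(read)):
        for j in range(len(ref)):
            if read[i] != ref[j]:
                continue
            # Skip starts that could be extended to the left.
            if i > 0 and j > 0 and read[i - 1] == ref[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            length = 0
            while (i + length < len(read) and j + length < len(ref)
                   and read[i + length] == ref[j + length]):
                length += 1
            if length >= min_len:
                mems.append((i, j, length))
    return mems


print(maximal_exact_matches("ACGTAC", "TTACGTTT"))
# prints [(0, 2, 4), (3, 1, 3)]
```

In a seed-and-extend scheme, each reported triple would serve as an anchor around which a gapped alignment is then computed.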
kmcEx: memory-frugal and retrieval-efficient encoding of counted k-mers.
2018
Abstract Motivation: K-mers along with their frequency have served as an elementary building block for error correction, repeat detection, multiple sequence alignment, genome assembly, etc., attracting intensive studies in k-mer counting. However, the output of k-mer counters itself is large; very often, it is too large to fit into main memory, leading to highly narrowed usability. Results: We introduce a novel idea of encoding k-mers as well as their frequency, achieving good memory saving and retrieval efficiency. Specifically, we propose a Bloom filter-like data structure to encode counted k-mers by coupled-bit arrays, one for k-mer representation and the other for frequency encoding. Exper…
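Note: for readers unfamiliar with the term, "counted k-mers" are simply k-mers paired with their occurrence frequency. The sketch below shows that plain, uncompressed form using an in-memory dictionary. It does not implement the Bloom filter-like encoding that kmcEx proposes, and the function name is hypothetical.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring across a collection of reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


counts = count_kmers(["ACGTACGT"], 4)
print(counts["ACGT"])  # prints 2
```

A table like this grows with the number of distinct k-mers, which is the memory problem the abstract says a compact encoding is meant to address.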
Attracted or repelled?--a matter of two neurons, one pheromone binding protein, and a chiral center.
1998
Abstract Two species of scarab beetles, the Osaka beetle (Anomala osakana) and the Japanese beetle (Popillia japonica), utilize the opposite enantiomers of japonilure, (Z)-5-(1-decenyl)oxacyclopentan-2-one, as their sex pheromones. Each species produces only one of the enantiomers that functions as its own sex pheromone and as a very strong behavioral antagonist for the other species. Using an integrated approach we tested whether the discrimination of these two opposite signals is due to selective filtering by pheromone binding proteins or whether it originates in the specificity of ligand–receptor interactions. We found that the antennae of each of these two scarab species contain only a …
Engineering of chicken avidin: a progressive series of reduced charge mutants.
1998
Avidin, a positively charged egg-white glycoprotein, is a widely used tool in biotechnological applications because of its ability to bind biotin strongly. The high pI of avidin (approximately 10.5), however, is a hindrance in certain applications due to non-specific (charge-related) binding. Here we report a construction of a series of avidin charge mutants with pIs ranging from 9.4 to 4.7. Rational design of the avidin mutants was based on known crystallographic data together with comparative sequence alignment of avidin, streptavidin and a set of avidin-related genes which occur in the chicken genome. All charge mutants retained the ability to bind biotin tightly according to optical bio…
Evolutionary relationships among the members of an ancient class of non-LTR retrotransposons found in the nematode Caenorhabditis elegans.
1998
We took advantage of the massive amount of sequence information generated by the Caenorhabditis elegans genome project to perform a comprehensive analysis of a group of over 100 related sequences that has allowed us to describe two new C. elegans non-LTR retrotransposons. We named them Sam and Frodo. We also determined that several highly divergent subfamilies of both elements exist in C. elegans. It is likely that several master copies have been active at the same time in C. elegans, although only a few copies of both Sam and Frodo have characteristics that are compatible with them being active today. We discuss whether it is more appropriate under these circumstances to define only 2 elem…
Phylogenetic analysis of the thiolase family. Implications for the evolutionary origin of peroxisomes
1992
The thiolase family is a widespread group of proteins present in prokaryotes and three cellular compartments of eukaryotes. This fact makes this family interesting in order to study the evolutionary process of eukaryotes. Using the sequence of peroxisomal thiolase from Saccharomyces cerevisiae recently obtained by us and the other known thiolase sequences, a phylogenetic analysis has been carried out. It shows that all these proteins derived from a primitive enzyme, present in the common ancestor of eubacteria and eukaryotes, which evolved into different specialized thiolases confined to various cell compartments. The evolutionary tree obtained is compatible with the endosymbiotic theory fo…
Molecules and morphology reveal cryptic variation among digeneans infecting sympatric mullets in the Mediterranean.
2009
SUMMARY We applied a combined molecular and morphological approach to resolve the taxonomic status of Saccocoelium spp. parasitizing sympatric mullets (Mugilidae) in the Mediterranean. Eight morphotypes of Saccocoelium were distinguished by means of multivariate statistical analyses: 2 of Saccocoelium obesum ex Liza spp.; 4 of S. tensum ex Liza spp.; and 2 (S. cephali and Saccocoelium sp.) ex Mugil cephalus. Sequences of the 28S and ITS2 rRNA gene regions were obtained for a total of 21 isolates of these morphotypes. Combining sequence data analysis with a detailed morphological and multivariate morphometric study of the specimens allowed the demonstration of cryptic diversity thus rejecting…
One is not enough: On the effects of reference genome for the mapping and subsequent analyses of short-reads.
2020
Mapping of high-throughput sequencing (HTS) reads to a single arbitrary reference genome is a frequently used approach in microbial genomics. However, the choice of a reference may represent a source of errors that may affect subsequent analyses such as the detection of single nucleotide polymorphisms (SNPs) and phylogenetic inference. In this work, we evaluated the effect of reference choice on short-read sequence data from five clinically and epidemiologically relevant bacteria (Klebsiella pneumoniae, Legionella pneumophila, Neisseria gonorrhoeae, Pseudomonas aeruginosa and Serratia marcescens). Publicly available whole-genome assemblies encompassing the genomic diversity of these species…