Search results for "Algorithm"
showing 10 items of 4887 documents
A graphical model selection tool for mixed models
2017
Model selection can be defined as the task of estimating the performance of different models in order to choose the most parsimonious one from a potentially very large set of candidate statistical models. We propose a graphical representation that extends to the class of mixed models the deviance plot proposed in the literature for classical and generalized linear models. Once a reduced number of models has been selected, this representation makes it possible to identify important covariates by focusing only on the fixed-effects component, assuming the random part is properly specified. Nevertheless, we also suggest a standalone figure representing th…
LEGO-based generalized set of two linear algebraic 3D bio-macro-molecular descriptors: Theory and validation by QSARs
2019
Novel 3D protein descriptors based on bilinear, quadratic and linear algebraic maps in ℝⁿ are proposed. The latter employs the kth 2-tuple (dis)similarity matrix to codify information related to covalent and non-covalent interactions in these biopolymers. The calculation of the inter-amino acid distances is generalized by using several dis-similarity coefficients, where normalization procedures based on the simple stochastic and mutual probability schemes are applied. A new local-fragment approach based on amino acid-types and amino acid-groups is proposed to characterize regions of interest in proteins. Topological and geometric macromolecular cutoffs are defined using local and…
MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems
2016
This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record, Jorge González-Domínguez, Yongchao Liu, Juan Touriño, Bertil Schmidt; MSAProbs-MPI: parallel multiple sequence aligner for distributed-memory systems, Bioinformatics, Volume 32, Issue 24, 15 December 2016, Pages 3826–3828, https://doi.org/10.1093/bioinformatics/btw558, is available online at: https://doi.org/10.1093/bioinformatics/btw558 [Abstract] MSAProbs is a state-of-the-art protein multiple sequence alignment tool based on hidden Markov models. It can achieve high alignment accuracy at the expense of relatively long runtimes for large-sca…
Parallel and Space-Efficient Construction of Burrows-Wheeler Transform and Suffix Array for Big Genome Data
2016
Next-generation sequencing technologies have led to the sequencing of more and more genomes, propelling related research into the era of big data. In this paper, we present ParaBWT, a parallelized Burrows-Wheeler transform (BWT) and suffix array construction algorithm for big genome data. In ParaBWT, we have investigated a progressive construction approach to constructing the BWT of single genome sequences in linear space complexity, but with a small constant factor. This approach has been further parallelized using multi-threading based on a master-slave coprocessing model. Once the BWT has been obtained, the suffix array is constructed in a memory-efficient manner. The performance of ParaBWT has b…
Deep learning models for bacteria taxonomic classification of metagenomic data.
2018
Background: An open challenge in translational bioinformatics is the analysis of sequenced metagenomes from various environmental samples. Several studies have demonstrated that 16S ribosomal RNA can be considered a barcode for bacteria classification at the genus level, but it remains hard to identify the correct composition of metagenomic data from RNA-seq short-read data. 16S short-read data are generated using two next-generation sequencing technologies, i.e. whole genome shotgun (WGS) and amplicon (AMP); typically, the former is filtered to obtain short reads belonging to a 16S shotgun (SG), whereas the latter takes into account only some specific 16S hypervariable regions.…
Identification of transcribed protein coding sequence remnants within lincRNAs
2018
Long intergenic non-coding RNAs (lincRNAs) are non-coding transcripts >200 nucleotides long that do not overlap protein-coding sequences. Importantly, such elements are known to be tissue-specifically expressed and to play a widespread role in gene regulation across thousands of genomic loci. However, very little is known of the mechanisms for the evolutionary biogenesis of these RNA elements, especially given their poor conservation across species. It has been proposed that lincRNAs might arise from pseudogenes. To test this systematically, we developed a novel method that searches for remnants of protein-coding sequences within lincRNA transcripts; the hypothesis is that we can t…
mD3DOCKxb: An Ultra-Scalable CPU-MIC Coordinated Virtual Screening Framework
2017
Molecular docking is an important method in computational drug discovery. In large-scale virtual screening, millions of small drug-like molecules (chemical compounds) are compared against a designated target protein (receptor). Depending on the docking algorithm used for screening, this can take several weeks on conventional HPC systems. However, for certain applications, including large-scale screening tasks for newly emerging infectious diseases, such high runtimes can be prohibitive. In this paper, we investigate how the massively parallel neo-heterogeneous architecture of the Tianhe-2 supercomputer consisting of thousands of nodes comprising CPUs and MIC coprocessors that can effic…
Parallel algorithms for large-scale biological sequence alignment on Xeon-Phi based clusters
2016
Computing alignments between two or more sequences is a common operation in computational molecular biology. The continuing growth of biological sequence databases establishes the need for efficient parallel implementations of such operations on modern accelerators. This paper presents new approaches to high performance biological sequence database scanning with the Smith-Waterman algorithm and the first stage of progressive multiple sequence alignment based on the ClustalW heuristic on a Xeon Phi-based compute cluster. Our approach uses a three-level parallelization scheme to take full advantage of the compute power available on this type of architecture; i.e. cluster-level data par…
The use of morphokinetic as a predictor of implantation.
2017
In recent years, efforts to improve laboratory outcomes have focused mostly on the search for additional markers of embryo quality to add to present embryo selection criteria. Time-lapse systems offer an alternative tool in assisted reproduction techniques that can improve embryo selection through a dynamic and interactive approach. Whereas standard embryo assessment relies on a subjective and static morphology evaluation, which reduces the information gained for embryo selection, time-lapse technology adds several morphokinetic parameters, providing additional input for embryo evaluation. This further information represents a challenge f…
Quantitative Assessment of Eye Phenotypes for Functional Genetic Studies Using Drosophila melanogaster
2016
About two-thirds of the vital genes in the Drosophila genome are involved in eye development, making the fly eye an excellent genetic system to study cellular function and development, neurodevelopment/degeneration, and complex diseases such as cancer and diabetes. We developed a novel computational method, implemented as Flynotyper software (http://flynotyper.sourceforge.net), to quantitatively assess the morphological defects in the Drosophila eye resulting from genetic alterations affecting basic cellular and developmental processes. Flynotyper utilizes a series of image processing operations to automatically detect the fly eye and the individual ommatidium, and calculates a phen…