Search results for "algorithm."
showing 10 items of 4617 documents
A Simple Method to Predict Blood-Brain Barrier Permeability of Drug-Like Compounds Using Classification Trees
2017
Background: Determining the ability of a compound to penetrate the blood-brain barrier (BBB) is a challenging task; despite numerous efforts to predict or measure BBB passage, existing approaches still have several drawbacks. Methods: The prediction of the permeability through the BBB is carried out using classification trees. A large data set of 497 compounds (recently published) is selected to develop the tree model. Results: The best model shows an accuracy higher than 87.6% for the training set; the model was also validated using a 10-fold cross-validation procedure and through a test set, achieving accuracy values of 86.1% and 87.9%, respectively. We give a brief explanation, in structural terms, o…
Epigenetic Control of Phenotypic Plasticity in the Filamentous Fungus Neurospora crassa
2016
Phenotypic plasticity is the ability of a genotype to produce different phenotypes under different environmental or developmental conditions. Phenotypic plasticity is a ubiquitous feature of living organisms, and is typically based on variable patterns of gene expression. However, the mechanisms by which gene expression is influenced and regulated during plastic responses are poorly understood in most organisms. While modifications to DNA and histone proteins have been implicated as likely candidates for generating and regulating phenotypic plasticity, specific details of each modification and its mode of operation have remained largely unknown. In this study, we investigated how e…
A Methodological Framework to Discover Pharmacogenomic Interactions Based on Random Forests
2021
The identification of genomic alterations in tumor tissues, including somatic mutations, deletions, and gene amplifications, produces large amounts of data, which can be correlated with a diversity of therapeutic responses. We aimed to provide a methodological framework to discover pharmacogenomic interactions based on Random Forests. We matched two databases from the Cancer Cell Line Encyclopaedia (CCLE) project, and the Genomics of Drug Sensitivity in Cancer (GDSC) project. For a total of 648 shared cell lines, we considered 48,270 gene alterations from CCLE as input features and the area under the dose-response curve (AUC) for 265 drugs from GDSC as the outcomes. A three-step reduction t…
RepeatsDB 2.0: improved annotation, classification, search and visualization of repeat protein structures
2017
RepeatsDB 2.0 (URL: http://repeatsdb.bio.unipd.it/) is an update of the database of annotated tandem repeat protein structures. Repeat proteins are a widespread class of non-globular proteins carrying heterogeneous functions involved in several diseases. Here we provide a new version of RepeatsDB with an improved classification schema including high quality annotations for ∼5400 protein structures. RepeatsDB 2.0 features information on start and end positions for the repeat regions and units for all entries. The extensive growth of repeat unit characterization was possible by applying the novel ReUPred annotation method over the entire Protein Data Bank, with data quality guaranteed by a…
Cancer: a disease at the crossroads of trade-offs
2017
Central to evolutionary theory is the idea that living organisms face phenotypic and/or genetic trade-offs when allocating resources to competing life-history demands, such as growth, survival, and reproduction. These trade-offs are increasingly considered to be crucial to further our understanding of cancer. First, evidence suggests that neoplastic cells, as any living entities subject to natural selection, are governed by trade-offs such as between survival and proliferation. Second, selection might also have shaped trade-offs at the organismal level, especially regarding protective mechanisms against cancer. Cancer can also emerge as a consequence of add…
Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data
2017
The epigenetic landscape of cells plays a key role in the establishment of cell-type specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…
Block Sorting-Based Transformations on Words: Beyond the Magic BWT
2018
The Burrows-Wheeler Transform (BWT) is a word transformation introduced in 1994 for Data Compression and later results have contributed to make it a fundamental tool for the design of self-indexing compressed data structures. The Alternating Burrows-Wheeler Transform (ABWT) is a more recent transformation, studied in the context of Combinatorics on Words, that works in a similar way, using an alternating lexicographical order instead of the usual one. In this paper we study a more general class of block sorting-based transformations. The transformations in this new class prove to be interesting combinatorial tools that offer new research perspectives. In particular, we show that all the tra…
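Illustration (not part of the abstract): the difference between the BWT and the ABWT can be sketched in a few lines of Python. This is a naive version that sorts all rotations of a non-empty word explicitly, which is fine for short words but not how practical indexes are built. The function names are hypothetical, and the alternating order is assumed to compare the first letter in the usual order, the second in reverse order, the third in the usual order, and so on.

```python
def rotations(word):
    """All cyclic rotations of a non-empty word."""
    return [word[i:] + word[:i] for i in range(len(word))]


def bwt(word):
    """Naive BWT: sort rotations lexicographically, read the last column.

    Returns the transformed word and the row index of the original word,
    which is needed to invert the transform.
    """
    rows = sorted(rotations(word))
    return "".join(row[-1] for row in rows), rows.index(word)


def alternating_key(row):
    """Sort key for the assumed alternating order on equal-length words:
    even 0-based positions ascending, odd positions descending."""
    return tuple(ord(c) if i % 2 == 0 else -ord(c) for i, c in enumerate(row))


def abwt(word):
    """Naive ABWT: same construction as bwt, but rotations are sorted
    with the alternating lexicographical order."""
    rows = sorted(rotations(word), key=alternating_key)
    return "".join(row[-1] for row in rows), rows.index(word)


print(bwt("banana"))   # ('nnbaaa', 3)
print(abwt("banana"))  # ('bnnaaa', 3)
```

Both outputs are permutations of the input that tend to group equal letters together; only the order used to sort the rotations differs.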
SNVSniffer: an integrated caller for germline and somatic single-nucleotide and indel mutations
2016
Various approaches to calling single-nucleotide variants (SNVs) or insertion-or-deletion (indel) mutations have been developed based on next-generation sequencing (NGS). However, most of them are dedicated to a particular type of mutation, e.g. germline SNVs in normal cells, somatic SNVs in cancer/tumor cells, or indels only. In the literature, efficient and integrated callers for both germline and somatic SNVs/indels have not yet been extensively investigated. We present SNVSniffer, an efficient and integrated caller identifying both germline and somatic SNVs/indels from NGS data. In this algorithm, we propose the use of Bayesian probabilistic models to identify SNVs and investigate a mult…
CUDA-enabled hierarchical ward clustering of protein structures based on the nearest neighbour chain algorithm
2015
Clustering of molecular systems according to their three-dimensional structure is an important step in many bioinformatics workflows. In applications such as docking or structure prediction, many algorithms initially generate large numbers of candidate poses (or decoys), which are then clustered to allow for subsequent computationally expensive evaluations of reasonable representatives. Since the number of such candidates can easily range from thousands to millions, performing the clustering on standard central processing units (CPUs) is highly time consuming. In this paper, we analyse and evaluate different approaches to parallelize the nearest neighbour chain algorithm to perform hierarc…
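Illustration (not part of the abstract): a minimal sequential sketch of the nearest neighbour chain idea with Ward's criterion, in plain Python. It is not the CUDA implementation the paper evaluates. The function names and the centroid-based cost formula (size-weighted squared Euclidean distance between centroids) are assumptions chosen for brevity. The chain is extended by following nearest neighbours until two clusters are mutual nearest neighbours, which are then merged.

```python
def ward_cost(ca, na, cb, nb):
    """Increase in within-cluster variance when merging two clusters
    with centroids ca, cb and sizes na, nb (Ward's criterion)."""
    sq = sum((x - y) ** 2 for x, y in zip(ca, cb))
    return na * nb / (na + nb) * sq


def ward_nn_chain(points):
    """Return the merges as (cluster_a, cluster_b, cost) tuples.

    Input points get ids 0..n-1; each merged cluster gets the next free id.
    """
    centroid = {i: tuple(p) for i, p in enumerate(points)}
    size = {i: 1 for i in centroid}
    next_id = len(points)
    merges, chain = [], []
    while len(centroid) > 1:
        if not chain:
            chain.append(min(centroid))  # any active cluster may start a chain
        top = chain[-1]
        prev = chain[-2] if len(chain) > 1 else None
        # Find the nearest neighbour of the chain's top element;
        # ties are broken in favour of the previous chain element.
        best, best_cost = None, float("inf")
        for c in centroid:
            if c == top:
                continue
            cost = ward_cost(centroid[top], size[top], centroid[c], size[c])
            if cost < best_cost or (cost == best_cost and c == prev):
                best, best_cost = c, cost
        if best == prev:
            # top and prev are mutual nearest neighbours: merge them.
            chain.pop()
            chain.pop()
            na, nb = size.pop(top), size.pop(prev)
            ca, cb = centroid.pop(top), centroid.pop(prev)
            centroid[next_id] = tuple(
                (na * x + nb * y) / (na + nb) for x, y in zip(ca, cb)
            )
            size[next_id] = na + nb
            merges.append((prev, top, best_cost))
            next_id += 1
        else:
            chain.append(best)
    return merges


print(ward_nn_chain([(0, 0), (0, 1), (10, 0), (10, 1)]))
```

The merges are not necessarily produced in order of increasing cost; sorting them by cost afterwards gives the usual dendrogram order.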
Stagewise pseudo-value regression for time-varying effects on the cumulative incidence
2015
In a competing risks setting, the cumulative incidence of an event of interest describes the absolute risk for this event as a function of time. For regression analysis, one can either choose to model all competing events by separate cause-specific hazard models or directly model the association between covariates and the cumulative incidence of one of the events. With a suitable link function, direct regression models allow for a straightforward interpretation of covariate effects on the cumulative incidence. In practice, where data can be right-censored, these regression models are implemented using a pseudo-value approach. For a grid of time points, the possibly unobserved binary event s…
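Illustration (not part of the abstract): a minimal sketch of how pseudo-values for the cumulative incidence at a single time point could be computed. It assumes the common jackknife construction, n times the full-sample estimate minus (n - 1) times the leave-one-out estimate, together with an Aalen-Johansen-type estimator; the abstract names neither, and the stagewise regression step itself is not shown. Function names and the status coding (0 for censored, positive integers for event causes) are hypothetical.

```python
def cumulative_incidence(times, status, cause, t):
    """Aalen-Johansen-type estimate of P(T <= t, event type = cause).

    status[i] == 0 marks a right-censored observation; any other value
    is the cause of the observed event.
    """
    data = sorted(zip(times, status))
    at_risk = len(data)
    surv = 1.0  # probability of being free of any event just before the current time
    cif = 0.0
    i = 0
    while i < len(data) and data[i][0] <= t:
        tj = data[i][0]
        tied = [s for u, s in data[i:] if u == tj]
        d_all = sum(1 for s in tied if s != 0)
        d_cause = sum(1 for s in tied if s == cause)
        cif += surv * d_cause / at_risk
        surv *= 1 - d_all / at_risk
        at_risk -= len(tied)
        i += len(tied)
    return cif


def pseudo_values(times, status, cause, t):
    """Jackknife pseudo-values for the cumulative incidence at time t."""
    n = len(times)
    full = cumulative_incidence(times, status, cause, t)
    values = []
    for i in range(n):
        loo_times = times[:i] + times[i + 1:]
        loo_status = status[:i] + status[i + 1:]
        loo = cumulative_incidence(loo_times, loo_status, cause, t)
        values.append(n * full - (n - 1) * loo)
    return values


# Without censoring, each pseudo-value reduces to the observed event indicator.
print(pseudo_values([1, 2, 3, 4], [1, 2, 1, 1], cause=1, t=2.5))
```

With censored observations the pseudo-values are no longer restricted to 0 and 1, but they can still be used as responses in a regression model for the cumulative incidence.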