Search results for "DATA MINING"
showing 10 items of 907 documents
Feasibility of sample size calculation for RNA-seq studies
2017
Sample size calculation is a crucial step in study design but is not yet fully established for RNA sequencing (RNA-seq) analyses. To evaluate feasibility and provide guidance, we evaluated RNA-seq sample size tools identified from a systematic search. The focus was on whether real pilot data would be needed for reliable results and on identifying tools that would perform well in scenarios with different levels of biological heterogeneity and fold changes (FCs) between conditions. We used simulations based on real data for tool evaluation. In all settings, the six evaluated tools provided widely different answers, which were strongly affected by FC. Although all tools failed for small FCs, s…
Common Hits Approach: Combining Pharmacophore Modeling and Molecular Dynamics Simulations.
2017
We present a new approach that incorporates flexibility based on extensive MD simulations of protein-ligand complexes into structure-based pharmacophore modeling and virtual screening. The approach uses the multiple coordinate sets saved during the MD simulations and generates for each frame a pharmacophore model. Pharmacophore models with the same pharmacophore features are pooled. In this way the high number of pharmacophore models that results from the MD simulation is reduced to only a few hundred representative pharmacophore models. Virtual screening runs are performed with every representative pharmacophore model; the screening results are combined and rescored to generate a single hi…
CLOVE: classification of genomic fusions into structural variation events
2017
Background: A precise understanding of structural variants (SVs) in DNA is important in the study of cancer and population diversity. Many methods have been designed to identify SVs from DNA sequencing data. However, the problem remains challenging because existing approaches suffer from low sensitivity, precision, and positional accuracy. Furthermore, many existing tools only identify breakpoints, and so do not collect related breakpoints and classify them as a particular type of SV. Due to the rapidly increasing usage of high throughput sequencing technologies in this area, there is an urgent need for algorithms that can accurately classify complex genomic rearrangements (involving more than …
FragClust and TestClust, two informatics tools for chemical structure hierarchical clustering analysis applied to lipidomics. The example of Alzheime…
2016
Lipidomic analysis is able to measure simultaneously thousands of compounds belonging to a few lipid classes. In each lipid class, compounds differ only by the acyl radical, ranging between C10:0 (capric acid) and C24:0 (lignoceric acid). Although some metabolites have a peculiar pathological role, more often compounds belonging to a single lipid class exert the same biological effect. Here, we present a lipidomics workflow that extracts the tandem mass spectrometry data from individual files and uses them to group compounds into structurally homogeneous clusters by chemical structure hierarchical clustering analysis (CHCA). The case-to-control peak area ratios of the metabolites are then a…
A multicenter study benchmarks software tools for label-free proteome quantification
2016
The consistent and accurate quantification of proteins by mass spectrometry (MS)-based proteomics depends on the performance of instruments, acquisition methods and data analysis software. In collaboration with the software developers, we evaluated OpenSWATH, SWATH2.0, Skyline, Spectronaut and DIA-Umpire, five of the most widely used software methods for processing data from SWATH-MS (sequential window acquisition of all theoretical fragment ion spectra), a method that uses data-independent acquisition (DIA) for label-free protein quantification. We analyzed high-complexity test datasets from hybrid proteome samples of defined quantitative composition acquired on two different MS instrument…
The predictive value of microbiological findings on teeth, internal and external implant portions in clinical decision making
2017
Aim: The primary aim of this study was to evaluate 23 pathogens associated with peri-implantitis at the inner part of implant connections, in peri-implant and periodontal pockets between patients suffering from peri-implantitis and participants with healthy peri-implant tissues; the secondary aim was to estimate the predictive value of the microbiological profile in patients wearing dental implants using data mining methods. Material and Methods: Fifty participants included in the present case-control study were scheduled for collection of plaque samples from the peri-implant pockets, internal connection, and periodontal pocket. Real-time polymerase chain reaction was performed to…
Conf-VLKA: A structure-based revisitation of the Virtual Lock-and-key Approach
2016
In a previous work, we developed the in-house Virtual Lock-and-Key Approach (VLKA) in order to evaluate target assignment starting from molecular descriptors calculated on known inhibitors used as an information source. This protocol was able to predict the correct biological target for the whole dataset with a good degree of reliability (80%), which was confirmed experimentally and proved useful for the target fishing of unknown compounds. In this paper, we tried to remodel the previously developed in-house VLKA into a more sophisticated one in order to evaluate the influence of the 3D conformation of ligands on the accuracy of the prediction. We applied the same previous algorithm of scoring and ranking b…
The macroecology of cancer incidences in humans is associated with large-scale assemblages of endemic infections.
2018
It is now well supported that 20% of human cancers have an infectious causation (i.e., oncogenic agents). Accumulating evidence suggests that aside from this direct role, other infectious agents may also indirectly affect cancer epidemiology through interactions with the oncogenic agents within the wider infection community. Here, we address this hypothesis via analysis of large-scale global data to identify associations between human cancer incidence and assemblages of neglected infectious agents. We focus on a gradient of three widely-distributed cancers with an infectious cause: bladder (~2% of recorded cancer cases are due to Schistosoma haematobium), liv…
Toward a direct and scalable identification of reduced models for categorical processes.
2017
The applicability of many computational approaches depends on the identification of reduced models defined on a small set of collective variables (colvars). A methodology for scalable probability-preserving identification of reduced models and colvars directly from the data is derived. It does not rely on the availability of the full relation matrices at any stage of the resulting algorithm, allows for a robust quantification of reduced model uncertainty, and allows us to impose a priori available physical information. We show two applications of the methodology: (i) to obtain a reduced dynamical model for polypeptide dynamics in water and (ii) to identify diagnostic rules from a standar…
The Anemonia viridis Venom: Coupling Biochemical Purification and RNA-Seq for Translational Research
2018
Blue biotechnologies implement marine bio-resources for addressing practical concerns. The isolation of biologically active molecules from marine animals is one of the main ways this field develops. Strikingly, cnidaria are considered as sustainable resources for this purpose, as they possess unique cells for attack and protection, producing an articulated cocktail of bioactive substances. The Mediterranean sea anemone Anemonia viridis has been studied extensively for years. In this short review, we summarize advances in bioprospecting of the A. viridis toxin arsenal. A. viridis RNA datasets and toxin data mining approaches are briefly described. Analysis reveals the major pool of neurotoxi…