Search results for "Computer Science Applications"
showing 10 items of 3993 documents
Multi-omics HeCaToS dataset of repeated dose toxicity for cardiotoxic & hepatotoxic compounds.
2022
The data described here were generated within the EU/FP7 HeCaToS project (Hepatic and Cardiac Toxicity Systems modeling). The project aimed to develop an in silico prediction system to contribute to drug safety assessment for humans. For this purpose, multi-omics data of repeated dose toxicity were obtained for 10 hepatotoxic and 10 cardiotoxic compounds. Most data came from in vitro experiments in which 3D microtissues (either hepatic or cardiac) were exposed to a therapeutic (physiologically relevant concentrations calculated through PBPK-modeling) or a toxic dosing profile (IC20 after 7 days). Exposures lasted for 14 days and samples were obtained at 7 time points (therapeutic…
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…
Metagenomics reveals our incomplete knowledge of global diversity
2008
Metagenomic sequencing obtains huge amounts of sequences from environmental and clinical samples, thus providing a glimpse of the global prokaryotic diversity of both species and genes in these sources. The current trend in metagenomic analysis follows the so-called gene-centric approach, focused on describing the environments by the study of the functional roles of the proteins encoded in the sequenced genes. In this way, it is clear that metagenomic analysis relies heavily on the accurate knowledge of the universe of proteins stored in the databases. Nevertheless, it is known that some biases exist in the composition of databases (which are rich in sequences from common, cultivable and ea…
SeqEditor: an application for primer design and sequence analysis with or without GTF/GFF files
2021
[Motivation]: Sequence analyses oriented to investigate specific features, patterns and functions of protein and DNA/RNA sequences usually require tools based on graphical interfaces whose main characteristic is their intuitiveness and interactivity with the user’s expertise, especially when curation or primer design tasks are required. However, interface-based tools usually pose certain computational limitations when managing large sequences or complex datasets, such as genome and transcriptome assemblies. With these requirements in mind, we have developed SeqEditor, an interactive software tool for nucleotide and protein sequence analysis.
Sparse kernel methods for high-dimensional survival data
2008
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques, however, are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be ‘kernelized’. The kernelized proportional hazards model, however, yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, dependin…
Pseudo-Cut Strategies for Global Optimization
2011
Motivated by the successful use of a pseudo-cut strategy within the setting of constrained nonlinear and nonconvex optimization in Lasdon et al. (2010), we propose a framework for general pseudo-cut strategies in global optimization that provides a broader and more comprehensive range of methods. The fundamental idea is to introduce linear cutting planes that provide temporary, possibly invalid, restrictions on the space of feasible solutions, as proposed in the setting of the tabu search metaheuristic in Glover (1989), in order to guide a solution process toward a global optimum, where the cutting planes can be discarded and replaced by others as the process continues. These strategies can…
Comprehensive estimation of input signals and dynamics in biochemical reaction networks
2012
Motivation: Cellular information processing can be described mathematically using differential equations. Often, external stimulation of cells by compounds such as drugs or hormones leading to activation has to be considered. Mathematically, the stimulus is represented by a time-dependent input function. Parameters such as rate constants of the molecular interactions are often unknown and need to be estimated from experimental data, e.g. by maximum likelihood estimation. For this purpose, the input function has to be defined for all times of the integration interval. This is usually achieved by approximating the input by interpolation or smoothing of the measured data. This procedu…
ballaxy: web services for structural bioinformatics.
2014
Motivation: Web-based workflow systems have gained considerable momentum in sequence-oriented bioinformatics. In structural bioinformatics, however, such systems are still relatively rare; while commercial stand-alone workflow applications are common in the pharmaceutical industry, academic researchers often still rely on command-line scripting to glue individual tools together. Results: In this work, we address the problem of building a web-based system for workflows in structural bioinformatics. For the underlying molecular modelling engine, we opted for the BALL framework because of its extensive and well-tested functionality in the field of structural bioinformatics. The large …
Assessment of the probabilities for evolutionary structural changes in protein folds.
2007
Motivation: The evolution of protein sequences can be described by a stepwise process, where each step involves changes of a few amino acids. In a similar manner, the evolution of protein folds can be at least partially described by an analogous process, where each step involves comparatively simple changes affecting few secondary structure elements. A number of such evolution steps, justified by biologically confirmed examples, have previously been proposed by other researchers. However, unlike the situation with sequences, as far as we know there have been no attempts to estimate the comparative probabilities for different kinds of such structural changes. Results: We have tried …
CARE: context-aware sequencing read error correction.
2020
Motivation: Error correction is a fundamental pre-processing step in many Next-Generation Sequencing (NGS) pipelines, in particular for de novo genome assembly. However, existing error correction methods either suffer from high false-positive rates since they break reads into independent k-mers or do not scale efficiently to large amounts of sequencing reads and complex genomes. Results: We present CARE, an alignment-based scalable error correction algorithm for Illumina data using the concept of minhashing. Minhashing allows for efficient similarity search within large sequencing read collections which enables fast computation of high-quality multiple alignments. Sequencing errors ar…
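As a rough illustration of the minhashing idea mentioned in the CARE abstract, the sketch below estimates the k-mer Jaccard similarity of two reads from short signatures. It is a generic textbook minhash, not CARE's implementation; the function names, the k-mer length, the number of hash functions and the use of salted BLAKE2b as the hash family are all assumptions made for this example.

```python
import hashlib


def kmers(read, k):
    """Return the set of all length-k substrings of a read."""
    return {read[i:i + k] for i in range(len(read) - k + 1)}


def _hash(kmer, seed):
    """64-bit hash of a k-mer; the seed selects one member of the hash family (assumed choice: salted BLAKE2b)."""
    digest = hashlib.blake2b(
        kmer.encode("ascii"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(read, k=8, num_hashes=16):
    """For each hash function, keep the minimum hash value over the read's k-mers."""
    ks = kmers(read, k)
    if not ks:
        raise ValueError("read is shorter than k")
    return [min(_hash(km, seed) for km in ks) for seed in range(num_hashes)]


def estimated_jaccard(sig_a, sig_b):
    """Fraction of matching signature positions, an estimate of k-mer Jaccard similarity."""
    if len(sig_a) != len(sig_b):
        raise ValueError("signatures must have equal length")
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

Reads whose signatures agree in many positions are likely to share many k-mers, so signatures can serve as a cheap filter for candidate reads before a more expensive alignment step. How CARE selects and aligns candidates is not described in the truncated abstract above.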