Search results for "Computer Science Application"
showing 10 items of 3998 documents
Textual data compression in computational biology: a synopsis.
2009
Motivation: Textual data compression, and the associated techniques coming from information theory, are often perceived as being of interest for data communication and storage. However, they are also deeply related to classification and data mining and analysis. In recent years, a substantial effort has been made for the application of textual data compression techniques to various computational biology tasks, ranging from storage and indexing of large datasets to comparison and reverse engineering of biological networks. Results: The main focus of this review is on a systematic presentation of the key areas of bioinformatics and computational biology where compression has been use…
Weighted distance-based trees for ranking data
2017
Within the framework of preference rankings, the interest can lie in finding which predictors and which interactions are able to explain the observed preference structures, because preference decisions will usually depend on the characteristics of both the judges and the objects being judged. This work proposes the use of a univariate decision tree for ranking data based on the weighted distances for complete and incomplete rankings, and considers the area under the ROC curve both for pruning and model assessment. Two real and well-known datasets, the SUSHI preference data and the University ranking data, are used to display the performance of the methodology.
Stochastic Learning for SAT-Encoded Graph Coloring Problems
2010
The graph coloring problem (GCP) is a widely studied combinatorial optimization problem due to its numerous applications in many areas, including time tabling, frequency assignment, and register allocation. The need for more efficient algorithms has led to the development of several GC solvers. In this paper, the authors introduce a team of Finite Learning Automata, combined with the random walk algorithm, using Boolean satisfiability encoding for the GCP. The authors present an experimental analysis of the new algorithm’s performance compared to the random walk technique, using a benchmark set containing SAT-encoding graph coloring test sets.
Mean-field games and dynamic demand management in power grids
2013
This paper applies mean-field game theory to dynamic demand management. For a large population of electrical heating or cooling appliances (called agents), we provide a mean-field game that guarantees desynchronization of the agents thus improving the power network resilience. Second, for the game at hand, we exhibit a mean-field equilibrium, where each agent adopts a bang-bang switching control with threshold placed at a nominal temperature. At equilibrium, through an opportune design of the terminal penalty, the switching control regulates the mean temperature (computed over the population) and the mains frequency around the nominal value. To overcome Zeno phenomena we also adjust the ban…
Reducing the effect of the data order in algorithms for constructing phylogenetic trees.
1988
Multi-omics HeCaToS dataset of repeated dose toxicity for cardiotoxic & hepatotoxic compounds.
2022
The data currently described was generated within the EU/FP7 HeCaToS project (Hepatic and Cardiac Toxicity Systems modeling). The project aimed to develop an in silico prediction system to contribute to drug safety assessment for humans. For this purpose, multi-omics data of repeated dose toxicity were obtained for 10 hepatotoxic and 10 cardiotoxic compounds. Most data were gained from in vitro experiments in which 3D microtissues (either hepatic or cardiac) were exposed to a therapeutic (physiologically relevant concentrations calculated through PBPK-modeling) or a toxic dosing profile (IC20 after 7 days). Exposures lasted for 14 days and samples were obtained at 7 time points (therapeutic…
Adaptive reference-free compression of sequence quality scores
2014
Motivation: Rapid technological progress in DNA sequencing has stimulated interest in compressing the vast datasets that are now routinely produced. Relatively little attention has been paid to compressing the quality scores that are assigned to each sequence, even though these scores may be harder to compress than the sequences themselves. By aggregating a set of reads into a compressed index, we find that the majority of bases can be predicted from the sequence of bases that are adjacent to them and hence are likely to be less informative for variant calling or other applications. The quality scores for such bases are aggressively compressed, leaving a relatively small number at full reso…
Metagenomics reveals our incomplete knowledge of global diversity
2008
Metagenomic sequencing obtains huge amounts of sequences from environmental and clinical samples, thus providing a glimpse of the global prokaryotic diversity of both species and genes in these sources. The current trend in metagenomic analysis follows the so-called gene-centric approach, focused on describing the environments by the study of the functional roles of the proteins encoded in the sequenced genes. In this way, it is clear that metagenomic analysis relies heavily on the accurate knowledge of the universe of proteins stored in the databases. Nevertheless, it is known that some biases exist in the composition of databases (which are rich in sequences from common, cultivable and ea…
SeqEditor: an application for primer design and sequence analysis with or without GTF/GFF files
2021
[Motivation]: Sequence analyses oriented to investigate specific features, patterns and functions of protein and DNA/RNA sequences usually require tools based on graphic interfaces whose main characteristic is their intuitiveness and interactivity with the user’s expertise, especially when curation or primer design tasks are required. However, interface-based tools usually pose certain computational limitations when managing large sequences or complex datasets, such as genome and transcriptome assemblies. With these requirements in mind, we have developed SeqEditor, an interactive software tool for nucleotide and protein sequence analysis.
Sparse kernel methods for high-dimensional survival data
2008
Sparse kernel methods like support vector machines (SVM) have been applied with great success to classification and (standard) regression settings. Existing support vector classification and regression techniques however are not suitable for partly censored survival data, which are typically analysed using Cox's proportional hazards model. As the partial likelihood of the proportional hazards model only depends on the covariates through inner products, it can be ‘kernelized’. The kernelized proportional hazards model however yields a solution that is dense, i.e. the solution depends on all observations. One of the key features of an SVM is that it yields a sparse solution, dependin…