Search results for "Abstract data type"

showing 10 items of 1140 documents

FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases.

2016

The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automatically derived annotations that help in understanding their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, grouping very similar protein sequences based on near-full-length similarity and/or a high threshold of sequence identity. We compressed 50…

0301 basic medicine; Protein structure database; Proteomics; Proteome; Sequence analysis; Computer science; computer.software_genre; Sensitivity and Specificity; Set (abstract data type); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Similarity (network science); Sequence Analysis Protein; Genetics; Cluster (physics); Animals; Cluster Analysis; Humans; Cluster analysis; Databases Protein; Molecular Biology; Sequence; Database; Function (mathematics); Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Modeling and Simulation; Data mining; computer; 030217 neurology & neurosurgery; Software; Journal of computational biology : a journal of computational molecular cell biology

researchProduct

Variable Ranking Feature Selection for the Identification of Nucleosome Related Sequences

2018

Several recent works have shown that the k-mer representation of a DNA sequence can be used for the classification or identification of sequences related to nucleosome positioning. This representation becomes computationally expensive as k grows, because the dimension of the feature space increases exponentially. This issue significantly affects the classification task carried out by a general machine learning algorithm used for sequence classification. In this paper, we investigate the advantage offered by the so-called Variable Ranking Feature Selection method to select the most informative k-mers associated with a set of DNA sequences, for the final purpose of nucleosome/linker classifi…

0301 basic medicine; Sequence; Settore INF/01 - Informatica; Epigenomic; 030102 biochemistry & molecular biology; business.industry; Computer science; Deep learning; Pattern recognition; Feature selection; DNA sequences; Nucleosomes; Ranking (information retrieval); Set (abstract data type); 03 medical and health sciences; Variable (computer science); 030104 developmental biology; Dimension (vector space); Feature selection; Deep learning models; Artificial intelligence; Deep learning models Feature selection DNA sequences Epigenomic Nucleosomes; Representation (mathematics); business

researchProduct

Combining multiple hypothesis testing with machine learning increases the statistical power of genome-wide association studies

2016

Mieth, Bettina et al.

0301 basic medicine; Statistical methods; Computer science; Genome-wide association study; Machine learning; computer.software_genre; Genome-wide association studies; Statistical power; Article; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Set (abstract data type); 03 medical and health sciences; [INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG]; [MATH.MATH-ST]Mathematics [math]/Statistics [math.ST]; 10007 Department of Economics; Statistical significance; Replication (statistics); genome; Statistical hypothesis testing; Genetic association; 1000 Multidisciplinary; Multidisciplinary; business.industry; Computational science; Institut für Mathematik; 330 Economics; Support vector machine; 030104 developmental biology; Multiple comparisons problem; wide association studies; statistical methods; Artificial intelligence; business; computer

researchProduct

SpaceScanner: COPASI wrapper for automated management of global stochastic optimization experiments

2017

Motivation: Due to their universal applicability, global stochastic optimization methods are popular for designing improvements of biochemical networks. The drawbacks of global stochastic optimization methods are: (i) no guarantee of finding global optima, (ii) no clear optimization run termination criteria and (iii) no criteria to detect stagnation of an optimization run. The impact of these drawbacks can be partly compensated by manual work that becomes inefficient when the solution space is large due to combinatorial explosion of adjustable parameters or for other reasons. Results: SpaceScanner uses parallel optimization runs for automatic termination of optimization tasks in case…

0301 basic medicine; Statistics and Probability; Computer science; 0206 medical engineering; Computational Biology; 02 engineering and technology; computer.software_genre; Models Biological; Biochemistry; Computer Science Applications; Set (abstract data type); 03 medical and health sciences; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Stochastic optimization; Data mining; Molecular Biology; computer; Software; 020602 bioinformatics; Combinatorial explosion; Bioinformatics

researchProduct

Identification of control targets in Boolean molecular network models via computational algebra

2015

Motivation: Many problems in biomedicine and other areas of the life sciences can be characterized as control problems, with the goal of finding strategies to change a disease or otherwise undesirable state of a biological system into another, more desirable, state through an intervention, such as a drug or other therapeutic treatment. The identification of such strategies is typically based on a mathematical model of the process to be altered through targeted control inputs. This paper focuses on processes at the molecular level that determine the state of an individual cell, involving signaling or gene regulation. The mathematical model type considered is that of Boolean networks. The pot…

0301 basic medicine; Theoretical computer science; Computer science; Process (engineering); Molecular Networks (q-bio.MN); Systems biology; System of polynomial equations; ENCODE; Boolean networks; Set (abstract data type); 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Modelling and Simulation; Quantitative Biology - Molecular Networks; Molecular Biology; Edge deletions; Applied Mathematics; Computer Science Applications; Network control; Identification (information); 030104 developmental biology; Boolean network; Blocking transitions; FOS: Biological sciences; Modeling and Simulation; Algebraic control; State (computer science); 030217 neurology & neurosurgery; Research Article; BMC Systems Biology

researchProduct

SWhybrid: A Hybrid-Parallel Framework for Large-Scale Protein Sequence Database Search

2017

Computer architectures continue to develop rapidly towards massively parallel and heterogeneous systems. Thus, easily extensible yet highly efficient parallelization approaches for a variety of platforms are urgently needed. In this paper, we present SWhybrid, a hybrid computing framework for large-scale biological sequence database search on heterogeneous computing environments with multi-core or many-core processing units (PUs) based on the Smith-Waterman (SW) algorithm. To incorporate a diverse set of PUs such as combinations of CPUs, GPUs and Xeon Phis, we abstract them as SIMD vector execution units with different numbers of lanes. We propose a machine model, associated with a unified …

0301 basic medicine; Xeon; Sequence database; business.industry; Computer science; Interface (computing); Symmetric multiprocessor system; Parallel computing; Set (abstract data type); 03 medical and health sciences; 030104 developmental biology; Software; Computer architecture; SIMD; business; Massively parallel; 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS)

researchProduct

An Integrative Framework for the Construction of Big Functional Networks

2018

We present a methodology for biological data integration, aiming at building and analysing large functional networks which model complex genotype-phenotype associations. A functional network is a graph where nodes represent cellular components (e.g., genes, proteins, mRNA, etc.) and edges represent associations among such molecules. Different types of components may coexist in the same network, and associations may be related to physical/biochemical interactions or functional/phenotypic relationships. Due to both the large amount of involved information and the computational complexity typical of the problems in this domain, the proposed framework is based on big data technologies (Spark a…

0301 basic medicine; biological network; Biological data; Theoretical computer science; Settore INF/01 - Informatica; Computational complexity theory; Computer science; business.industry; Big data; NoSQL; computer.software_genre; Functional networks; 03 medical and health sciences; 030104 developmental biology; Graph (abstract data type); big data technologies; big data technologie; business; computer; Integrative approache; Biological network; 2018 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

researchProduct

Discovering unbounded unions of regular pattern languages from positive examples

1996

The problem of learning unions of certain pattern languages from positive examples is considered. We restrict ourselves to the regular patterns, i.e., patterns where each variable symbol can appear only once, and to the substring patterns, which form a subclass of regular patterns of the type xαy, where x and y are variables and α is a string of constant symbols. We present an algorithm that, given a set of strings, finds a good collection of patterns covering this set. The notion of a ‘good covering’ is defined as the most probable collection of patterns likely to be present in the examples, assuming a simple probabilistic model, or equivalently using the Minimum Description Length (MDL) principle. Ou…

0303 health sciences; Computer science; String (computer science); 0102 computer and information sciences; 01 natural sciences; Substring; Combinatorics; Set (abstract data type); 03 medical and health sciences; Variable (computer science); Cover (topology); 010201 computation theory & mathematics; Simple (abstract algebra); Minimum description length; 030304 developmental biology

researchProduct

Main Steps in Image Processing and Quantification: The Analysis Workflow

2019

In recent decades, the variety of programs, algorithms, and strategies that researchers have at their disposal to process and analyze image files has grown extensively. However, these tools are pointless if not applied with the careful planning required to achieve a successful image analysis. To do so, the analyst must establish a meaningful and effective sequence of orderly operations that is able to (1) overcome all the problems derived from the image manipulation and (2) successfully resolve the question that was originally posed. In this chapter, the authors suggest a set of strategies and present a reflection on the main milestones that compose the image processing workf…

0303 health sciences; Reflection (computer programming); Process (engineering); Computer science; business.industry; Image processing; computer.file_format; Variety (cybernetics); Set (abstract data type); 03 medical and health sciences; 0302 clinical medicine; Workflow; Image file formats; Software engineering; business; computer; 030217 neurology & neurosurgery; 030304 developmental biology

researchProduct

Reverse-safe data structures for text indexing

2021

We introduce the notion of reverse-safe data structures. These are data structures that prevent the reconstruction of the data they encode (i.e., they cannot be easily reversed). A data structure D is called z-reverse-safe when there exist at least z datasets with the same set of answers as the ones stored by D. The main challenge is to ensure that D stores as many answers to useful queries as possible, is constructed efficiently, and has size close to the size of the original dataset it encodes. Given a text of length n and an integer z, we propose an algorithm which constructs a z-reverse-safe data structure that has size O(n) and answers pattern matching queries of length at most d optim…

050101 languages & linguistics; Computer science; data structure; 02 engineering and technology; privacy; Set (abstract data type); combinatoric; 0202 electrical engineering electronic engineering information engineering; 0501 psychology and cognitive sciences; Pattern matching; Settore ING-INF/05 - Sistemi Di Elaborazione Delle Informazioni; algorithm; Settore INF/01 - Informatica; 05 social sciences; Search engine indexing; INF/01 - INFORMATICA; data mining; Data structure; Matrix multiplication; combinatorics; Exponent; 020201 artificial intelligence & image processing; data structure; algorithm; combinatorics; de Bruijn graph; data mining; privacy; Algorithm; Adversary model; de Bruijn graph; Integer (computer science)

researchProduct