Search results for "DATA MINING"

showing 10 items of 907 documents

Stochastic sampling effects favor manual over digital contact tracing.

2020

Isolation of symptomatic individuals, tracing and testing of their nonsymptomatic contacts are fundamental strategies for mitigating the current COVID-19 pandemic. The breaking of contagion chains relies on two complementary strategies: manual reconstruction of contacts based on interviews and a digital (app-based) privacy-preserving contact tracing. We compare their effectiveness using model parameters tailored to describe SARS-CoV-2 diffusion within the activity-driven model, a general empirically validated framework for network dynamics. We show that, even for equal probability of tracing a contact, manual tracing robustly performs better than the digital protocol, also taking into accou…

0301 basic medicine; Physics - Physics and Society; Computer science; Epidemiology; Science; Complex networks; FOS: Physical sciences; General Physics and Astronomy; Physics and Society (physics.soc-ph); Tracing; computer.software_genre; General Biochemistry Genetics and Molecular Biology; Article; Specimen Handling; 03 medical and health sciences; 0302 clinical medicine; Humans; 030212 general & internal medicine; Quantitative Biology - Populations and Evolution; Pandemics; Condensed Matter - Statistical Mechanics; stochastic model; Protocol (science); Stochastic Processes; Multidisciplinary; Statistical Mechanics (cond-mat.stat-mech); Stochastic process; Diagnostic Tests Routine; SARS-CoV-2; Q; Populations and Evolution (q-bio.PE); Sampling (statistics); COVID-19; General Chemistry; Complex network; Models Theoretical; Network dynamics; 030104 developmental biology; FOS: Biological sciences; Scalability; Quarantine; Data mining; Contact Tracing; computer; Contact tracing; Algorithms; Nature communications
researchProduct

FastaHerder2: Four Ways to Research Protein Function and Evolution with Clustering and Clustered Databases.

2016

The accelerated growth of protein databases offers great possibilities for the study of protein function using sequence similarity and conservation. However, the huge number of sequences deposited in these databases requires new ways of analyzing and organizing the data. It is necessary to group the many very similar sequences, creating clusters with automated derived annotations useful to understand their function, evolution, and level of experimental evidence. We developed an algorithm called FastaHerder2, which can cluster any protein database, putting together very similar protein sequences based on near-full-length similarity and/or high threshold of sequence identity. We compressed 50…

0301 basic medicine; Protein structure database; Proteomics; Proteome; Sequence analysis; Computer science; computer.software_genre; Sensitivity and Specificity; Set (abstract data type); Evolution Molecular; 03 medical and health sciences; 0302 clinical medicine; Similarity (network science); Sequence Analysis Protein; Genetics; Cluster (physics); Animals; Cluster Analysis; Humans; Cluster analysis; Databases Protein; Molecular Biology; Sequence; Database; Function (mathematics); Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Modeling and Simulation; Data mining; computer; 030217 neurology & neurosurgery; Software; Journal of computational biology : a journal of computational molecular cell biology
researchProduct

A Simple Method to Predict Blood-Brain Barrier Permeability of Drug-Like Compounds Using Classification Trees

2017

Background: Determining the ability of a compound to penetrate the blood-brain barrier (BBB) is a challenging task; despite the numerous efforts made to predict or measure BBB passage, existing approaches still have several drawbacks. Methods: The prediction of the permeability through the BBB is carried out using classification trees. A large data set of 497 compounds (recently published) is selected to develop the tree model. Results: The best model shows an accuracy higher than 87.6% for the training set; the model was also validated using a 10-fold cross-validation procedure and through a test set, achieving accuracy values of 86.1% and 87.9%, respectively. We give a brief explanation, in structural terms, o…

0301 basic medicine; Quantitative structure–activity relationship; Computer science; Datasets as Topic; Quantitative Structure-Activity Relationship; computer.software_genre; 01 natural sciences; Permeability; 03 medical and health sciences; Molecular descriptor; Drug Discovery; International literature; Computer Simulation; Training set; Decision tree learning; Decision Trees; 0104 chemical sciences; 010404 medicinal & biomolecular chemistry; 030104 developmental biology; Pharmaceutical Preparations; Blood-Brain Barrier; Test set; Data mining; Blood brain barrier permeability; computer; Algorithms; Decision tree model; Medicinal Chemistry
researchProduct

Innovative Strategies to Develop Chemical Categories Using a Combination of Structural and Toxicological Properties.

2016

Interest is increasing in the development of non-animal methods for toxicological evaluations. These methods are, however, particularly challenging for complex toxicological endpoints such as repeated dose toxicity. European legislation, e.g., the European Union's Cosmetic Directive and REACH, demands the use of alternative methods. Frameworks, such as the Read-across Assessment Framework or the Adverse Outcome Pathway Knowledge Base, support the development of these methods. The aim of the project presented in this publication was to develop substance categories for a read-across with complex endpoints of toxicity based on existing databases. The basic conceptual approach was to combine str…

0301 basic medicine; Quantitative structure–activity relationship; read across; Predictive Clustering Tree (PCT) method; Computer science; 610; 010501 environmental sciences; computer.software_genre; 600 Technik Medizin angewandte Wissenschaften::610 Medizin und Gesundheit; 01 natural sciences; 03 medical and health sciences; Pharmacology (medical); Cluster analysis; 0105 earth and related environmental sciences; Original Research; Alternative methods; Pharmacology; toxicological and structural similarity; business.industry; QSAR; lcsh:RM1-950; non-animal methods; QSAR; readacross; Predictive Clustering Tree (PCT) method; toxicological and structural similarity; Identification (information); Tree (data structure); 030104 developmental biology; Conceptual approach; lcsh:Therapeutics. Pharmacology; Knowledge base; non-animal methods; Data mining; Web service; business; computer; Frontiers in pharmacology
researchProduct

CoverageAnalyzer (CAn): A Tool for Inspection of Modification Signatures in RNA Sequencing Profiles

2016

Combination of reverse transcription (RT) and deep sequencing has emerged as a powerful instrument for the detection of RNA modifications, a field that has seen a recent surge in activity because of its importance in gene regulation. Recent studies yielded high-resolution RT signatures of modified ribonucleotides relying on both sequence-dependent mismatch patterns and reverse transcription arrests. Common alignment viewers lack specialized functionality, such as filtering, tailored visualization, image export and differential analysis. Consequently, the community will profit from a platform seamlessly connecting detailed visual inspection of RT signatures and automated screening for modifi…

0301 basic medicine; RNA modifications; reverse transcription; reverse transcription (RT) signature; RNA sequencing (RNA-Seq); Next-Generation Sequencing (NGS); candidate screening; alignment viewer; Next-Generation Sequencing (NGS); lcsh:QR1-502; [ SDV.BBM.BM ] Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; Biology; computer.software_genre; 01 natural sciences; Biochemistry; Field (computer science); Differential analysis; Deep sequencing; lcsh:Microbiology; Article; World Wide Web; 03 medical and health sciences; User-Computer Interface; RNA modifications; RNA sequencing (RNA-Seq); [SDV.BBM.GTP]Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; candidate screening; Molecular Biology; ComputingMilieux_MISCELLANEOUS; 010405 organic chemistry; Sequence Analysis RNA; Gene Expression Profiling; RNA; Computational Biology; High-Throughput Nucleotide Sequencing; [SDV.BBM.BM]Life Sciences [q-bio]/Biochemistry Molecular Biology/Molecular biology; reverse transcription (RT) signature; reverse transcription; File format; alignment viewer; 0104 chemical sciences; Visualization; Visual inspection; 030104 developmental biology; [ SDV.BBM.GTP ] Life Sciences [q-bio]/Biochemistry Molecular Biology/Genomics [q-bio.GN]; Data mining; computer; Software; Biomolecules
researchProduct

Fragments of peer review: A quantitative analysis of the literature (1969-2015)

2018

This paper examines research on peer review between 1969 and 2015 by looking at records indexed from the Scopus database. Although it is often argued that peer review has been poorly investigated, we found that the number of publications in this field doubled from 2005. Half of this work was indexed as research articles, a third as editorial notes and literature reviews, and the rest were book chapters or letters. We identified the most prolific and influential scholars, the most cited publications and the most important journals in the field. Co-authorship network analysis showed that research on peer review is fragmented, with the largest group of co-authors including only 2.1% of the wh…

0301 basic medicine; Science and Technology Workforce; Research Quality Assessment; lcsh:Medicine; Careers in Research; Peer review co-authorship collaboration community; Citation analysis; Centrality; Data Mining; Sociology; lcsh:Science; Multidisciplinary; 05 social sciences; Scientometrics; co-authorship; Research Assessment; Knowledge sharing; Professions; Citation Analysis; community; Network Analysis; Research Article; Computer and Information Sciences; Science Policy; Abstracting and Indexing; Peer Review; Abstracting and Indexing as Topic; Animals; Data Mining; Databases Bibliographic; History 20th Century; History 21st Century; Humans; Peer Review; Scopus; Library science; 050905 science studies; Research and Analysis Methods; History 21st Century; 03 medical and health sciences; Animals; Humans; Scientific Publishing; lcsh:R; Scientometrics; History 20th Century; Databases Bibliographic; collaboration; 030104 developmental biology; Quantitative analysis (finance); People and Places; Scientists; lcsh:Q; Population Groupings; 0509 other social sciences; Scientific publishing; Centrality
researchProduct

Prediction of Chromatin Accessibility in Gene-Regulatory Regions from Transcriptomics Data

2017

The epigenetics landscape of cells plays a key role in the establishment of cell-type specific gene expression programs characteristic of different cellular phenotypes. Different experimental procedures have been developed to obtain insights into the accessible chromatin landscape including DNase-seq, FAIRE-seq and ATAC-seq. However, current downstream computational tools fail to reliably determine regulatory region accessibility from the analysis of these experimental data. In particular, currently available peak calling algorithms are very sensitive to their parameter settings and show highly heterogeneous results, which hampers a trustworthy identification of accessible chromatin…

0301 basic medicine; Science; Computational biology; Regulatory Sequences Nucleic Acid; Biology; computer.software_genre; Article; Epigenesis Genetic; 03 medical and health sciences; Databases Genetic; Humans; Epigenetics; Computational model; Deoxyribonucleases; Multidisciplinary; Sequence Analysis RNA; Gene Expression Profiling; Decision tree learning; Q; R; Sequence Analysis DNA; Chromatin; Chromatin; Gene expression profiling; Identification (information); 030104 developmental biology; Gene Expression Regulation; Medicine; Data mining; Precision and recall; Peak calling; computer; Algorithms; Scientific reports
researchProduct

SpaceScanner: COPASI wrapper for automated management of global stochastic optimization experiments

2017

Motivation: Due to their universal applicability, global stochastic optimization methods are popular for designing improvements of biochemical networks. The drawbacks of global stochastic optimization methods are: (i) no guarantee of finding global optima, (ii) no clear optimization run termination criteria and (iii) no criteria to detect stagnation of an optimization run. The impact of these drawbacks can be partly compensated by manual work that becomes inefficient when the solution space is large due to combinatorial explosion of adjustable parameters or for other reasons. Results: SpaceScanner uses parallel optimization runs for automatic termination of optimization tasks in case…

0301 basic medicine; Statistics and Probability; Computer science; 0206 medical engineering; Computational Biology; 02 engineering and technology; computer.software_genre; Models Biological; Biochemistry; Computer Science Applications; Set (abstract data type); 03 medical and health sciences; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; Stochastic optimization; Data mining; Molecular Biology; computer; Software; 020602 bioinformatics; Combinatorial explosion; Bioinformatics
researchProduct

Partitioned learning of deep Boltzmann machines for SNP data.

2016

Motivation: Learning the joint distributions of measurements, and in particular identification of an appropriate low-dimensional manifold, has been found to be a powerful ingredient of deep learning approaches. Yet, such approaches have hardly been applied to single nucleotide polymorphism (SNP) data, probably due to the high number of features typically exceeding the number of studied individuals. Results: After a brief overview of how deep Boltzmann machines (DBMs), a deep learning approach, can be adapted to SNP data in principle, we specifically present a way to alleviate the dimensionality problem by partitioned learning. We propose a sparse regression approach to coarsely screen…

0301 basic medicine; Statistics and Probability; Computer science; Machine learning; computer.software_genre; 01 natural sciences; Biochemistry; Polymorphism Single Nucleotide; Machine Learning; 010104 statistics & probability; 03 medical and health sciences; symbols.namesake; Joint probability distribution; Humans; 0101 mathematics; Molecular Biology; Statistical hypothesis testing; Artificial neural network; business.industry; Gene Expression Regulation Leukemic; Deep learning; Univariate; Computational Biology; Manifold; Computer Science Applications; Data set; Computational Mathematics; 030104 developmental biology; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; Leukemia Myeloid; Boltzmann constant; symbols; Data mining; Artificial intelligence; business; computer; Software; Curse of dimensionality; Bioinformatics (Oxford, England)
researchProduct

ParDRe: faster parallel duplicated reads removal tool for sequencing studies

2016

This is a pre-copyedited, author-produced version of an article accepted for publication in Bioinformatics following peer review. The version of record [insert complete citation information here] is available online at: https://doi.org/10.1093/bioinformatics/btw038 [Abstract] Summary: Current next generation sequencing technologies often generate duplicated or near-duplicated reads that (depending on the application scenario) do not provide any interesting biological information but can increase memory requirements and computational time of downstream analysis. In this work we present ParDRe, a de novo parallel tool to remove duplicated and near-duplicated reads through the clustering of S…

0301 basic medicine; Statistics and Probability; FASTQ format; DNA strings; Source code; Downstream (software development); Computer science; media_common.quotation_subject; Parallel computing; computer.software_genre; Biochemistry; DNA sequencing; 03 medical and health sciences; chemistry.chemical_compound; 0302 clinical medicine; Hybrid MPI/multithreading; Cluster Analysis; ParDRe; Molecular Biology; Gene; media_common; High-Throughput Nucleotide Sequencing; Sequence Analysis DNA; Parallel tool; Computer Science Applications; Computational Mathematics; 030104 developmental biology; Computational Theory and Mathematics; chemistry; Data mining; computer; Algorithms; 030217 neurology & neurosurgery; DNA; Bioinformatics
researchProduct