Search results for "Computer Science Applications"

showing 10 items of 3993 documents

UVPAR: fast detection of functional shifts in duplicate genes.

2006

Background: The imprint of natural selection on gene sequences is often difficult to detect. A plethora of methods have been devised to detect genetic changes due to selective processes. However, many of those methods depend heavily on underlying assumptions regarding the mode of change of DNA sequences and often require sophisticated mathematical treatments that make them computationally slow. The development of fast and effective methods to detect modifications in the selective constraints of genes is therefore of great interest. Results: We describe UVPAR, a program designed to quickly test for changes in the functional constraints of duplicate genes. Starting with alignments of t…

Danio; Computational biology; Biology; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; DNA sequencing; Evolution Molecular; Genes Duplicate; Sequence Analysis Protein; Structural Biology; Selection Genetic; Hox gene; Molecular Biology; Gene; lcsh:QH301-705.5; Selection (genetic algorithm); Genetics; Natural selection; Applied Mathematics; Proteins; Sequence Analysis DNA; biology.organism_classification; Computer Science Applications; lcsh:Biology (General); lcsh:R858-859.7; DNA microarray; Sequence Alignment; Software; Algorithms; Genètica
researchProduct

Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics

2019

Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …

Data Analysis; FOS: Computer and information sciences; Time Factors; Time Factor; Computer science; Statistics as Topic; Big data; Apache Spark; distributed computing; performance evaluation; k-mer counting; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Domain (software engineering); Databases; 03 medical and health sciences; 0302 clinical medicine; Structural Biology; Computer cluster; Statistics; Spark (mathematics); Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; 0303 health sciences; Genome; Settore INF/01 - Informatica; Base Sequence; Nucleic Acid; Apache Spark; business.industry; Research; Apache Spark; Distributed computing; k-mer counting; Performance evaluation; Algorithms; Base Sequence; Software; Time Factors; Data Analysis; Databases Nucleic Acid; Genome; Statistics as Topic; Applied Mathematics; k-mer counting; Distributed computing; Computer Science Applications; Algorithm; Data Analysi; Computer Science - Distributed Parallel and Cluster Computing; lcsh:Biology (General); 030220 oncology & carcinogenesis; Scalability; Performance evaluation; lcsh:R858-859.7; Algorithm design; Distributed Parallel and Cluster Computing (cs.DC); Databases Nucleic Acid; business; Algorithms; Software
researchProduct

Global data on earthworm abundance, biomass, diversity and corresponding environmental properties

2021

Earthworms are an important soil taxon as ecosystem engineers, providing a variety of crucial ecosystem functions and services. Little is known about their diversity and distribution at large spatial scales, despite the availability of considerable amounts of local-scale data. Earthworm diversity data, obtained from the primary literature or provided directly by authors, were collated with information on site locations, including coordinates, habitat cover, and soil properties. Datasets were required, at a minimum, to include abundance or biomass of earthworms at a site. Where possible, site-level species lists were included, as well as the abundance and biomass of individual species and ec…

Data Descriptor; Distribuição Geográfica; Plan_S-Compliant-OA; Soil; Biomass; biodiversity; Diversity; Ecology; Biodiversidade; Q; Biodiversity; eliöyhteisöt; maaperäeliöstö; PE&RC; Computer Science Applications; Multidisciplinary Sciences; Biogeography; international; 1181 Ecology evolutionary biology; Ecosystem engineers; Science & Technology - Other Topics; Statistics Probability and Uncertainty; Information Systems; Statistics and Probability; lierot; Science; Invertebrados; Library and Information Sciences; [SDV.SA.SDS]Life Sciences [q-bio]/Agricultural sciences/Soil study; Ecology and Environment; Education; eliömaantiede; [SDV.EE.ECO]Life Sciences [q-bio]/Ecology environment/Ecosystems; Minhoca; Serviço ambiental; BIODIVERSITY CHANGE; Life Science; Ecosystem services; Earthworms; Datasets; Animals; Spatial distribution; Community ecology; Oligochaeta; Laboratorium voor Nematologie; Ecosystem; 1172 Environmental sciences; biogeography; Science & Technology; LAND-USE; Biology and Life Sciences; PLATFORM; Bodemfysica en Landbeheer; Ecología; Ecossistema; biodiversiteetti; Soil Physics and Land Management; Solo; Biologia do Solo; maaperäeläimistö; 570 Life sciences; biology; eartworm; abundance; biomass; diversity; Laboratory of Nematology; [SDE.BE]Environmental Sciences/Biodiversity and Ecology; COMMUNITIES; community ecology
researchProduct

Controlling false match rates in record linkage using extreme value theory

2011

Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…

Data cleansing; Data cleansing; Biomedical Research; Databases Factual; Calibration (statistics); Computer science; Health Informatics; computer.software_genre; Plot (graphics); Mean excess plot; Statistics; Registries; Extreme value theory; Linkage (software); Models Statistical; Computational Biology; Fellegi–Sunter model; Mixture model; Generalized Pareto distribution; Computer Science Applications; Data quality; Statistics of extreme values; Database Management Systems; Medical Record Linkage; Data mining; computer; Algorithms; Medical Informatics; Record linkage; Journal of Biomedical Informatics
researchProduct

Social Media Monitoring for Crisis Communication: Process, Methods and Trends in the Scientific Literature

2014

This literature review study aims at clarifying current knowledge on social media monitoring from the perspective of organizational communication and public relations. It also contributes to crisis communication by shedding light on how fast developing social media discourse can be followed and analysed in order to understand citizens’ needs throughout all the phases of a crisis. The findings of this study reveal a number of insights in the scientific literature on the concept of monitoring, the monitoring process, methods, tools and solutions, methodological issues and trends covering the years 2009–2012. In the literature, social media monitoring is described as a process which comprises …

Data collection; Knowledge management; Process (engineering); Management science; business.industry; social media; Communication; Perspective (graphical); sosiaalinen media; Scientific literature; kriisiviestintä; Computer Science Applications; Education; Order (exchange); Political science; Media Technology; Organizational communication; Social media; crisis communication; business; Crisis communication; Online Journal of Communication and Media Technologies
researchProduct

Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties – A review

2015

Abstract: Forthcoming superspectral satellite missions dedicated to land monitoring, as well as planned imaging spectrometers, will unleash an unprecedented data stream. The processing requirements for such large data streams involve processing techniques enabling the spatio-temporally explicit quantification of vegetation properties. Typically retrieval must be accurate, robust and fast. Hence, there is a strict requirement to identify next-generation bio-geophysical variable retrieval algorithms which can be molded into an operational processing chain. This paper offers a review of state-of-the-art retrieval methods for quantitative terrestrial bio-geophysical variable extraction using op…

Data stream; Economics; Computer science; Operational variable retrieval; computer.software_genre; Laboratory of Geo-information Science and Remote Sensing; Machine learning; Physical; Laboratorium voor Geo-informatiekunde en Remote Sensing; Bio-geophysical variables; Computers in Earth Sciences; Parametric; Engineering (miscellaneous); Parametric statistics; Remote sensing; Data stream mining; Physics; Transparency (human–computer interaction); Vegetation; PE&RC; Non-parametric; Hybrid; Atomic and Molecular Physics and Optics; Computer Science Applications; Variable (computer science); Satellite; Data mining; Engineering sciences. Technology; Retrievability; computer; ISPRS Journal of Photogrammetry and Remote Sensing
researchProduct

Guiding the modeller: organizing and selecting experimental data for single cell models using the CoCoDat database

2003

Collating, organizing and selecting quantitative experimental data are time-consuming tasks necessary for building and constraining biophysically realistic neuronal models. The CoCoDat (Collation of Cortical Data) database has been designed as an advanced environment for storing, organizing and retrieving detailed, uninterpreted quantitative data on morphology, electrophysiology and connectivity from the published literature according to neurophysiological concepts. All experimental data are linked to exact bibliographical references and detailed records of procedures used in the experiments that produced the data. We demonstrate the usefulness of CoCoDat for implementation of an example mo…

Database; Artificial Intelligence; Computer science; Pyramidal Neuron; Cognitive Neuroscience; Experimental data; MODELLER; Neurophysiology; Layer (object-oriented design); Barrel cortex; computer.software_genre; computer; Computer Science Applications; Neurocomputing
researchProduct

PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources.

2019

Background: Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. This information is accessed through a RESTful web service API, which provides an efficient interface to the data. Results: In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTfu…

Databases Factual; Computer science; Annotation; Biological database; RESTful; computer.software_genre; NoSQL; lcsh:Computer applications to medicine. Medical informatics; Biochemistry; Database; 03 medical and health sciences; Annotation; User-Computer Interface; 0302 clinical medicine; Installation; Structural Biology; Variant; Molecular Biology; lcsh:QH301-705.5; 030304 developmental biology; computer.programming_language; 0303 health sciences; Biological data; Database; Applied Mathematics; Repository; Computational Biology; Python (programming language); CellBase; Computer Science Applications; lcsh:Biology (General); Scripting language; 030220 oncology & carcinogenesis; lcsh:R858-859.7; Web service; computer; Software; Python
researchProduct

Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis

2006

Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data represented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering (clustering according to expert domain knowledge) on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…

Databases Factual; Computer science; Feature extraction; Information Storage and Retrieval; Feature selection; Machine learning; computer.software_genre; Models Biological; Pattern Recognition Automated; Immune system; Artificial Intelligence; Drug Resistance Bacterial; Cluster Analysis; Humans; Computer Simulation; Electrical and Electronic Engineering; Representation (mathematics); Cluster analysis; Cross Infection; business.industry; Dimensionality reduction; Supervised learning; General Medicine; Anti-Bacterial Agents; Computer Science Applications; Data pre-processing; Data mining; Artificial intelligence; Multidimensional systems; business; computer; Algorithms; Biotechnology
researchProduct

FABC: Retinal Vessel Segmentation Using AdaBoost

2010

This paper presents a method for automated vessel segmentation in retinal images. For each pixel in the field of view of the image, a 41-D feature vector is constructed, encoding information on the local intensity structure, spatial properties, and geometry at multiple scales. An AdaBoost classifier is trained on 789 914 gold standard examples of vessel and nonvessel pixels, then used for classifying previously unseen images. The algorithm was tested on the public digital retinal images for vessel extraction (DRIVE) set, frequently used in the literature and consisting of 40 manually labeled images with gold standard. Results were compared experimentally with those of eight algorithms as we…

Databases Factual; Computer science; Feature vector; Feature extraction; Normal Distribution; ComputingMethodologies_IMAGEPROCESSINGANDCOMPUTERVISION; Image processing; Models Biological; Edge detection; Artificial Intelligence; Image Processing Computer-Assisted; Humans; Segmentation; Computer vision; AdaBoost; Fluorescein Angiography; Electrical and Electronic Engineering; Training set; Pixel; Contextual image classification; Settore INF/01 - Informatica; business.industry; Reproducibility of Results; Retinal Vessels; Wavelet transform; Bayes Theorem; Pattern recognition; General Medicine; Image segmentation; Computer Science Applications; ComputingMethodologies_PATTERNRECOGNITION; ROC Curve; Test set; AdaBoost classifier retinal images vessel segmentation; Artificial intelligence; business; Algorithms; Biotechnology
researchProduct