Search results for "Computer Science Applications"
showing 10 items of 3993 documents
UVPAR: fast detection of functional shifts in duplicate genes.
2006
Abstract Background The imprint of natural selection on gene sequences is often difficult to detect. A plethora of methods have been devised to detect genetic changes due to selective processes. However, many of those methods depend heavily on underlying assumptions regarding the mode of change of DNA sequences and often require sophisticated mathematical treatments that make them computationally slow. The development of fast and effective methods to detect modifications in the selective constraints of genes is therefore of great interest. Results We describe UVPAR, a program designed to quickly test for changes in the functional constraints of duplicate genes. Starting with alignments of t…
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
2019
Abstract Background Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …
Global data on earthworm abundance, biomass, diversity and corresponding environmental properties
2021
Earthworms are an important soil taxon as ecosystem engineers, providing a variety of crucial ecosystem functions and services. Little is known about their diversity and distribution at large spatial scales, despite the availability of considerable amounts of local-scale data. Earthworm diversity data, obtained from the primary literature or provided directly by authors, were collated with information on site locations, including coordinates, habitat cover, and soil properties. Datasets were required, at a minimum, to include abundance or biomass of earthworms at a site. Where possible, site-level species lists were included, as well as the abundance and biomass of individual species and ec…
Controlling false match rates in record linkage using extreme value theory
2011
Abstract Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…
Social Media Monitoring for Crisis Communication: Process, Methods and Trends in the Scientific Literature
2014
This literature review study aims at clarifying current knowledge on social media monitoring from the perspective of organizational communication and public relations. It also contributes to crisis communication by shedding light on how fast developing social media discourse can be followed and analysed in order to understand citizens’ needs throughout all the phases of a crisis. The findings of this study reveal a number of insights in the scientific literature on the concept of monitoring, the monitoring process, methods, tools and solutions, methodological issues and trends covering the years 2009–2012. In the literature, social media monitoring is described as a process which comprises …
Optical remote sensing and the retrieval of terrestrial vegetation bio-geophysical properties – A review
2015
Abstract: Forthcoming superspectral satellite missions dedicated to land monitoring, as well as planned imaging spectrometers, will unleash an unprecedented data stream. The processing requirements for such large data streams involve processing techniques enabling the spatio-temporally explicit quantification of vegetation properties. Typically retrieval must be accurate, robust and fast. Hence, there is a strict requirement to identify next-generation bio-geophysical variable retrieval algorithms which can be molded into an operational processing chain. This paper offers a review of state-of-the-art retrieval methods for quantitative terrestrial bio-geophysical variable extraction using op…
Guiding the modeller: organizing and selecting experimental data for single cell models using the CoCoDat database
2003
Collating, organizing and selecting quantitative experimental data are time-consuming tasks necessary for building and constraining biophysically realistic neuronal models. The CoCoDat (Collation of Cortical Data) database has been designed as an advanced environment for storing, organizing and retrieving detailed, uninterpreted quantitative data on morphology, electrophysiology and connectivity from the published literature according to neurophysiological concepts. All experimental data are linked to exact bibliographical references and detailed records of procedures used in the experiments that produced the data. We demonstrate the usefulness of CoCoDat for implementation of an example mo…
PyCellBase, an efficient python package for easy retrieval of biological data from heterogeneous sources.
2019
Background Biological databases and repositories have been increasing in diversity and complexity over the years. This rapid expansion of current and new sources of biological knowledge raises serious problems of data accessibility and integration. To handle the growing need for unification, CellBase was created as an integrative solution. CellBase provides a centralized NoSQL database containing biological information from different and heterogeneous sources. Access to this information is provided through a RESTful web service API, which offers an efficient interface to the data. Results In this work we present PyCellBase, a Python package that provides programmatic access to the rich RESTfu…
Local dimensionality reduction and supervised learning within natural clusters for biomedical data analysis
2006
Inductive learning systems have been successfully applied in a number of medical domains. Nevertheless, the effective use of these systems often requires data preprocessing before applying a learning algorithm. This is especially important for multidimensional heterogeneous data represented by a large number of features of different types. Dimensionality reduction (DR) is one commonly applied approach. The goal of this paper is to study the impact of natural clustering--clustering according to expert domain knowledge--on DR for supervised learning (SL) in the area of antibiotic resistance. We compare several data-mining strategies that apply DR by means of feature extraction or feature selection w…
FABC: Retinal Vessel Segmentation Using AdaBoost
2010
This paper presents a method for automated vessel segmentation in retinal images. For each pixel in the field of view of the image, a 41-D feature vector is constructed, encoding information on the local intensity structure, spatial properties, and geometry at multiple scales. An AdaBoost classifier is trained on 789 914 gold standard examples of vessel and nonvessel pixels, then used for classifying previously unseen images. The algorithm was tested on the public digital retinal images for vessel extraction (DRIVE) set, frequently used in the literature and consisting of 40 manually labeled images with gold standard. Results were compared experimentally with those of eight algorithms as we…