Search results for "algorithm"
showing 10 items of 4887 documents
Machine learning at the interface of structural health monitoring and non-destructive evaluation
2020
While both non-destructive evaluation (NDE) and structural health monitoring (SHM) share the objective of damage detection and identification in structures, they are distinct in many respects. This paper will discuss the differences and commonalities and consider ultrasonic/guided-wave inspection as a technology at the interface of the two methodologies. It will discuss how data-based/machine learning analysis provides a powerful approach to ultrasonic NDE/SHM in terms of the available algorithms, and more generally, how different techniques can accommodate the very substantial quantities of data that are provided by modern monitoring campaigns. Several machine learning methods will be illu…
UVPAR: fast detection of functional shifts in duplicate genes.
2006
Background: The imprint of natural selection on gene sequences is often difficult to detect. A plethora of methods have been devised to detect genetic changes due to selective processes. However, many of those methods depend heavily on underlying assumptions regarding the mode of change of DNA sequences and often require sophisticated mathematical treatments that make them computationally slow. The development of fast and effective methods to detect modifications in the selective constraints of genes is therefore of great interest. Results: We describe UVPAR, a program designed to quickly test for changes in the functional constraints of duplicate genes. Starting with alignments of t…
gcType : a high-quality type strain genome database for microbial phylogenetic and functional research
2020
Taxonomic and functional research of microorganisms has increasingly relied upon genome-based data and methods. As the depository of the Global Catalogue of Microorganisms (GCM) 10K prokaryotic type strain sequencing project, Global Catalogue of Type Strain (gcType) has published 1049 type strain genomes sequenced by the GCM 10K project which are preserved in global culture collections with a valid published status. Additionally, the information provided through gcType includes >12 000 publicly available type strain genome sequences from GenBank incorporated using quality control criteria and standard data annotation pipelines to form a high-quality reference database. This …
Criminal networks analysis in missing data scenarios through graph distances.
2021
Data collected in criminal investigations may suffer from: (i) incompleteness, due to the covert nature of criminal organisations; (ii) incorrectness, caused by either unintentional data collection errors or intentional deception by criminals; (iii) inconsistency, when the same information is collected into law enforcement databases multiple times, or in different formats. In this paper we analyse nine real criminal networks of different nature (i.e., Mafia networks, criminal street gangs and terrorist organizations) in order to quantify the impact of incomplete data and to determine which network type is most affected by it. The networks are first pruned following two specific methods: …
Analyzing big datasets of genomic sequences: fast and scalable collection of k-mer statistics
2019
Background: Distributed approaches based on the MapReduce programming paradigm have started to be proposed in the Bioinformatics domain, due to the large amount of data produced by the next-generation sequencing techniques. However, the use of MapReduce and related Big Data technologies and frameworks (e.g., Apache Hadoop and Spark) does not necessarily produce satisfactory results, in terms of both efficiency and effectiveness. We discuss how the development of distributed and Big Data management technologies has affected the analysis of large datasets of biological sequences. Moreover, we show how the choice of different parameter configurations and the careful engineering of the …
Data Augmentation Approach in Bayesian Modelling of Presence-only Data
2011
Ecologists are interested in predicting the potential distribution of species in suitable areas, which is essential for planning conservation and management strategies. Unfortunately, often the only available information in such studies is the true presence of the species at a few locations of the study area and the associated environmental covariates over the entire area, referred to as presence-only data. We propose a Bayesian approach to estimate logistic linear regressions adapted to presence-only data through the introduction of a random approximation of the correction factor in the adjusted logistic model that allows us to overcome the need to know a priori the prevalence of the species.
Controlling false match rates in record linkage using extreme value theory
2011
Cleansing data from synonyms and homonyms is a relevant task in fields where high quality of data is crucial, for example in disease registries and medical research networks. Record linkage provides methods for minimizing synonym and homonym errors, thereby improving data quality. We focus our attention on the case of homonym errors (in the following denoted as ‘false matches’), in which records belonging to different entities are wrongly classified as equal. Synonym errors (‘false non-matches’) occur when a single entity maps to multiple records in the linkage result. They are not considered in this study because in our application domain they are not as crucial as false matches. Fa…
Regression diagnostics applied in kinetic data processing: Outlier recognition and robust weighting procedures
2010
An efficient protocol, based on advanced statistical diagnostics and robust fitting techniques applied to the least-squares processing of kinetic data of chemical reactions, is presented and discussed. The procedure, which is aimed at obtaining highly accurate estimation of the fitting parameters, consists of the identification of the outliers that remarkably impair the fitting by means of the so-called “leverage analysis” and some related diagnostics. This approach allows the elimination of the actually aberrant observations from the data set and/or their robust weighting to inhibit the negative effects induced on the data fitting, with consequent reduction of the bias introduced into the …
A new fast and fault-tolerant identification algorithm for spectral databases
1995
A new method for automatic, computer- and database-driven identification of UV/VIS spectra is described. It is shown that an identification algorithm must consider the differences between spectra as well as their common features. The described method allows identification even if the spectra are distorted or shifted.
Climate Data Records of Vegetation Variables from Geostationary SEVIRI/MSG Data: Products, Algorithms and Applications
2019
The scientific community requires long-term data records with well-characterized uncertainty and suitable for modeling terrestrial ecosystems and energy cycles at regional and global scales. This paper presents the methodology currently developed in EUMETSAT within its Satellite Application Facility for Land Surface Analysis (LSA SAF) to generate biophysical variables from the Spinning Enhanced Visible and InfraRed Imager (SEVIRI) on board MSG 1-4 (Meteosat 8-11) geostationary satellites. Using this methodology, the LSA SAF generates and disseminates at a time a suite of vegetation products, such as the leaf area index (LAI), the fraction of the photosynthetically active radiation absorbed …