Search results for "Cluster Analysis"
showing 10 items of 848 documents
ValWorkBench: an open source Java library for cluster validation, with applications to microarray data analysis.
2015
Background: Cluster analysis is one of the best-known activities in scientific investigation and the object of research in many disciplines, ranging from statistics to computer science. Following the advent of high-throughput technologies, it is central to the life sciences, e.g., in the classification of tumors. In particular, in cluster analysis, it is of relevance to assess cluster quality and to predict the number of clusters in a dataset, if any. This latter task is usually performed via internal validation measures. Despite their potentially important role, both the use of classic internal validation measures and the design of new ones, specific for microarray data, do not seem to have grea…
Description, microhabitat selection and infection patterns of sealworm larvae (Pseudoterranova decipiens species complex, nematoda: ascaridoidea) in …
2013
Third-stage larvae of the Pseudoterranova decipiens species complex (also known as sealworms) have been reported in at least 40 marine fish species belonging to 21 families and 10 orders along the South American coast. Sealworms are a cause for concern because they can infect humans who consume raw or undercooked fish. However, despite their economic and zoonotic importance, morphological and molecular characterization of species of Pseudoterranova in South America is still scarce. Methods: A total of 542 individual fish from 20 species from the Patagonian coast of Argentina were examined for sealworms. The body cavity, the muscles, internal organs, and the mesenteries were examined to dete…
The age and evolution of sociality in Stegodyphus spiders: a molecular phylogenetic perspective
2006
Social, cooperative breeding behaviour is rare in spiders and generally characterized by inbreeding, skewed sex ratios and high rates of colony turnover, processes that when combined may reduce genetic variation and lower individual fitness quickly. On these grounds, social spider species have been suggested to be unstable in evolutionary time, and hence sociality a rare phenomenon in spiders. Based on a partial molecular phylogeny of the genus Stegodyphus, we address the hypothesis that social spiders in this genus are evolutionarily transient. We estimate the age of the three social species, test whether they represent an ancestral or derived state and assess diversification relative to s…
Nondestructive Direct Determination of Heroin in Seized Illicit Street Drugs by Diffuse Reflectance near-Infrared Spectroscopy
2008
A new method has been developed for the fast and nondestructive direct determination of heroin in seized illicit street drugs using partial least-squares regression analysis of diffuse reflectance near-infrared spectra. Data were obtained from untreated samples placed in standard glass chromatography vials. A heterogeneous population of 31 samples, previously analyzed by a reference method, was employed to build the calibration model and to provide a separate validation set. Based on the use of zero-order data for a calibration set of 21 samples, after standard normal variate and quadratic linear removed baseline correction (detrending), in the wavelength range from 1111 to 1647 nm, 8 PLS fac…
Lexical and sublexical units in speech perception.
2009
Saffran, Newport, and Aslin (1996a) found that human infants are sensitive to statistical regularities corresponding to lexical units when hearing an artificial spoken language. Two sorts of segmentation strategies have been proposed to account for this early word-segmentation ability: bracketing strategies, in which infants are assumed to insert boundaries into continuous speech, and clustering strategies, in which infants are assumed to group certain speech sequences together into units (Swingley, 2005). In the present study, we test the predictions of two computational models instantiating each of these strategies i.e., Serial Recurrent Networks: Elman, 1990; and Parser: Perruchet & Vint…
A branch-and-cut algorithm for the soft-clustered vehicle-routing problem
2021
The soft-clustered vehicle-routing problem is a variant of the classical capacitated vehicle-routing problem (CVRP) in which customers are partitioned into clusters and all customers of the same cluster must be served by the same vehicle. We introduce a novel symmetric formulation of the problem in which the clustering part is modeled with an asymmetric sub-model. We solve the new model with a branch-and-cut algorithm exploiting some known valid inequalities for the CVRP that can be adapted. In addition, we derive problem-specific cutting planes and new heuristic and exact separation procedures. For square grid instances in the Euclidean plane, we provide lower-bounding techniques …
Detection of spatial disease clusters with LISA functions.
2011
Detection of disease clusters is an important tool in epidemiology that can help to identify risk factors associated with the disease and to understand its etiology. In this article we propose a method for the detection of spatial clusters where the locations of a set of cases and a set of controls are available. The method is based on local indicators of spatial association functions (LISA functions), particularly on the development of a local version of the product density, which is a second-order characteristic of spatial point processes. The behavior of the method is evaluated and compared with Kulldorff's spatial scan statistic by means of a simulation study. It is shown that the LI…
Using mathematical morphology for unsupervised classification of functional data
2011
This paper is concerned with the unsupervised classification of functional data by using mathematical morphology. Different morphological operators are used to extract relevant structures of the functions (considered as sets through their subgraph representations). These operators can be considered as preprocessing tools whose outputs are also functional data. We explore some dissimilarity measures and clustering methods for the classification of the transformed data. Our approach is illustrated through a detailed analysis of two data sets. These techniques, which have mainly been used in image processing, provide a flexible and robust toolbox for improving the results in unsupervised funct…
Cluster-Localized Sparse Logistic Regression for SNP Data
2012
The task of analyzing high-dimensional single nucleotide polymorphism (SNP) data in a case-control design using multivariable techniques has only recently been tackled. While many available approaches investigate only main effects in a high-dimensional setting, we propose a more flexible technique, cluster-localized regression (CLR), based on localized logistic regression models, that allows different SNPs to have an effect for different groups of individuals. Separate multivariable regression models are fitted for the different groups of individuals by incorporating weights into componentwise boosting, which provides simultaneous variable selection, hence sparse fits. For model fitting, th…
A fast and recursive algorithm for clustering large datasets with k-medians
2012
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
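The recursive idea in the last abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `sgd_k_medians`, the parameters `gamma` and `alpha`, and the step-size schedule `gamma / (1 + n_j) ** alpha` are assumptions chosen for illustration, and the averaged variant mentioned in the abstract is not shown. Each incoming point is assigned to its nearest center, and that center then moves a small step along the unit vector toward the point, which is a stochastic gradient step on the $k$-medians loss (the expected distance to the nearest center).

```python
import math
import random


def sgd_k_medians(stream, init_centers, gamma=1.0, alpha=0.75):
    """One pass of a stochastic-gradient k-medians update over `stream`.

    gamma and alpha are hypothetical tuning parameters: the step size used
    for center j is gamma / (1 + n_j) ** alpha, where n_j counts the points
    assigned to center j so far (an assumed schedule, not taken from the paper).
    """
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)
    for x in stream:
        # Assign the new point to its nearest center (Euclidean distance).
        j = min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))
        d = math.dist(x, centers[j])
        counts[j] += 1
        if d == 0.0:
            continue  # the gradient is undefined when x coincides with the center
        step = gamma / (1 + counts[j]) ** alpha
        # Move the center by `step` along the unit vector pointing toward x.
        centers[j] = [c + step * (xi - c) / d for c, xi in zip(centers[j], x)]
    return centers


def demo():
    """Run the sketch on two well-separated synthetic Gaussian groups."""
    rng = random.Random(0)
    data = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(2000)]
    data += [(rng.gauss(10, 1), rng.gauss(10, 1)) for _ in range(2000)]
    rng.shuffle(data)
    return sgd_k_medians(data, init_centers=[(1.0, 1.0), (9.0, 9.0)])
```

Because each update touches a single center and needs only the current point, the data can be consumed one observation at a time, which matches the sequential setting the abstract describes. Unlike a sequential $k$-means update, the step length here does not grow with the distance to the point, so far-away observations pull a center no harder than nearby ones.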