Search results for "cluster analysis."
Showing 10 of 805 documents
On the complexity of the Saccharomyces bayanus taxon: Hybridization and potential hybrid speciation
2014
Although the genus Saccharomyces has been thoroughly studied, some species in the genus have not yet been accurately resolved; one example is S. bayanus, a taxon that includes genetically diverse lineages of pure and hybrid strains. This diversity makes the assignment and classification of strains belonging to this species unclear and controversial. Some authors have subdivided these strains into two varieties (bayanus and uvarum), which others have raised to species level. In this work, we evaluate the complexity of 46 different strains included in the S. bayanus taxon by PCR-RFLP analysis and by sequencing 34 gene regions and one mitochondrial gene. Using the sequenc…
Deep-Time Phylogenetic Clustering of Extinctions in an Evolutionarily Dynamic Clade (Early Jurassic Ammonites)
2012
7 pages; International audience; Conservation biologists and palaeontologists are increasingly investigating the phylogenetic distribution of extinctions and its evolutionary consequences. However, the dearth of palaeontological studies on this subject and the lack of methodological consensus hamper our understanding of this major evolutionary phenomenon. Here we address this issue by (i) reviewing the approaches used to quantify the phylogenetic selectivity of extinctions and extinction risks; (ii) investigating with a high-resolution dataset whether extinctions and survivals were phylogenetically clustered among early Pliensbachian (Early Jurassic) ammonites; (iii) exploring the phylogene…
Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals
2000
Studies of a few genes have identified three elements that constitute a yeast polyadenylation signal: the efficiency element (EE), the positioning element and the actual site for cleavage and polyadenylation. In this paper we analyze the oligonucleotide composition of the sequences located downstream of the stop codon of all yeast genes. Several oligonucleotide families are over-represented with high significance (referred to herein as "words"). The family with the highest over-representation includes the oligonucleotides shown experimentally to function as EEs. The word with the highest score is TATATA, followed, among others, by a series of singl…
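A minimal sketch of the kind of word-counting analysis described above, assuming a zero-order background model (nucleotides drawn independently with their observed frequencies) and a simple binomial z-score per k-letter word. The statistic actually used in the paper is not given in this excerpt, and the function name `word_overrepresentation` is hypothetical.

```python
import math
from collections import Counter


def word_overrepresentation(sequences, k):
    """Score each k-letter word by how far its observed count exceeds the
    count expected if nucleotides were independent (zero-order model).

    Returns a dict mapping word -> z-score (binomial approximation).
    """
    base_counts = Counter()
    word_counts = Counter()
    positions = 0  # total number of k-letter windows scanned
    for seq in sequences:
        seq = seq.upper()
        base_counts.update(seq)
        for i in range(len(seq) - k + 1):
            word_counts[seq[i:i + k]] += 1
        positions += max(len(seq) - k + 1, 0)
    total_bases = sum(base_counts.values())
    freq = {b: c / total_bases for b, c in base_counts.items()}

    scores = {}
    for word, observed in word_counts.items():
        p = math.prod(freq[b] for b in word)  # per-window probability under the model
        expected = positions * p
        sd = math.sqrt(positions * p * (1 - p))
        scores[word] = (observed - expected) / sd if sd > 0 else 0.0
    return scores
```

Sorting the returned dictionary by score (for example `sorted(scores, key=scores.get, reverse=True)`) gives a ranked word list. Note that a plain z-score can inflate very rare words, so real analyses usually add a minimum-count filter or a more careful background model.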
Distributed and proximity-constrained C-means for discrete coverage control
2018
In this paper we present a novel distributed coverage control framework for a network of mobile agents, in charge of covering a finite set of points of interest (PoI), such as people in danger, geographically dispersed equipment or environmental landmarks. The proposed algorithm is inspired by C-Means, an unsupervised learning algorithm originally proposed for non-exclusive clustering and for identification of cluster centroids from a set of observations. To cope with the agents' limited sensing range and avoid infeasible coverage solutions, traditional C-Means needs to be enhanced with proximity constraints, ensuring that each agent takes into account only neighboring PoIs. The proposed co…
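For reference, a minimal sketch of standard, centralized fuzzy C-means (fuzzifier `m`) on 2-D points, the algorithm the paper takes as its starting point. It deliberately omits what the paper adds, namely distribution across agents and proximity constraints limiting each agent to nearby PoIs; the function name `fuzzy_c_means` and the deterministic farthest-first initialization are illustrative choices, not the authors' method.

```python
import math


def fuzzy_c_means(points, c, m=2.0, iters=100):
    """Standard (centralized) fuzzy C-means.

    Returns (centroids, memberships), where memberships[i][j] is the degree
    to which point i belongs to cluster j (each row sums to 1).
    """
    # Farthest-first initialisation: start at points[0], then repeatedly add
    # the point farthest from all centres chosen so far.
    centroids = [points[0]]
    while len(centroids) < c:
        centroids.append(max(points, key=lambda p: min(math.dist(p, q) for q in centroids)))
    dim = len(points[0])
    exp = 2.0 / (m - 1.0)
    u = []
    for _ in range(iters):
        # Membership update: u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1))
        u = []
        for p in points:
            d = [math.dist(p, cj) for cj in centroids]
            if min(d) == 0.0:  # point coincides with a centroid: crisp membership
                row = [1.0 if dj == 0.0 else 0.0 for dj in d]
                s = sum(row)
                row = [r / s for r in row]
            else:
                row = [1.0 / sum((dj / dk) ** exp for dk in d) for dj in d]
            u.append(row)
        # Centroid update: mean of all points weighted by u_ij^m
        new = []
        for j in range(c):
            w = [u[i][j] ** m for i in range(len(points))]
            sw = sum(w)
            new.append(tuple(sum(wi * p[k] for wi, p in zip(w, points)) / sw
                             for k in range(dim)))
        centroids = new
    return centroids, u
```

Unlike k-means, every point keeps a graded membership in every cluster; this non-exclusive assignment is the property the abstract refers to.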
Multilingual Clustering of Streaming News
2018
Clustering news across languages enables efficient media monitoring by aggregating articles from multilingual sources into coherent stories. Doing so in an online setting allows scalable processing of massive news streams. To this end, we describe a novel method for clustering an incoming stream of multilingual documents into monolingual and crosslingual story clusters. Unlike typical clustering approaches that consider a small and known number of labels, we tackle the problem of discovering an ever-growing number of cluster labels in an online fashion, using real news datasets in multiple languages. Our method is simple to implement, computationally efficient and produces state-of-the-art …
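A minimal single-pass sketch of the general idea of letting the number of clusters grow online: each incoming document joins its most similar story cluster if the cosine similarity reaches a threshold, and otherwise opens a new cluster. The bag-of-words representation, the `threshold` value and the `StreamClusterer` class are assumptions for illustration, not the paper's model; crosslingual linking would additionally require multilingual document representations, which this sketch does not include.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(v * b.get(t, 0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class StreamClusterer:
    """Single-pass clustering in which the number of clusters grows on demand."""

    def __init__(self, threshold=0.3):
        self.threshold = threshold
        self.centroids = []  # one term-count Counter per story cluster

    def add(self, text):
        """Assign a document to a cluster and return the cluster index."""
        vec = Counter(text.lower().split())
        best, best_sim = None, 0.0
        for idx, centroid in enumerate(self.centroids):
            sim = cosine(vec, centroid)
            if sim > best_sim:
                best, best_sim = idx, sim
        if best is not None and best_sim >= self.threshold:
            self.centroids[best].update(vec)  # merge the document into that story
            return best
        self.centroids.append(vec)  # no close match: open a new story cluster
        return len(self.centroids) - 1
```

The threshold controls the trade-off: lowering it merges more documents into existing stories, while raising it creates new clusters more readily.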
Diffusion map for clustering fMRI spatial maps extracted by Independent Component Analysis
2013
Functional magnetic resonance imaging (fMRI) produces data about activity inside the brain, from which spatial maps can be extracted by independent component analysis (ICA). Each dataset contains n spatial maps of p voxels each, and the number of voxels is very high compared to the number of analyzed spatial maps. Clustering of the spatial maps is usually based on correlation matrices. This usually works well, although such a similarity matrix can inherently explain only part of the total variance in high-dimensional data where n is relatively small but p is large. In such a high-dimensional space, it is reasonable to perform dimensionality reduction before clustering.…
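A minimal sketch of a diffusion-map embedding, one way to perform the dimensionality reduction mentioned above, restricted to the first non-trivial coordinate so it can run with the standard library alone. It assumes a Gaussian kernel with bandwidth `eps` and plain power iteration; the function name `diffusion_coordinate`, the kernel choice and the toy 2-D points (standing in for flattened spatial maps) are illustrative assumptions, not the paper's setup.

```python
import math


def diffusion_coordinate(points, eps, t=1, iters=500):
    """First non-trivial diffusion-map coordinate of a small point set.

    points: list of equal-length tuples; eps: Gaussian kernel bandwidth;
    t: diffusion time. Returns (coordinates, eigenvalue).
    """
    n = len(points)
    # Gaussian affinities K_ij = exp(-||x_i - x_j||^2 / eps)
    K = [[math.exp(-math.dist(p, q) ** 2 / eps) for q in points] for p in points]
    d = [sum(row) for row in K]  # node degrees
    # S = D^-1/2 K D^-1/2 is symmetric and shares its eigenvalues with P = D^-1 K;
    # the Gaussian kernel is positive definite, so these eigenvalues are non-negative
    S = [[K[i][j] / math.sqrt(d[i] * d[j]) for j in range(n)] for i in range(n)]
    # The leading eigenvector of S (eigenvalue 1) is proportional to sqrt(d)
    top = [math.sqrt(di) for di in d]
    norm = math.sqrt(sum(x * x for x in top))
    top = [x / norm for x in top]
    # Power iteration for the second eigenvector, projecting out the leading one
    v = [i + 1.0 for i in range(n)]  # arbitrary start vector
    lam = 0.0
    for _ in range(iters):
        proj = sum(a * b for a, b in zip(v, top))
        v = [a - proj * b for a, b in zip(v, top)]
        w = [sum(S[i][j] * v[j] for j in range(n)) for i in range(n)]
        lam = math.sqrt(sum(x * x for x in w))
        v = [x / lam for x in w]
    # Right eigenvector of P is psi = D^-1/2 v; scale by lam^t for diffusion time t
    return [lam ** t * v[i] / math.sqrt(d[i]) for i in range(n)], lam
```

Points in the same well-connected group receive nearly identical coordinates, so a simple threshold or k-means on the diffusion coordinates can then separate the groups. A full implementation would keep several eigenvectors and use a proper eigensolver.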
Heretical Multiple Importance Sampling
2016
Multiple Importance Sampling (MIS) methods approximate moments of complicated distributions by drawing samples from a set of proposal distributions. Several ways to compute the importance weights assigned to each sample have recently been proposed, with the so-called deterministic mixture (DM) weights providing the best performance in terms of variance, at the expense of increased computational cost. A recent work has shown that it is possible to achieve a trade-off between variance reduction and computational effort by performing an a priori random clustering of the proposals (partial DM algorithm). In this paper, we propose a novel "heretical" MIS framework, where the clustering …
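To make the weighting schemes concrete, here is a minimal sketch that estimates the normalizing constant Z of an unnormalized 1-D target with Gaussian proposals. A sample drawn from proposal n is weighted by the target divided by the average density of the proposals in n's group: singleton groups give standard MIS weights, a single group containing every proposal gives full DM weights, and anything in between is partial DM. The function names, the Gaussian proposals and the fixed (rather than random) grouping are assumptions for illustration; the paper's "heretical" scheme is not reproduced here.

```python
import math
import random


def gauss_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))


def mis_estimate(target, proposals, samples_per_proposal, groups, seed=0):
    """Estimate Z = integral of target(x) dx by multiple importance sampling.

    proposals: list of (mu, sigma) Gaussians.
    groups: partition of proposal indices; each sample from proposal n gets
    weight target(x) / (mean of q_j(x) over the group that contains n).
    """
    rng = random.Random(seed)
    total = 0.0
    for group in groups:
        for n in group:
            mu, sigma = proposals[n]
            for _ in range(samples_per_proposal):
                x = rng.gauss(mu, sigma)
                # Only the proposals in this group are evaluated, which is
                # where partial DM saves cost compared with full DM
                mix = sum(gauss_pdf(x, *proposals[j]) for j in group) / len(group)
                total += target(x) / mix
    return total / (len(proposals) * samples_per_proposal)
```

All three groupings are unbiased; they differ in variance and in how many proposal densities must be evaluated per sample (one for standard MIS, all N for full DM).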
Mixture Hidden Markov Models for Sequence Data: The seqHMM Package in R
2019
Sequence analysis is increasingly used for the analysis of social sequences and other multivariate categorical time series data. However, large sequence data are often complex to describe, visualize, and compare, especially when there are multiple parallel sequences per subject. Hidden (latent) Markov models (HMMs) are able to detect underlying latent structures and they can be used in various longitudinal settings: to account for measurement error, to detect unobservable states, or to compress information across several types of observations. Extending to mixture hidden Markov models (MHMMs) allows clustering data into homogeneous subsets, with or without external covariate…
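A minimal Python sketch of the two computations at the core of clustering with a mixture HMM: the forward-algorithm likelihood of a sequence under each component HMM, and the posterior probability that the sequence belongs to each cluster. It assumes a single channel, fixed prior cluster weights (no covariates) and hand-specified rather than estimated parameters; the function names are hypothetical, and this is not the seqHMM package's R interface.

```python
import math


def hmm_loglik(obs, init, trans, emit):
    """log P(obs) under one HMM, via the scaled forward algorithm.

    init[s], trans[r][s] and emit[s][symbol] are probabilities; obs is a
    non-empty list of integer symbol indices.
    """
    n = len(init)
    alpha = [init[s] * emit[s][obs[0]] for s in range(n)]
    loglik = 0.0
    for step, symbol in enumerate(obs):
        if step > 0:
            # Propagate through the transition matrix, then emit the symbol
            alpha = [sum(alpha[r] * trans[r][s] for r in range(n)) * emit[s][symbol]
                     for s in range(n)]
        scale = sum(alpha)  # P(o_t | o_1..o_{t-1}); the product of scales is P(obs)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]
    return loglik


def cluster_posterior(obs, weights, models):
    """Posterior probability that obs was generated by each mixture component.

    weights: prior cluster probabilities; models: list of (init, trans, emit).
    """
    logs = [math.log(w) + hmm_loglik(obs, *m) for w, m in zip(weights, models)]
    top = max(logs)  # subtract the maximum before exponentiating, for stability
    unnorm = [math.exp(lg - top) for lg in logs]
    total = sum(unnorm)
    return [x / total for x in unnorm]
```

Assigning each subject to the cluster with the highest posterior yields a hard clustering; in a fitted MHMM these posteriors also drive the EM re-estimation of all parameters.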
A multi-scale area-interaction model for spatio-temporal point patterns
2018
Models for fitting spatio-temporal point processes should incorporate spatio-temporal inhomogeneity and allow for different types of interaction between points (clustering or regularity). This paper proposes an extension of the spatial multi-scale area-interaction model to a spatio-temporal framework. This model allows for interaction between points at different spatio-temporal scales and the inclusion of covariates. We fit the proposed model to varicella cases registered during 2013 in Valencia, Spain. The fitted model indicates small scale clustering and regularity for higher spatio-temporal scales.
Fast PET Scan Tumor Segmentation Using Superpixels, Principal Component Analysis and K-Means Clustering
2018
Positron emission tomography (PET) scan images are extensively used in radiotherapy planning, clinical diagnosis, and the assessment of tumor growth and treatment. All of these rely on the fidelity and speed of the detection and delineation algorithm. Despite intensive research, segmentation remains a challenging problem due to diverse image content, resolution, shape, and noise. This paper presents a fast positron emission tomography tumor segmentation method in which superpixels are first extracted from the input image. Principal component analysis is then applied to the superpixels and also to their average. The distance vector of each superpixel from the average is computed in principal components coordin…
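The final stage named in the title is K-means clustering. A minimal sketch of Lloyd's k-means follows, assuming each superpixel has already been reduced to a feature tuple (for example, its distance vector in principal-component coordinates); superpixel extraction and PCA are omitted, and the deterministic farthest-first initialization and the function name `kmeans` are illustrative choices rather than the paper's.

```python
import math


def kmeans(points, k, iters=100):
    """Lloyd's k-means on a list of equal-length tuples.

    Uses farthest-first initialisation so the result is deterministic.
    Returns (centroids, labels).
    """
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: each centroid moves to the mean of its points
        # (an empty cluster keeps its previous centroid)
        new = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new.append(centroids[j])
        if new == centroids:  # converged: assignments can no longer change
            break
        centroids = new
    return centroids, labels
```

How the resulting clusters are mapped to tumor and background regions is not stated in this excerpt of the abstract.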