Search results for "Medoid"
showing 8 items of 8 documents
Non-parametric approaches to the impact of Holstein heifer growth from birth to insemination on their dairy performance at lactation one
2012
Summary: Parametric approaches have been used widely to model animal growth and study the impact of growth profile on performance. Individual variation is often not considered in such approaches. However, non-parametric modelling allows this. Such an approach, based on spline functions, was used to study the importance of growth profiles from age 0 to 15 months (i.e. insemination) on milk yield and composition in primiparous cows. A dataset of 447 heifers was used for analysis of growth performance; 296 of them were also used to study impact on lactation. All of them originated from a French experimental herd and were born between 1986 and 2006. Clustering methods were also tested. Comparison…
Structural clustering of millions of molecular graphs
2014
We propose an algorithm for clustering very large molecular graph databases according to scaffolds (i.e., large structural overlaps) that are common between cluster members. Our approach first partitions the original dataset into several smaller datasets using a greedy clustering approach named APreClus based on dynamic seed clustering. APreClus is an online and instance incremental clustering algorithm delaying the final cluster assignment of an instance until one of the so-called pending clusters the instance belongs to has reached significant size and is converted to a fixed cluster. Once a cluster is fixed, APreClus recalculates the cluster centers, which are used as representatives for…
Incrementally Assessing Cluster Tendencies with a Maximum Variance Cluster Algorithm
2003
A straightforward and efficient way to discover clustering tendencies in data using a recently proposed Maximum Variance Clustering algorithm is proposed. The approach shares the benefits of the plain clustering algorithm with regard to other approaches for clustering. Experiments using both synthetic and real data have been performed in order to evaluate the differences between the proposed methodology and the plain use of the Maximum Variance algorithm. According to the results obtained, the proposal constitutes an efficient and accurate alternative.
Looking for representative fit models for apparel sizing
2014
This paper is concerned with the generation of optimal fit models for use in apparel design. Representative fit models or prototypes are important for defining a meaningful sizing system. However, there is no agreement among apparel manufacturers and each one has their own prototypes and size charts i.e. there is a lack of standard sizes in garments from different apparel manufacturers. We propose two algorithms based on a new hierarchical partitioning around medoids clustering method originally developed for gene expression data. We are concerned with a different application; therefore, the dissimilarity between the objects has to be different and must be designed to deal with anthropometr…
Apparel sizing using trimmed PAM and OWA operators
2012
This paper is concerned with apparel sizing system design. One of the most important issues in the apparel development process is to define a sizing system that provides a good fit to the majority of the population. A sizing system classifies a specific population into homogeneous subgroups based on some key body dimensions. Standard sizing systems range linearly from very small to very large. However, anthropometric measures do not grow linearly with size, so such systems cannot accommodate all body types. It is important to determine each class in the sizing system based on a real prototype that is as representative as possible of each class. In this paper we propose a methodology to develop an …
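The idea of a "real prototype that is as representative as possible" corresponds to a medoid: an actual observation whose total dissimilarity to the other members of its class is smallest. The following is only a minimal sketch under stated assumptions. It uses a simple alternating assign-and-update scheme on a precomputed dissimilarity matrix, not the swap phase of PAM and not the trimmed or hierarchical variants these papers propose. The function names `medoid_index` and `alternate_k_medoids` are hypothetical.

```python
import numpy as np

def medoid_index(D, members):
    """Return the member whose summed dissimilarity to the other members is smallest."""
    members = np.asarray(members)
    sub = D[np.ix_(members, members)]          # dissimilarities within the group
    return int(members[np.argmin(sub.sum(axis=1))])

def alternate_k_medoids(D, init_medoids, max_iter=100):
    """Alternate between assigning objects to the nearest medoid and re-picking medoids.

    D is an (n, n) dissimilarity matrix; init_medoids is a list of row indices.
    This is an illustrative heuristic, not the PAM swap procedure.
    """
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = np.argmin(D[:, medoids], axis=1)
        new_medoids = [
            medoid_index(D, np.flatnonzero(labels == k)) if np.any(labels == k) else medoids[k]
            for k in range(len(medoids))
        ]
        if new_medoids == medoids:             # no medoid changed: converged
            break
        medoids = new_medoids
    labels = np.argmin(D[:, medoids], axis=1)  # labels consistent with final medoids
    return medoids, labels

# Toy one-dimensional data standing in for a body measurement (made-up values).
x = np.array([0.0, 1.0, 2.0, 10.0, 11.0, 12.0])
D = np.abs(x[:, None] - x[None, :])
medoids, labels = alternate_k_medoids(D, init_medoids=[0, 1])
```

Because each medoid is a row index into the data, every class prototype is a real object from the sample rather than an averaged point, which is the property the abstract emphasises.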
A fast and recursive algorithm for clustering large datasets with k-medians
2012
Clustering large samples of high-dimensional data with fast algorithms is an important challenge in computational statistics. Borrowing ideas from MacQueen (1967), who introduced a sequential version of the $k$-means algorithm, a new class of recursive stochastic gradient algorithms designed for the $k$-medians loss criterion is proposed. By their recursive nature, these algorithms are very fast and are well adapted to deal with large samples of data that are allowed to arrive sequentially. It is proved that the stochastic gradient algorithm converges almost surely to the set of stationary points of the underlying loss criterion. Particular attention is paid to the averaged versions, which…
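A recursive stochastic gradient update for a $k$-medians-type loss can be sketched as follows. This is an illustration under assumptions, not the paper's exact algorithm: the loss is taken to be the Euclidean distance to the nearest center, the step size `step / count ** decay` and its default values are assumed, and the function name `online_k_medians` is hypothetical. Each arriving point moves only its nearest center, by a step along the unit vector pointing toward the point, and a running average of each center's iterates is kept as a simple stand-in for an averaged version.

```python
import numpy as np

def online_k_medians(stream, init_centers, step=1.0, decay=0.75):
    """One pass of a stochastic gradient scheme for a k-medians-type loss.

    stream: iterable of points, processed one at a time.
    init_centers: initial centers, shape (k, d).
    step, decay: assumed step-size schedule step / count ** decay.
    Returns the final centers and the running averages of their iterates.
    """
    centers = np.array(init_centers, dtype=float)
    averaged = centers.copy()
    counts = np.zeros(len(centers), dtype=int)
    for x in stream:
        x = np.asarray(x, dtype=float)
        dists = np.linalg.norm(centers - x, axis=1)
        j = int(np.argmin(dists))              # only the nearest center is updated
        counts[j] += 1
        if dists[j] > 0.0:                     # gradient is undefined at distance zero
            lr = step / counts[j] ** decay
            centers[j] += lr * (x - centers[j]) / dists[j]
        # running mean of the iterates of center j
        averaged[j] += (centers[j] - averaged[j]) / counts[j]
    return centers, averaged

# One center at 0 and a single point at 2: the center moves one unit toward it.
centers, averaged = online_k_medians([[2.0]], init_centers=[[0.0]])
```

Unlike the sequential $k$-means update, the move has a length that depends only on the step size and not on how far away the point is, which is what makes the update less sensitive to outlying observations.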
Online Customer Segmentation Using Clustering Methods (Tiešsaistes Klientu Segmentācija, Izmantojot Klasterizācijas Metodes)
2021
The aim of this bachelor's thesis is to study online customer segmentation so that it can support sound decisions about the efficient use of marketing and advertising resources. Two clustering methods were used: K-Medoids clustering and K-Prototypes clustering. The choice of methods is justified by the characteristics of the problem studied. The thesis describes the theoretical aspects of both methods and applies them in practice (using the R software) to solve a specific task. The results obtained were analysed and compared. The thesis also explains the importance of customer segmentation for a successful company and describes internet…
SparseHC: A Memory-efficient Online Hierarchical Clustering Algorithm
2014
Computing a hierarchical clustering of objects from a pairwise distance matrix is an important algorithmic kernel in computational science. Since the storage of this matrix requires quadratic space with respect to the number of objects, the design of memory-efficient approaches is of high importance to this research area. In this paper, we address this problem by presenting a memory-efficient online hierarchical clustering algorithm called SparseHC. SparseHC scans a sorted and possibly sparse distance matrix chunk-by-chunk. Meanwhile, a dendrogram is built by merging cluster pairs as and when the distance between them is determined to be the smallest among all remaining cluster pairs. The k…
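The idea of building a dendrogram from a sorted stream of pairwise distances can be illustrated for the single-linkage case only. The sketch below is not SparseHC itself: it assumes the pairs arrive in ascending order of distance, handles single linkage alone, and uses a union-find structure so that memory grows with the number of objects rather than with the number of pairs. The function name `single_linkage_from_sorted` is hypothetical.

```python
def single_linkage_from_sorted(n, sorted_pairs):
    """Build single-linkage merges from (i, j, distance) triples in ascending order.

    n: number of objects, labelled 0 .. n-1.
    sorted_pairs: iterable of (i, j, d) sorted by d; it may be sparse.
    Returns a list of (root_a, root_b, d) merges, at most n - 1 of them.
    """
    parent = list(range(n))

    def find(i):
        # Follow parent links to the cluster root, shortening the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    merges = []
    for i, j, d in sorted_pairs:
        ri, rj = find(i), find(j)
        if ri != rj:                 # the pair joins two different clusters
            parent[rj] = ri
            merges.append((ri, rj, d))
            if len(merges) == n - 1: # dendrogram complete, stop reading
                break
    return merges

# Four objects with a sparse, already sorted list of distances (made-up values).
pairs = [(0, 1, 1.0), (2, 3, 2.0), (1, 2, 5.0), (0, 3, 6.0)]
merges = single_linkage_from_sorted(4, pairs)
```

Because the input is sorted, the first pair seen that connects two different clusters is the smallest remaining inter-cluster distance under single linkage, so it can be merged as soon as it is read.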