Improvements and applications of the elements of prototype-based clustering

subject

random projection, parallel computing, knowledge discovery, clustering initialization, minimal learning machine, data mining, prototype-based clustering, machine learning, big data, parallel processing, cluster analysis, robust clustering, K-means

description

Clustering, or cluster analysis, is an essential part of data mining, machine learning, and pattern recognition. The most widely applied clustering methods are partitioning-based, or prototype-based, methods. These methods, such as K-means clustering, are usually easy to implement and scale well, and they have been used in many applications across various fields. On the other hand, prototype-based clustering methods are typically sensitive to initialization, and selecting the number of clusters for knowledge discovery purposes is not straightforward. In the era of big data, rapidly growing, high-velocity datasets, which can also be erroneous, outlier-intensive, and sparse, have steered research towards efficient prototype-based clustering methods for more challenging datasets. This article-based dissertation primarily focuses on making data processing with prototype-based clustering more scalable, efficient, and reliable. To achieve these goals, improvements and modifications to prototype-based clustering have been made in six included articles. In addition, one article covers an application of prototype-based clustering to supervised learning in regression problems. In general, the results advance the knowledge discovery process towards more reliable data processing and more scalable big data processing.
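The sensitivity to initialization mentioned in the description can be illustrated with a minimal sketch of plain Lloyd-style K-means. This is not one of the methods developed in the dissertation; the function names, the toy dataset, and both sets of initial prototypes are assumptions chosen only for illustration.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, prototypes, max_iter=100):
    """Plain Lloyd-style K-means started from the given initial prototypes.

    Returns the final prototypes and the sum of squared errors (SSE).
    """
    prototypes = [tuple(p) for p in prototypes]
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest prototype.
        clusters = [[] for _ in prototypes]
        for point in points:
            nearest = min(
                range(len(prototypes)),
                key=lambda k: squared_distance(point, prototypes[k]),
            )
            clusters[nearest].append(point)
        # Update step: move each prototype to the mean of its members.
        updated = []
        for members, old in zip(clusters, prototypes):
            if members:
                dim = len(old)
                updated.append(
                    tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
                )
            else:
                updated.append(old)  # keep an empty cluster's prototype in place
        if updated == prototypes:
            break  # converged: prototypes no longer move
        prototypes = updated
    sse = sum(min(squared_distance(p, c) for c in prototypes) for p in points)
    return prototypes, sse


# Hypothetical toy data: three well-separated groups of four points each.
points = [
    (0, 0), (0, 1), (1, 0), (1, 1),
    (10, 0), (10, 1), (11, 0), (11, 1),
    (20, 0), (20, 1), (21, 0), (21, 1),
]

# One initial prototype per group versus two prototypes in the same group.
good_prototypes, sse_good = kmeans(points, [(0, 0), (10, 0), (20, 0)])
bad_prototypes, sse_bad = kmeans(points, [(0, 0), (0, 1), (15, 0.5)])

print(sse_good, sse_bad)  # prints: 6.0 205.0
```

Both runs converge, but the second stops in a local optimum where two prototypes split one group and the third covers the remaining two, which is why the clustering error is so much larger.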

year

2018-01-01

http://urn.fi/URN:ISBN:978-951-39-7621-7