Author: Joonas Hämäläinen

0000000000314107

Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression

Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test the use of recently popularized distance-based methods, the minimal learning machine (MLM) and the extreme minimal learning machine (EMLM), in problem transformation. The main advantage of the full data variants of these methods is the absence of any meta-parameters. The experimental results for the MLM and EMLM show promising potential, emphasizing the utility of problem transformation, especially with the EMLM. Peer-reviewed.

research product

Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?

Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of an ordinary least-squares problem. Several different solvers can be applied to the resulting linear problem. In this paper, a thorough comparison of possible and recently proposed, especially randomized, algorithms is carried out for this problem with a representative set of regression datasets. In addition, we compare MLM with shallow and deep feedforward neural network models and study the effects of the number of observations and the number of features with a special dat…

research product

A method for structure prediction of metal-ligand interfaces of hybrid nanoparticles

Hybrid metal nanoparticles, consisting of a nano-crystalline metal core and a protecting shell of organic ligand molecules, have applications in diverse areas such as biolabeling, catalysis, nanomedicine, and solar energy. Despite a rapidly growing database of experimentally determined atom-precise nanoparticle structures and their properties, there has been no successful, systematic way to predict the atomistic structure of the metal-ligand interface. Here, we devise and validate a general method to predict the structure of the metal-ligand interface of ligand-stabilized gold and silver nanoparticles, based on information about local chemical environments of atoms in experimental data. In …

research product

Detecting Anomalies in Radiotherapy Dose Plans with Neural Networks (Sädehoidon annossuunnitelmien poikkeavuuksien havaitseminen neuroverkoilla)

In radiotherapy, an individual dose plan is prepared for each patient, and the treatment is delivered according to it. High accuracy is required of every factor that affects the dose. A new approach to the quality assurance of dose plans is to use methods based on data mining and machine learning. With these methods, a model can be built from dose plans previously delivered in treatment; the model can then be used to detect anomalies in new dose plans and thereby increase the safety of radiotherapy. The aim of the study was to build, using SOM and PNN neural networks, a model that can detect anomalies in dose plans. For the model, post-mastectomy …

research product

Monte Carlo Simulations of Au38(SCH3)24 Nanocluster Using Distance-Based Machine Learning Methods

We present an implementation of distance-based machine learning (ML) methods to create a realistic atomistic interaction potential to be used in Monte Carlo simulations of thermal dynamics of thiol...

research product

Feature Ranking of Large, Robust, and Weighted Clustering Result

A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…

research product
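For reference, the standard sample-level Kruskal-Wallis H statistic that the abstract above takes as its starting point can be written as follows. This is the textbook form without tie correction; the symbols g, n_i, N, and R_i are notation chosen here, and the population-level generalizations derived in the article are not reproduced.

```latex
H = \frac{12}{N(N+1)} \sum_{i=1}^{g} \frac{R_i^{2}}{n_i} - 3(N+1),
\qquad N = \sum_{i=1}^{g} n_i
```

Here g is the number of groups (clusters, in this setting), n_i is the size of group i, and R_i is the sum of the ranks that the observations of group i receive when all N values of one variable are ranked together. Under this reading, a variable with a larger H separates the groups more clearly, which is what makes the statistic usable for ranking.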

Newton Method for Minimal Learning Machine

Minimal Learning Machine (MLM) is a distance-based supervised machine learning method for classification and regression problems. Its main advantages are simple formulation and fast learning. Computing the MLM prediction in regression requires a solution to the optimization problem, which is determined by the input and output distance matrix mappings. In this paper, we propose to use the Newton method for solving this optimization problem in multi-output regression and compare the performance of this algorithm with the most popular Levenberg–Marquardt method. To our knowledge, MLM has not been previously studied in the context of multi-output regression in the literature. In additio…

research product
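The output estimation step mentioned in the abstract above is commonly written as a least-squares multilateration problem. The block below is a sketch under that assumption: the symbols (t_k for the K output reference points, \hat{\delta}_k for the predicted output distances, m for the iteration index) are notation introduced here, and the article's exact formulation may differ.

```latex
\begin{align}
J(\mathbf{y}) &= \sum_{k=1}^{K} \left( \|\mathbf{y} - \mathbf{t}_k\|^{2} - \hat{\delta}_k^{2} \right)^{2} \\
\nabla J(\mathbf{y}) &= 4 \sum_{k=1}^{K} \left( \|\mathbf{y} - \mathbf{t}_k\|^{2} - \hat{\delta}_k^{2} \right) (\mathbf{y} - \mathbf{t}_k) \\
\nabla^{2} J(\mathbf{y}) &= 4 \sum_{k=1}^{K} \left[ \left( \|\mathbf{y} - \mathbf{t}_k\|^{2} - \hat{\delta}_k^{2} \right) \mathbf{I}
  + 2\, (\mathbf{y} - \mathbf{t}_k)(\mathbf{y} - \mathbf{t}_k)^{\top} \right] \\
\mathbf{y}^{(m+1)} &= \mathbf{y}^{(m)} - \left[ \nabla^{2} J(\mathbf{y}^{(m)}) \right]^{-1} \nabla J(\mathbf{y}^{(m)})
\end{align}
```

The last line is the plain Newton update. The Hessian is a sum of K small matrices whose size equals the output dimension, so each iteration stays cheap when the number of targets is modest.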

Instance-Based Multi-Label Classification via Multi-Target Distance Regression

Interest in multi-target regression and multi-label classification techniques and their applications has been increasing lately. Here, we use the distance-based supervised method, minimal learning machine (MLM), as a base model for multi-label classification. We also propose and test a hybridization of unsupervised and supervised techniques, where prototype-based clustering is used to reduce both the training time and the overall model complexity. In computational experiments, competitive or improved quality of the obtained models compared to the state-of-the-art techniques was observed. Peer-reviewed.

research product

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…

research product

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generaliz…

research product
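The two MLM steps described in the abstract above (a linear mapping between input and output distance matrices, followed by recovering the output from the estimated distances) can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, and the output is recovered with a linearized multilateration solve, which is a simplification swapped in here for the nonlinear solvers normally used with the MLM.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between rows of A (n x d) and rows of B (k x d)."""
    diff = A[:, None, :] - B[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=2))

def mlm_fit(X, Y, ref_idx):
    """Distance regression: least-squares mapping B with Dx @ B ~= Dy."""
    R, T = X[ref_idx], Y[ref_idx]      # input / output reference points
    Dx = pairwise_dist(X, R)           # distances in the input space
    Dy = pairwise_dist(Y, T)           # distances in the output space
    B, *_ = np.linalg.lstsq(Dx, Dy, rcond=None)
    return R, T, B

def mlm_predict(X_new, R, T, B):
    """Estimate output distances, then locate each output point.

    Assumption: the multilateration problem is linearized by subtracting
    the equation of the first reference point from the others.
    """
    D_hat = pairwise_dist(X_new, R) @ B   # estimated distances to T
    A = -2.0 * (T[1:] - T[0])
    t_sq = (T ** 2).sum(axis=1)
    preds = []
    for d in D_hat:
        b = d[1:] ** 2 - d[0] ** 2 - t_sq[1:] + t_sq[0]
        y, *_ = np.linalg.lstsq(A, b, rcond=None)
        preds.append(y)
    return np.array(preds)
```

Passing every training index as `ref_idx` corresponds to using all points as reference points; passing a subset, for example indices chosen by a clustering step, gives a smaller model at the cost of a coarser distance mapping.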

Orientation Adaptive Minimal Learning Machine for Directions of Atomic Forces

Machine learning (ML) force fields are one of the most common applications of ML in nanoscience. However, these methods are commonly trained on potential energies of atomic systems, and force vectors are omitted. Here we present an ML framework that tackles the greatest difficulty in using forces in ML: accurate prediction of force direction. We use the idea of the Minimal Learning Machine to devise a method that can adapt to the orientation of an atomic environment to estimate the directions of force vectors. The method was tested with linear alkane molecules. Peer-reviewed.

research product

Improving Scalable K-Means++

Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach to the K-means‖ type of initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, comparison of the proposed methods to the K-means++ and K-means‖ methods is conducted using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation …

research product
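As context for the baseline named in the abstract above, the classical K-means++ seeding rule (each new center is sampled with probability proportional to the squared distance to the nearest center chosen so far) can be sketched as below. This shows only the sequential baseline, not the divide-and-conquer or random projection proposals of the article; the function name is hypothetical, and the sketch assumes the data contain at least k distinct points.

```python
import numpy as np

def kmeanspp_init(X, k, rng):
    """Pick k initial centers from the rows of X by D^2 sampling."""
    n = X.shape[0]
    # First center: chosen uniformly at random.
    centers = [X[rng.integers(n)]]
    # Squared distance from every point to its nearest chosen center.
    d2 = ((X - centers[0]) ** 2).sum(axis=1)
    for _ in range(1, k):
        # Sample the next center proportionally to d2 (assumes d2.sum() > 0).
        idx = rng.choice(n, p=d2 / d2.sum())
        centers.append(X[idx])
        d2 = np.minimum(d2, ((X - X[idx]) ** 2).sum(axis=1))
    return np.array(centers)
```

Each round needs a full pass over the data and depends on the previous round, which is the sequential bottleneck that parallel variants such as K-means‖ are designed to reduce.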

Scalable robust clustering method for large and sparse data

Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The obtained results allow us to conclude that the proposed approach is viable. Peer-reviewed.

research product

Feature selection for distance-based regression: An umbrella review and a one-shot wrapper

Feature selection (FS) may improve the performance, cost-efficiency, and understandability of supervised machine learning models. In this paper, FS for the recently introduced distance-based supervised machine learning model is considered for regression problems. The study is contextualized by first providing an umbrella review (review of reviews) of recent development in the research field. We then propose a saliency-based one-shot wrapper algorithm for FS, which is called MAS-FS. The algorithm is compared with a set of other popular FS algorithms, using a versatile set of simulated and benchmark datasets. Finally, experimental results underline the usefulness of FS for regression, confirm…

research product

Au38Q MBTR-K3

Purpose: The purpose of Au38Q MBTR-K3 is to test the scalability of a machine learning regression model when the number of observations and the number of features change. Background: The Au38Q MBTR-K3 was created from a trajectory file of the density functional theory simulation of the Au38Q hybrid nanoparticle performed by Juarez-Mosqueda et al. in their paper "Ab initio molecular dynamics studies of Au38(SR)24 isomers under heating", using the MBTR descriptor by Himanen et al. as presented in the paper "DScribe: Library of descriptors for machine learning in materials science". The MBTR was used with the default parameters for K=3 (angles between atoms) presented at the website of DScribe version…

research product
