Author: Joonas Hämäläinen

0000000001299397

showing 16 related works from this author

Problem Transformation Methods with Distance-Based Learning for Multi-Target Regression

2020

Multi-target regression is a special subset of supervised machine learning problems. Problem transformation methods are used in the field to improve the performance of basic methods. The purpose of this article is to test the use of recently popularized distance-based methods, the minimal learning machine (MLM) and the extreme minimal learning machine (EMLM), in problem transformation. The main advantage of the full data variants of these methods is the lack of any meta-parameter. The experimental results for the MLM and EMLM show promising potential, emphasizing the utility of the problem transformation especially with the EMLM. peerReviewed

koneoppiminen (machine learning)

researchProduct

Do Randomized Algorithms Improve the Efficiency of Minimal Learning Machine?

2020

Minimal Learning Machine (MLM) is a recently popularized supervised learning method, which is composed of distance-regression and multilateration steps. The computational complexity of MLM is dominated by the solution of an ordinary least-squares problem. Several different solvers can be applied to the resulting linear problem. In this paper, a thorough comparison of possible and recently proposed, especially randomized, algorithms is carried out for this problem with a representative set of regression datasets. In addition, we compare MLM with shallow and deep feedforward neural network models and study the effects of the number of observations and the number of features with a special dat…

0209 industrial biotechnology; random projection; lcsh:Computer engineering. Computer hardware; Computational complexity theory; Computer science; lcsh:TK7885-7895; 02 engineering and technology; Machine learning; supervised learning; approximate algorithms; Set (abstract data type); regressioanalyysi (regression analysis); 020901 industrial engineering & automation; distance-based regression; algoritmit (algorithms); 0202 electrical engineering electronic engineering information engineering; ordinary least-squares; singular value decomposition; minimal learning machine; Multilateration; projektio (projection); Randomized algorithm; koneoppiminen (machine learning); Scalability; Feedforward neural network; 020201 artificial intelligence & image processing; Artificial intelligence; approksimointi (approximation)

Machine Learning and Knowledge Extraction

researchProduct

A method for structure prediction of metal-ligand interfaces of hybrid nanoparticles

2019

Hybrid metal nanoparticles, consisting of a nano-crystalline metal core and a protecting shell of organic ligand molecules, have applications in diverse areas such as biolabeling, catalysis, nanomedicine, and solar energy. Despite a rapidly growing database of experimentally determined atom-precise nanoparticle structures and their properties, there has been no successful, systematic way to predict the atomistic structure of the metal-ligand interface. Here, we devise and validate a general method to predict the structure of the metal-ligand interface of ligand-stabilized gold and silver nanoparticles, based on information about local chemical environments of atoms in experimental data. In …

0301 basic medicine; Steric effects; Materials science; Interface (Java); Science; General Physics and Astronomy; Nanoparticle; Nanotechnology; 02 engineering and technology; Article; General Biochemistry Genetics and Molecular Biology; Silver nanoparticle; Nanomaterials; 03 medical and health sciences; Molecule; lcsh:Science; Multidisciplinary; Ligand; Q; ligandit (ligands); General Chemistry; laskennallinen kemia (computational chemistry); 021001 nanoscience & nanotechnology; 030104 developmental biology; Nanoparticles; Atomistic models; Nanomedicine; lcsh:Q; Materials chemistry; nanohiukkaset (nanoparticles); 0210 nano-technology

Nature Communications

researchProduct

Detecting anomalies in radiotherapy dose plans with neural networks (Sädehoidon annossuunnitelmien poikkeavuuksien havaitseminen neuroverkoilla)

2013

In radiotherapy, an individual dose plan is prepared for the patient, and the treatment is carried out according to it. High accuracy is required of every factor that affects the dose. A new approach to the quality assurance of dose plans is to use methods based on data mining and machine learning. With these methods, a model can be built from dose plans previously carried out in treatment; the model can then be used to detect anomalies in new dose plans and thereby increase the safety of radiotherapy. The aim of the study was to build, with SOM and PNN neural networks, a model that can detect anomalies in dose plans. For the model, post-mastectomy …

radiotherapy; dose planning; anomaly detection; neural networks

researchProduct

Monte Carlo Simulations of Au38(SCH3)24 Nanocluster Using Distance-Based Machine Learning Methods

2020

We present an implementation of distance-based machine learning (ML) methods to create a realistic atomistic interaction potential to be used in Monte Carlo simulations of thermal dynamics of thiol...

010304 chemical physics; Chemistry; Monte Carlo method; Thermal dynamics; 010402 general chemistry; Machine learning; 01 natural sciences; 0104 chemical sciences; Interaction potential; 0103 physical sciences; Cluster (physics); Artificial intelligence; Physical and Theoretical Chemistry; Distance based

The Journal of Physical Chemistry A

researchProduct

Feature Ranking of Large, Robust, and Weighted Clustering Result

2017

A clustering result needs to be interpreted and evaluated for knowledge discovery. When clustered data represents a sample from a population with known sample-to-population alignment weights, both the clustering and the evaluation techniques need to take this into account. The purpose of this article is to advance the automatic knowledge discovery from a robust clustering result on the population level. For this purpose, we derive a novel ranking method by generalizing the computation of the Kruskal-Wallis H test statistic from sample to population level with two different approaches. Application of these enlargements to both the input variables used in clustering and to metadata provides a…

Kruskal-Wallis test; Computer science; Correlation clustering; Population; 02 engineering and technology; Machine learning; 01 natural sciences; Ranking (information retrieval); 010104 statistics & probability; Knowledge extraction; CURE data clustering algorithm; population analysis; Ranking SVM; 0202 electrical engineering electronic engineering information engineering; Test statistic; 0101 mathematics; educational knowledge discovery; Cluster analysis; Ranking; 020201 artificial intelligence & image processing; Data mining; Artificial intelligence; robust clustering

researchProduct

Newton Method for Minimal Learning Machine

2021

Minimal Learning Machine (MLM) is a distance-based supervised machine learning method for classification and regression problems. Its main advantages are its simple formulation and fast learning. Computing the MLM prediction in regression requires solving an optimization problem, which is determined by the input and output distance matrix mappings. In this paper, we propose to use the Newton method for solving this optimization problem in multi-output regression and compare the performance of this algorithm with the most popular Levenberg–Marquardt method. To our knowledge, MLM has not been previously studied in the context of multi-output regression in the literature. In additio…
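
For orientation, the optimization problem mentioned above is often written as a sum of squared distance residuals. The formulation below is an assumption based on the commonly used MLM output-estimation objective, not something stated in this abstract: $\hat{\delta}_k$ denotes the predicted distance from the unknown output $y$ to the $k$-th output reference point $t_k$, and $K$ is the number of reference points.

```latex
\begin{aligned}
J(y) &= \sum_{k=1}^{K} \left( \lVert y - t_k \rVert^2 - \hat{\delta}_k^{\,2} \right)^2, \\
\nabla J(y) &= 4 \sum_{k=1}^{K} \left( \lVert y - t_k \rVert^2 - \hat{\delta}_k^{\,2} \right) (y - t_k), \\
\nabla^2 J(y) &= 4 \sum_{k=1}^{K} \left[ \left( \lVert y - t_k \rVert^2 - \hat{\delta}_k^{\,2} \right) I
  + 2\,(y - t_k)(y - t_k)^{\top} \right], \\
y^{(m+1)} &= y^{(m)} - \left[ \nabla^2 J\!\left(y^{(m)}\right) \right]^{-1} \nabla J\!\left(y^{(m)}\right).
\end{aligned}
```

The last line is the plain Newton update for this objective; whether the paper uses exactly this form, or adds safeguards such as a line search, cannot be read from the truncated abstract.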

Optimization problem; Speedup; Computer science; Initialization; Context (language use); Regression; Distance matrix; Local search (optimization); Artificial intelligence; Newton's method

researchProduct

Instance-Based Multi-Label Classification via Multi-Target Distance Regression

2021

Interest in multi-target regression and multi-label classification techniques and their applications has been increasing lately. Here, we use the distance-based supervised method, the minimal learning machine (MLM), as a base model for multi-label classification. We also propose and test a hybridization of unsupervised and supervised techniques, where prototype-based clustering is used to reduce both the training time and the overall model complexity. In computational experiments, competitive or improved quality of the obtained models compared to the state-of-the-art techniques was observed. peerReviewed

Multi-label classification; multi-target regression; Computer science; Pattern recognition; minimal learning machine; tekoäly (artificial intelligence); Regression; multi-label classification techniques; Multi target; ComputingMethodologies_PATTERNRECOGNITION; koneoppiminen (machine learning); Artificial intelligence

researchProduct

Comparison of Internal Clustering Validation Indices for Prototype-Based Clustering

2017

Clustering is an unsupervised machine learning and pattern recognition method. In general, in addition to revealing hidden groups of similar observations and clusters, their number needs to be determined. Internal clustering validation indices estimate this number without any external information. The purpose of this article is to evaluate, empirically, characteristics of a representative set of internal clustering validation indices with many datasets. The prototype-based clustering framework includes multiple, classical and robust, statistical estimates of cluster location so that the overall setting of the paper is novel. General observations on the quality of validation indices and on t…

Fuzzy clustering; lcsh:T55.4-60.8; Computer science; Single-linkage clustering; Correlation clustering; 02 engineering and technology; lcsh:QA75.5-76.95; Theoretical Computer Science; prototype-based clustering; CURE data clustering algorithm; 020204 information systems; Consensus clustering; algoritmit (algorithms); 0202 electrical engineering electronic engineering information engineering; lcsh:Industrial engineering. Management engineering; Cluster analysis; k-medians clustering; ta113; Numerical Analysis; Pattern recognition; Determining the number of clusters in a data set; Computational Mathematics; ComputingMethodologies_PATTERNRECOGNITION; Computational Theory and Mathematics; robust statistics; 020201 artificial intelligence & image processing; lcsh:Electronic computers. Computer science; Artificial intelligence; Data mining; tiedonlouhinta (data mining); clustering validation index

Algorithms

researchProduct

Minimal Learning Machine: Theoretical Results and Clustering-Based Reference Point Selection

2019

The Minimal Learning Machine (MLM) is a nonlinear supervised approach based on learning a linear mapping between distance matrices computed in the input and output data spaces, where distances are calculated using a subset of points called reference points. Its simple formulation has attracted several recent works on extensions and applications. In this paper, we aim to address some open questions related to the MLM. First, we detail theoretical aspects that assure the interpolation and universal approximation capabilities of the MLM, which were previously only empirically verified. Second, we identify the task of selecting reference points as having major importance for the MLM's generaliz…
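
The mapping described above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the full variant in which every training point is a reference point, scalar outputs, and a linearized multilateration step swapped in for the nonlinear least-squares solve usually used with the MLM. All function names (`euclid`, `solve`, `mlm_fit`, `mlm_predict`) are hypothetical.

```python
import math

def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, B):
    """Solve A X = B by Gauss-Jordan elimination with partial pivoting.
    Assumes A is square and nonsingular (distinct training points)."""
    n = len(A)
    M = [list(A[i]) + list(B[i]) for i in range(n)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for r in range(n):
            if r != c:
                f = M[r][c]
                M[r] = [vr - f * vc for vr, vc in zip(M[r], M[c])]
    return [row[n:] for row in M]

def mlm_fit(X, y):
    """Distance regression: find B such that Dx B = Dy, where Dx and Dy are
    the pairwise distance matrices of the inputs and the scalar outputs."""
    Dx = [[euclid(a, b) for b in X] for a in X]
    Dy = [[abs(a - b) for b in y] for a in y]
    return solve(Dx, Dy)

def mlm_predict(x, X, y, B):
    """Map input distances to estimated output distances, then recover the
    output by linearized multilateration (assumption: scalar outputs with
    at least two distinct values)."""
    n = len(X)
    dx = [euclid(x, r) for r in X]
    d = [sum(dx[i] * B[i][k] for i in range(n)) for k in range(n)]
    num = den = 0.0
    for k in range(1, n):
        # Subtracting (out - y[0])^2 = d[0]^2 from (out - y[k])^2 = d[k]^2
        # gives the linear equation a * out = b.
        a = -2.0 * (y[k] - y[0])
        b = d[k] ** 2 - d[0] ** 2 - y[k] ** 2 + y[0] ** 2
        num += a * b
        den += a * a
    return num / den

X = [(0.0,), (1.0,), (2.0,), (3.0,)]
y = [0.0, 2.0, 4.0, 6.0]
B = mlm_fit(X, y)
print(mlm_predict((1.5,), X, y, B))
```

With all training points used as reference points the distance matrix is square, so the least-squares fit reduces to a linear solve; with a subset of reference points an ordinary least-squares solver would be needed instead.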

FOS: Computer and information sciences; Computer Science - Machine Learning; Minimal Learning Machine; koneoppiminen (machine learning); Statistics - Machine Learning; universal approximation; Machine Learning (stat.ML); interpolointi (interpolation); approksimointi (approximation); reference point selection; clustering; Machine Learning (cs.LG)

researchProduct

Orientation Adaptive Minimal Learning Machine for Directions of Atomic Forces

2021

Machine learning (ML) force fields are one of the most common applications of ML in nanoscience. However, these methods are commonly trained on potential energies of atomic systems, and force vectors are omitted. Here we present an ML framework that tackles the greatest difficulty in using forces in ML: accurate prediction of force direction. We use the idea of the Minimal Learning Machine to devise a method that can adapt to the orientation of an atomic environment to estimate the directions of force vectors. The method was tested with linear alkane molecules. peerReviewed

atoms; Computer science; force directions; molekyylit (molecules); Orientation (graph theory); nanotieteet (nanosciences); atomit (atoms); machine learning; koneoppiminen (machine learning); Minimal learning machine; Computer vision; molecules; Artificial intelligence

researchProduct

Improving Scalable K-Means++

2021

Two new initialization methods for K-means clustering are proposed. Both proposals are based on applying a divide-and-conquer approach for the K-means‖ type of an initialization strategy. The second proposal also uses multiple lower-dimensional subspaces produced by the random projection method for the initialization. The proposed methods are scalable and can be run in parallel, which makes them suitable for initializing large-scale problems. In the experiments, comparison of the proposed methods to the K-means++ and K-means‖ methods is conducted using an extensive set of reference and synthetic large-scale datasets. Concerning the latter, a novel high-dimensional clustering data generation …
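
As a point of reference for the comparison above, standard K-means++ seeding (sampling each new center with probability proportional to the squared distance to the nearest chosen center) can be sketched as follows. This is the baseline only, not either of the proposed methods and not K-means‖; the function names are hypothetical, and the sketch assumes the data contain at least `k` distinct points.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeanspp_seed(points, k, rng=None):
    """Pick k initial centers by D^2 sampling (K-means++ seeding)."""
    rng = rng or random.Random()
    centers = [rng.choice(points)]
    # Squared distance from every point to its nearest chosen center.
    d2 = [sq_dist(p, centers[0]) for p in points]
    while len(centers) < k:
        # Points far from all current centers are more likely to be drawn;
        # already chosen points have weight zero.
        idx = rng.choices(range(len(points)), weights=d2, k=1)[0]
        centers.append(points[idx])
        d2 = [min(old, sq_dist(p, centers[-1])) for old, p in zip(d2, points)]
    return centers

points = [(0.0, 0.0), (0.0, 0.0), (10.0, 10.0), (10.0, 10.0)]
print(kmeanspp_seed(points, 2, random.Random(0)))
```

Each round needs a full pass over the data to update the distances, which is the sequential bottleneck that parallel variants of the initialization aim to reduce.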

random projection; lcsh:T55.4-60.8; K-means++; algoritmit (algorithms); clustering initialization; algoritmiikka (algorithmics); lcsh:Industrial engineering. Management engineering; klusterianalyysi (cluster analysis); lcsh:Electronic computers. Computer science; tiedonlouhinta (data mining); K-means‖; lcsh:QA75.5-76.95

researchProduct

Scalable robust clustering method for large and sparse data

2018

Datasets for unsupervised clustering can be large and sparse, with a significant portion of missing values. We present here a scalable version of a robust clustering method with the available data strategy. More precisely, a general algorithm is described, and the accuracy and scalability of a distributed implementation of the algorithm are tested. The obtained results allow us to conclude that the proposed approach is viable. peerReviewed

data; datasets; klusterianalyysi (cluster analysis); clustering

researchProduct

Feature selection for distance-based regression: An umbrella review and a one-shot wrapper

2023

Feature selection (FS) may improve the performance, cost-efficiency, and understandability of supervised machine learning models. In this paper, FS for the recently introduced distance-based supervised machine learning model is considered for regression problems. The study is contextualized by first providing an umbrella review (review of reviews) of recent development in the research field. We then propose a saliency-based one-shot wrapper algorithm for FS, which is called MAS-FS. The algorithm is compared with a set of other popular FS algorithms, using a versatile set of simulated and benchmark datasets. Finally, experimental results underline the usefulness of FS for regression, confirm…

EMLM; feature selection; koneoppiminen (machine learning); Artificial Intelligence; Cognitive Neuroscience; algoritmit (algorithms); parantaminen (paremmaksi muuttaminen; improvement); tekoäly (artificial intelligence); distance-based method; wrapper algorithm; feature saliency; Computer Science Applications

Neurocomputing

researchProduct

Au38Q MBTR-K3

2020

Purpose: The purpose of Au38Q MBTR-K3 is to test the scalability of a machine learning regression model when the number of observations and the number of features change. Background: The Au38Q MBTR-K3 was created from a trajectory file of the density functional theory simulation of the Au38Q hybrid nanoparticle performed by Juarez-Mosqueda et al. in their paper "Ab initio molecular dynamics studies of Au38(SR)24 isomers under heating", using the MBTR descriptor by Himanen et al. as presented in the paper "DScribe: Library of descriptors for machine learning in materials science". The MBTR was used with the default parameters for K=3 (angles between atoms) presented at the website of DScribe version…

Many Body Tensor Representation; MBTR; Hybrid nanoparticles; Regression

researchProduct
