Search results for "Machine learning"
Showing 10 of 1,464 documents
Microstructure–property relation and machine learning prediction of hole expansion capacity of high-strength steels
2021
The relationship between microstructure features and mechanical properties plays an important role in the design of materials and the improvement of their properties. Hole expansion capacity plays a fundamental role in defining the formability of metal sheets. Testing hole expansion capacity is experimentally complex, and many factors influence the resulting values. As a result, the relationship between microstructure features and hole expansion capacity, and the complexity of this relationship, are not yet fully understood. In the present study, an experimental dataset containing the phase constituents of 55 microstructures as well as corresponding properties, su…
A computer program suitable for analysis of choice of categories in biomedical data recognition problems.
1980
In problems of medical data recognition, the optimum recognition procedure is governed by the choice of categories, the selection of appropriate features, and the choice of a loss function. Under these circumstances it is often difficult to find a suitable classification scheme. The computer program described here serves for the design of the optimum recognition procedure. The Bayes rule is used as the decision rule. A criterion for comparing different choices of categories is given. The program can be run after the prior probabilities and the conditional densities have been estimated from a training set, and before the decision rule is tested with real data.
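As a rough illustration of a Bayes decision rule driven by priors, conditional densities, and a loss function, the following Python sketch picks the category with minimum conditional risk for a single feature value. It is not the 1980 program. The Gaussian densities, the two categories, the loss matrix, and all numbers are hypothetical assumptions chosen only for illustration.

```python
import math

def gaussian_pdf(x: float, mean: float, std: float) -> float:
    """Univariate normal density, used here as an ASSUMED class-conditional density."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def bayes_decision(x, priors, densities, loss):
    """Return (index of the minimum-risk decision, list of conditional risks).

    priors[j]    -- estimated prior probability P(c_j)
    densities[j] -- callable giving the estimated density p(x | c_j)
    loss[i][j]   -- loss of deciding category i when the true category is j
    """
    # Posterior P(c_j | x) is proportional to P(c_j) * p(x | c_j).
    joint = [p * f(x) for p, f in zip(priors, densities)]
    total = sum(joint)
    posteriors = [v / total for v in joint]
    # Conditional risk of deciding i: R(i | x) = sum_j loss[i][j] * P(c_j | x).
    risks = [sum(l_ij * p_j for l_ij, p_j in zip(row, posteriors)) for row in loss]
    return min(range(len(risks)), key=risks.__getitem__), risks

# Hypothetical two-category example: 0 = "healthy", 1 = "diseased".
priors = [0.9, 0.1]
densities = [lambda x: gaussian_pdf(x, 0.0, 1.0),
             lambda x: gaussian_pdf(x, 2.0, 1.0)]
# Assumed loss: missing a diseased case costs 10, a false alarm costs 1.
loss = [[0.0, 10.0],
        [1.0, 0.0]]

decision, risks = bayes_decision(1.0, priors, densities, loss)

# Same data with a symmetric 0-1 loss, for comparison.
decision_01, risks_01 = bayes_decision(1.0, priors, densities, [[0.0, 1.0], [1.0, 0.0]])
```

At x = 1.0 both assumed densities are equal, so the posteriors equal the priors (0.9 and 0.1). The asymmetric loss then favors deciding "diseased", with a risk of about 0.9 versus 1.0. The 0-1 loss picks "healthy". This shows that the loss function, not only the densities, shapes the optimum procedure.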
An Optimized Design of Choice Experiments: A New Approach for Studying Decision Behavior in Choice Task Experiments
2014
In this paper, we present a new approach to the optimal experimental design problem of generating diagnostic choice tasks, in which the respondent's decision strategy can be unambiguously deduced from the observed choice. In this new approach, we apply a genetic algorithm that creates a one-to-one correspondence between a set of predefined decision strategies and the alternatives of the choice task; it also manipulates the characteristics of the choice tasks. In addition, this new approach takes into account the measurement errors that can occur when the preferences of decision makers are measured. The proposed genetic algorithm is capable of generating diagnostic choice tasks eve…
Incremental linear model trees on massive datasets
2013
The existence of massive datasets raises the need for algorithms that make efficient use of resources such as memory and computation time. Besides well-known approaches such as sampling, online algorithms are increasingly recognized as good alternatives, as they often process datasets faster while using much less memory. The important class of algorithms that learn linear model trees online (hereafter incremental linear model trees, or ILMTs) offers interesting options for regression tasks in this sense. However, surprisingly little is known about their performance, as no large-scale evaluation on massive stationary datasets under equal conditions exists. Therefore, this paper shows their applica…
Stability-Based Model Selection for High Throughput Genomic Data: An Algorithmic Paradigm
2012
Clustering is one of the best-known activities in scientific investigation and an object of research in many disciplines, ranging from Statistics to Computer Science. In this beautiful area, one of the most difficult challenges is the model selection problem, i.e., the identification of the correct number of clusters in a dataset. In the last decade, a few novel techniques for model selection, representing a sharp departure from previous ones in statistics, have been proposed and have gained prominence for microarray data analysis. Among those, the stability-based methods are the most robust and best performing in terms of prediction, but the slowest in terms of time. Unfortunately…
One-Sided Prototype Selection on Class Imbalanced Dissimilarity Matrices
2012
In the dissimilarity representation paradigm, several prototype selection methods have been used to address the problem of selecting a small representation set for generating a low-dimensional dissimilarity space. These methods have also been used to reduce the size of the dissimilarity matrix. However, these approaches assume a relatively balanced class distribution. This assumption is grossly violated in many real-life problems, where the ratios of prior probabilities between classes are often extremely skewed. In this paper, we study the use of well-known prototype selection methods adapted to the case of learning from an imbalanced dissimilarity matrix. More specifically, we propose the…
On Duality in Learning and the Selection of Learning Teams
1996
Previous work in inductive inference dealt mostly with finding one or several machines (IIMs) that successfully learn collections of functions. Herein we start with a class of functions and consider the learner set of all IIMs that are successful at learning the given class. Applying this perspective to the case of team inference leads to the notion of diversification for a class of functions. This enables us to distinguish between several flavours of IIMs, all of which must be represented in a team learning the given class.
Variability of Classification Results in Data with High Dimensionality and Small Sample Size
2021
The study focuses on the analysis of biological data containing information on the number of genome sequences of intestinal microbiome bacteria before and after antibiotic use. The data have high dimensionality (bacterial taxa) and a small number of records, which is typical of bioinformatics data. Classification models induced on data sets like this are usually unstable, and their accuracy metrics have high variance. The aim of the study is to create a preprocessing workflow and a classification model that classify the microbiome into groups before and after antibiotic use as accurately as possible, while reducing the variability of the classifier's accuracy measures. To ev…
A local complexity based combination method for decision forests trained with high-dimensional data
2012
Accurate machine learning with high-dimensional data is hampered by phenomena known as the “curse” of dimensionality. One of the main strategies explored in the last decade to deal with this problem is the use of multi-classifier systems. Several such approaches are inspired by the Random Subspace Method for the construction of decision forests. Other studies rely on estimates of the individual classifiers' competence to enhance the combination in the multi-classifier and improve accuracy. We propose a competence estimate based on local complexity measurements, which is used to perform a weighted average combination of the decision forest. Experimental results show how thi…
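The abstract describes combining the trees of a decision forest by a weighted average, with weights taken from per-classifier competence estimates. The minimal Python sketch below shows only that combination step for one test sample. The competence values are supplied as inputs rather than computed, because the local complexity measure is not given here. The tree outputs, weights, and the function name `weighted_average_combination` are hypothetical.

```python
from typing import Sequence

def weighted_average_combination(
    member_probs: Sequence[Sequence[float]],
    competences: Sequence[float],
) -> list[float]:
    """Combine per-tree class-probability vectors by a competence-weighted average.

    member_probs[t][c] -- probability that tree t assigns to class c for one sample
    competences[t]     -- non-negative competence estimate for tree t on that sample
                          (how it is estimated is deliberately left open here)
    """
    total = sum(competences)
    if total <= 0:
        raise ValueError("at least one competence must be positive")
    n_classes = len(member_probs[0])
    combined = [0.0] * n_classes
    for probs, w in zip(member_probs, competences):
        # Normalize weights so the combined vector is still a probability distribution.
        for c in range(n_classes):
            combined[c] += (w / total) * probs[c]
    return combined

# Hypothetical outputs of three trees (e.g. each grown on a random feature subspace)
# for one sample in a two-class problem, with assumed competence weights.
member_probs = [[0.8, 0.2], [0.3, 0.7], [0.4, 0.6]]
competences = [0.6, 0.2, 0.2]

combined = weighted_average_combination(member_probs, competences)
predicted = max(range(len(combined)), key=combined.__getitem__)
```

Here the combined vector is about [0.62, 0.38], so class 0 is predicted. A plain unweighted average of the same three trees gives 0.5 for each class, so in this example the competence weights decide the outcome.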
Data Analysis and Bioinformatics
2007
Data analysis methods and techniques are revisited for the case of biological data sets. Particular emphasis is given to clustering and mining issues. Clustering is still a subject of active research in several fields, such as statistics, pattern recognition, and machine learning. Data mining adds to clustering the complications of very large data sets with many attributes of different types, a situation that is typical in biology. Some case studies are also described.