RESEARCH PRODUCT

An analysis of the bias of variation operators of estimation of distribution programming

Franz Rothlauf, Dirk Schweim

subject

education.field_of_study; Population; Sampling (statistics); 0102 computer and information sciences; 02 engineering and technology; Overfitting; Random walk; 01 natural sciences; Noise; Estimation of distribution algorithm; 010201 computation theory & mathematics; Statistics; 0202 electrical engineering, electronic engineering, information engineering; Bhattacharyya distance; 020201 artificial intelligence & image processing; education; Random variable; Mathematics

description

Estimation of distribution programming (EDP) replaces standard GP variation operators with sampling from a learned probability model. To ensure a minimum amount of variation in a population, EDP adds random noise to the probabilities of random variables. This paper studies the bias of EDP's variation operator by performing random walks. The results indicate that the complexity of the EDP model is high since the model is overfitting the parent solutions when no additional noise is being used. Adding only a low amount of noise leads to a strong bias towards small trees. The bias gets stronger with an increased amount of noise. Our findings do not support the hypothesis that sampling drift is the reason for the loss of diversity. Furthermore, we suggest using property vectors to study the bias of variation operators. Property vectors can represent the distribution of a population's relevant property, such as tree depth or tree size. The Bhattacharyya coefficient of two property vectors is a measure of the similarity of the two distributions of population properties. The results for EDP and standard GP illustrate that search bias can be assessed by representing distributions using property vectors and measuring their similarity using the Bhattacharyya coefficient.
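
The comparison of property vectors described above can be illustrated with a minimal sketch. It assumes a property vector is the relative frequency of each integer property value (here, tree depth) in a population, and it uses the standard definition of the Bhattacharyya coefficient for two discrete distributions p and q, BC(p, q) = sum over i of sqrt(p_i * q_i). The function names, the `max_value` parameter, and the sample depth lists are hypothetical and chosen only for illustration; they are not taken from the paper.

```python
from collections import Counter
from math import sqrt


def property_vector(values, max_value):
    """Relative frequency of each property value 0..max_value in a population.

    Assumption: the property (e.g. tree depth or tree size) is a
    non-negative integer no larger than max_value.
    """
    if not values:
        raise ValueError("population must not be empty")
    if any(v < 0 or v > max_value for v in values):
        raise ValueError("property value outside 0..max_value")
    counts = Counter(values)
    n = len(values)
    return [counts.get(v, 0) / n for v in range(max_value + 1)]


def bhattacharyya_coefficient(p, q):
    """Sum of sqrt(p_i * q_i); 1 for identical distributions, 0 for disjoint ones."""
    if len(p) != len(q):
        raise ValueError("property vectors must have the same length")
    return sum(sqrt(a * b) for a, b in zip(p, q))


# Hypothetical tree depths of a parent and an offspring population.
parent_depths = [2, 3, 3, 4, 4, 4, 5]
offspring_depths = [1, 1, 2, 2, 2, 3, 3]

p = property_vector(parent_depths, max_value=5)
q = property_vector(offspring_depths, max_value=5)
bc = bhattacharyya_coefficient(p, q)
print(f"Bhattacharyya coefficient: {bc:.4f}")
```

In a random-walk setting like the one the abstract describes, a coefficient that falls well below 1 from one generation to the next would suggest that the variation operator shifts the distribution of the chosen property, for example towards smaller trees.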

https://doi.org/10.1145/3205455.3205582