On the Influence of Grammars on Crossover in Grammatical Evolution
Standard grammatical evolution (GE) uses a one-point crossover (“ripple crossover”) that exchanges codons between two genotypes. The two resulting genotypes are then mapped to their respective phenotypes using a Backus-Naur form grammar. This article studies how different types of grammars affect the individuals resulting from a ripple crossover. We distinguish grammars by \(B_{avg}\), the expected number of non-terminals chosen when mapping a genotype codon to a phenotype. The grammars differ only in \(B_{avg}\) but can express the same phenotypes. We perform crossover operations on the genotypes and find that grammars with \(B_{avg} > 1\) lead to high numbers of either very sm…
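As an illustration of the mechanisms studied here, the following is a minimal Python sketch of standard GE decoding and one-point ("ripple") crossover on the genotypes. The toy grammar, codon range, and wrapping limit are illustrative assumptions, not the grammars examined in the article.

```python
import random

# Toy BNF-style grammar (hypothetical, for illustration only):
# every non-terminal maps to a list of alternative productions.
GRAMMAR = {
    "<expr>": [["<expr>", "+", "<expr>"], ["<var>"]],
    "<var>": [["x"], ["y"]],
}

def map_genotype(genotype, start="<expr>", max_wraps=2):
    """Standard GE mapping: consume codons left to right and pick the
    production via codon value modulo the number of alternatives."""
    symbols, phenotype, idx, wraps = [start], [], 0, 0
    while symbols:
        symbol = symbols.pop(0)
        if symbol in GRAMMAR:
            if idx >= len(genotype):        # ran out of codons: wrap around
                if wraps >= max_wraps:
                    return None             # invalid: mapping did not terminate
                idx, wraps = 0, wraps + 1
            alternatives = GRAMMAR[symbol]
            chosen = alternatives[genotype[idx] % len(alternatives)]
            idx += 1
            symbols = list(chosen) + symbols  # expand leftmost non-terminal
        else:
            phenotype.append(symbol)          # terminal symbol
    return " ".join(phenotype)

def ripple_crossover(parent1, parent2):
    """One-point crossover on the genotypes: exchanging the codon tails
    re-interprets every codon after the cut (the "ripple" effect)."""
    cut1 = random.randrange(1, len(parent1))
    cut2 = random.randrange(1, len(parent2))
    return parent1[:cut1] + parent2[cut2:], parent2[:cut2] + parent1[cut1:]

p1 = [random.randrange(256) for _ in range(10)]
p2 = [random.randrange(256) for _ in range(10)]
o1, o2 = ripple_crossover(p1, p2)
print(map_genotype(o1), "|", map_genotype(o2))
```

Because the exchanged tail codons are re-interpreted in the context of the new head, a single cut point can change the entire derivation that follows it.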
An analysis of the bias of variation operators of estimation of distribution programming
Estimation of distribution programming (EDP) replaces standard GP variation operators with sampling from a learned probability model. To ensure a minimum amount of variation in a population, EDP adds random noise to the probabilities of random variables. This paper studies the bias of EDP's variation operator by performing random walks. The results indicate that the complexity of the EDP model is high: the model overfits the parent solutions when no additional noise is used. Adding only a low amount of noise leads to a strong bias towards small trees, and the bias gets stronger as the amount of noise increases. Our findings do not support the hypothesis that sampling drift is …
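The following is a minimal sketch of the kind of noise step discussed here: perturbing a learned probability vector before sampling from it. The mixing scheme, symbol set, and values are illustrative assumptions, not EDP's exact formulation.

```python
import random

def add_noise(probs, noise):
    """Mix a learned probability vector with uniform noise and renormalize.
    noise = 0.0 reproduces the learned model exactly; larger values push
    the distribution towards uniform (illustrative scheme only)."""
    k = len(probs)
    noisy = [(1.0 - noise) * p + noise / k for p in probs]
    total = sum(noisy)
    return [p / total for p in noisy]

def sample(probs, symbols):
    """Draw one symbol from a categorical distribution."""
    r, acc = random.random(), 0.0
    for s, p in zip(symbols, probs):
        acc += p
        if r <= acc:
            return s
    return symbols[-1]

symbols = ["+", "x", "y"]
learned = [0.9, 0.05, 0.05]  # a model that has (over)fitted the parents
for noise in (0.0, 0.1, 0.5):
    probs = add_noise(learned, noise)
    print(noise, [sample(probs, symbols) for _ in range(10)])
```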
Using knowledge of human-generated code to bias the search in program synthesis with grammatical evolution
Recent studies show that program synthesis with GE produces code whose structure differs from human-generated code; for example, loops and conditions are hardly used. In this article, we extract knowledge from human-generated code to guide the evolutionary search. We use a large code corpus mined from the open software repository service GitHub and measure software metrics and properties describing the code base. We use this knowledge to guide the search through a new selection scheme that favors programs structurally similar to the programs in the GitHub code base. We find noticeable evidence that software metrics can help in guiding evoluti…
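A minimal sketch of what such a similarity-biased selection could look like; the metrics, reference values, and tie-breaking rule below are illustrative assumptions, not the exact scheme proposed in the article.

```python
import random

# Hypothetical reference metrics measured on a human-written code base
# (e.g., average number of loops and conditions per program); values are made up.
REFERENCE = {"loops": 1.2, "conditions": 2.5}

def metric_distance(program_metrics, reference=REFERENCE):
    """Structural distance of a candidate program to the reference code base."""
    return sum(abs(program_metrics[k] - reference[k]) for k in reference)

def similarity_biased_tournament(population, k=3):
    """Tournament selection that breaks ties on error with structural similarity.
    Each individual is a dict with 'error' and 'metrics'; lower is better for both."""
    contestants = random.sample(population, k)
    return min(contestants,
               key=lambda ind: (ind["error"], metric_distance(ind["metrics"])))

population = [
    {"error": 2.0, "metrics": {"loops": 0.0, "conditions": 0.0}},
    {"error": 2.0, "metrics": {"loops": 1.0, "conditions": 2.0}},
    {"error": 5.0, "metrics": {"loops": 1.2, "conditions": 2.5}},
]
print(similarity_biased_tournament(population, k=3))
```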
DAE-GP
Estimation of distribution genetic programming (EDA-GP) algorithms are metaheuristics in which sampling new solutions from a learned probabilistic model replaces the standard mutation and recombination operators of genetic programming (GP). This paper presents DAE-GP, a new EDA-GP that uses denoising autoencoder long short-term memory networks (DAE-LSTMs) as its probabilistic model. DAE-LSTMs are artificial neural networks that first learn the properties of a parent population by mapping promising candidate solutions to a latent space and reconstructing the candidate solutions from the latent space. The trained model is then used to sample new offspring solutions. We show on a generalization of t…
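The following is a minimal PyTorch sketch of a denoising sequence autoencoder built from LSTMs; the architecture, corruption scheme, and hyperparameters are illustrative assumptions, not the exact DAE-GP model.

```python
import torch
import torch.nn as nn

class DAELSTM(nn.Module):
    """Minimal denoising sequence autoencoder with LSTMs (illustrative sketch).
    A corrupted token sequence is encoded into a latent state; the decoder
    reconstructs the clean sequence from that latent state."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.decoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, corrupted, decoder_inputs):
        _, latent = self.encoder(self.embed(corrupted))                 # map to latent space
        dec_out, _ = self.decoder(self.embed(decoder_inputs), latent)   # reconstruct
        return self.out(dec_out)                                        # logits per position

model = DAELSTM(vocab_size=20)
targets = torch.randint(0, 20, (8, 12))                     # clean parent sequences
mask = torch.rand(targets.shape) < 0.2                      # corrupt ~20% of tokens
corrupted = torch.where(mask, torch.randint_like(targets, 0, 20), targets)
# Teacher-forced decoder inputs (for brevity, simply the target tokens, no shift/start token).
logits = model(corrupted, targets)
loss = nn.functional.cross_entropy(logits.reshape(-1, 20), targets.reshape(-1))
loss.backward()
```

After training, new offspring would be obtained by feeding parent sequences through the trained model and decoding the reconstructed sequences, in line with the sampling step described in the abstract.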
On sampling error in evolutionary algorithms
The initial population in evolutionary algorithms (EAs) should form a representative sample of all possible solutions (the search space). While large populations accurately approximate the distribution of possible solutions, small populations tend to suffer from sampling error. A low sampling error at initialization is necessary (but not sufficient) for a reliable search, since it reduces the overall random variation in the sample. For this reason, we have recently presented a model to determine a minimum initial population size such that the sampling error stays below a threshold at a given confidence level. Our model allows practitioners of, for example, genetic pro…
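As a hedged, back-of-the-envelope illustration (not the model presented in the paper): if a solution component appears with probability p in a random individual, the chance that it is entirely absent from an initial population of size n is (1 - p)^n, which can be solved for the smallest n that keeps this risk below a tolerance alpha.

```python
import math

def min_population_size(p, alpha):
    """Smallest n such that a component occurring with probability p in a random
    individual is missing from an n-sized sample with probability below alpha.
    Illustrative calculation only: (1 - p)^n < alpha  =>  n > ln(alpha) / ln(1 - p)."""
    return math.ceil(math.log(alpha) / math.log(1.0 - p))

# e.g. a terminal symbol used in 5% of random individuals, tolerated miss rate 1%
print(min_population_size(p=0.05, alpha=0.01))  # -> 90
```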
On the Non-uniform Redundancy of Representations for Grammatical Evolution: The Influence of Grammars
The representation used in grammatical evolution (GE) is non-uniformly redundant, as some phenotypes are represented by more genotypes than others. This article studies how the non-uniform redundancy of the GE representation depends on various types of grammars. When constructing the phenotype tree from a genotype, the grammar used determines \(B_{avg}\), the average branching factor. \(B_{avg}\) measures the expected number of non-terminals chosen when mapping one genotype codon to a phenotype tree node. First, the paper illustrates that the GE representation induces a bias towards small trees. This bias gets stronger with lower \(B_{avg}\). For example, when using a grammar with \(B_{avg} = 0.5\), 75% of all genotype…
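A small Monte-Carlo sketch of why low \(B_{avg}\) biases the mapping towards small trees: the expansion of non-terminals can be viewed as a branching process whose expected number of offspring per expansion is \(B_{avg}\). The two-way branching rule assumed below is illustrative, not one of the grammars from the article.

```python
import random

def simulate_tree_sizes(b_avg, trials=10_000, max_nodes=300):
    """View the GE mapping as a branching process: expanding one non-terminal
    yields either 0 or 2 new non-terminals, with probabilities chosen so that
    the expected number of new non-terminals equals b_avg."""
    sizes = []
    for _ in range(trials):
        open_nts, nodes = 1, 0
        while open_nts and nodes < max_nodes:   # cap keeps the simulation finite
            open_nts -= 1
            nodes += 1
            if random.random() < b_avg / 2.0:   # recursive production chosen
                open_nts += 2
        sizes.append(nodes)
    return sizes

for b_avg in (0.5, 1.0, 1.5):
    sizes = simulate_tree_sizes(b_avg)
    tiny = sum(s == 1 for s in sizes) / len(sizes)
    print(f"B_avg={b_avg}: mean nodes {sum(sizes)/len(sizes):.1f}, "
          f"share of single-node trees {tiny:.2f}")
```

For values of \(B_{avg}\) below 1 the process dies out quickly and most sampled trees stay very small, while values above 1 split the outcomes into trees that either terminate early or grow until they hit the cap.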