0000000000007573

AUTHOR

Dominik Sobania

0000-0001-8873-7143

showing 8 related works from this author

A generalizability measure for program synthesis with genetic programming

2021

The generalizability of programs synthesized by genetic programming (GP) to unseen test cases is one of the main challenges of GP-based program synthesis. Recent work showed that increasing the amount of training data improves the generalizability of the programs synthesized by GP. However, generating training data is usually an expensive task as the output value for every training case must be calculated manually by the user. Therefore, this work suggests an approximation of the expected generalization ability of solution candidates found by GP. To obtain candidate solutions that all solve the training cases, but are structurally different, a GP run is not stopped after the first solution …

business.industryGeneralizationComputer scienceValue (computer science)Genetic programmingMachine learningcomputer.software_genreTask (project management)Set (abstract data type)Test caseGeneralizability theoryArtificial intelligencebusinesscomputerProgram synthesisProceedings of the Genetic and Evolutionary Computation Conference
researchProduct

On the Generalizability of Programs Synthesized by Grammar-Guided Genetic Programming

2021

Grammar-guided Genetic Programming is a common approach for program synthesis where the user’s intent is given by a set of input/output examples. For use in real-world software development, the generated programs must work on previously unseen test cases too. Therefore, we study in this work the generalizability of programs synthesized by grammar-guided GP with lexicase selection. As benchmark, we analyze proportionate and tournament selection too. We find that especially for program synthesis problems with a low output cardinality (e.g., a Boolean output) lexicase selection overfits the training cases and does not generalize well to unseen test cases. An analysis using common software metr…

business.industryComputer scienceSoftware developmentGenetic programming02 engineering and technologyMachine learningcomputer.software_genreTournament selectionSoftware metricTest case020204 information systems0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingGeneralizability theoryArtificial intelligencebusinesscomputerSelection (genetic algorithm)Program synthesis
researchProduct

Improving estimation of distribution genetic programming with novelty initialization

2021

Estimation of distribution genetic programming (EDA-GP) replaces the standard variation operations of genetic programming (GP) by learning and sampling from a probabilistic model. Unfortunately, many EDA-GP approaches suffer from a rapidly decreasing population diversity which often leads to premature convergence. However, novelty search, an approach that searches for novel solutions to cover sparse areas of the search space, can be used for generating diverse initial populations. In this work, we propose novelty initialization and test this new method on a generalization of the royal tree problem and compare its performance to ramped half-and-half (RHH) using a recent EDA-GP approach. We f…

Computer sciencebusiness.industryGeneralizationNoveltyInitializationStatistical modelGenetic programmingVariation (game tree)Machine learningcomputer.software_genreTree (data structure)Artificial intelligencebusinesscomputerPremature convergenceProceedings of the Genetic and Evolutionary Computation Conference Companion
researchProduct

Using knowledge of human-generated code to bias the search in program synthesis with grammatical evolution

2021

Recent studies show that program synthesis with GE produces code that has different structure compared to human-generated code, e.g., loops and conditions are hardly used. In this article, we extract knowledge from human-generated code to guide evolutionary search. We use a large code-corpus that was mined from the open software repository service GitHub and measure software metrics and properties describing the code-base. We use this knowledge to guide the search by incorporating a new selection scheme. Our new selection scheme favors programs that are structurally similar to the programs in the GitHub code-base. We find noticeable evidence that software metrics can help in guiding evoluti…

Scheme (programming language)Structure (mathematical logic)Service (systems architecture)Information retrievalComputer scienceGrammatical evolutionCode (cryptography)Genetic programmingcomputerSoftware metricProgram synthesiscomputer.programming_languageProceedings of the Genetic and Evolutionary Computation Conference Companion
researchProduct

CovSel

2018

Ensemble methods combine the predictions of a set of models to reach a better prediction quality compared to a single model's prediction. The ensemble process consists of three steps: 1) the generation phase where the models are created, 2) the selection phase where a set of possible ensembles is composed and one is selected by a selection method, 3) the fusion phase where the individual models' predictions of the selected ensemble are combined to an ensemble's estimate. This paper proposes CovSel, a selection approach for regression problems that ranks ensembles based on the coverage of adequately estimated training points and selects the ensemble with the highest coverage to be used in th…

Computer scienceProcess (computing)Phase (waves)Genetic programming02 engineering and technology01 natural sciencesEnsemble learningSet (abstract data type)010104 statistics & probability0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingPoint (geometry)0101 mathematicsSymbolic regressionAlgorithmSelection (genetic algorithm)Proceedings of the Genetic and Evolutionary Computation Conference
researchProduct

Teaching GP to program like a human software developer

2019

Program synthesis is one of the relevant applications of GP with a strong impact on new fields such as genetic improvement. In order for synthesized code to be used in real-world software, the structure of the programs created by GP must be maintainable. We can teach GP how real-world software is built by learning the relevant properties of mined human-coded software - which can be easily accessed through repository hosting services such as GitHub. So combining program synthesis and repository mining is a logical step. In this paper, we analyze if GP can write programs with properties similar to code produced by human software developers. First, we compare the structure of functions generat…

Perplexitybusiness.industryProgramming languageComputer scienceInitializationGenetic programming0102 computer and information sciences02 engineering and technologycomputer.software_genre01 natural sciencesSoftware010201 computation theory & mathematicsGrammatical evolution0202 electrical engineering electronic engineering information engineeringCode (cryptography)020201 artificial intelligence & image processingLanguage modelbusinesscomputerProgram synthesisProceedings of the Genetic and Evolutionary Computation Conference
researchProduct

Challenges of Program Synthesis with Grammatical Evolution

2020

Program synthesis is an emerging research topic in the field of EC with the potential to improve real-world software development. Grammar-guided approaches like GE are suitable for program synthesis as they can express common programming languages with their required properties. This work uses common software metrics (lines of code, McCabe metric, size and depth of the abstract syntax tree) for an analysis of GE’s search behavior and the resulting problem structure. We find that GE is not able to solve program synthesis problems, where correct solutions have higher values of the McCabe metric (which means they require conditions or loops). Since small mutations of high-quality solutions str…

Theoretical computer scienceSource lines of codebusiness.industryComputer scienceSoftware developmentGenetic programming0102 computer and information sciences02 engineering and technology01 natural sciencesSoftware metric010201 computation theory & mathematicsGrammatical evolutionMetric (mathematics)0202 electrical engineering electronic engineering information engineering020201 artificial intelligence & image processingbusinessAbstract syntax treeProgram synthesis
researchProduct

Down-Sampled Epsilon-Lexicase Selection for Real-World Symbolic Regression Problems

2023

Epsilon-lexicase selection is a parent selection method in genetic programming that has been successfully applied to symbolic regression problems. Recently, the combination of random subsampling with lexicase selection significantly improved performance in other genetic programming domains such as program synthesis. However, the influence of subsampling on the solution quality of real-world symbolic regression problems has not yet been studied. In this paper, we propose down-sampled epsilon-lexicase selection which combines epsilon-lexicase selection with random subsampling to improve the performance in the domain of symbolic regression. Therefore, we compare down-sampled epsilon-lexicase w…

FOS: Computer and information sciencesComputer Science - Neural and Evolutionary ComputingNeural and Evolutionary Computing (cs.NE)
researchProduct