alogos._optimization.ea

Classes

EvolutionaryAlgorithm

Evolutionary algorithm for searching an optimal string in a grammar's language.


Detailed object descriptions

class alogos._optimization.ea.EvolutionaryAlgorithm(grammar, objective_function, objective, system='cfggpst', evaluator=None, **kwargs)[source]

Evolutionary algorithm for searching an optimal string in a grammar’s language.

The role of this class is explained by first introducing basic concepts from optimization theory and from the field of grammar-guided genetic programming, and then describing how this evolutionary algorithm is used.

  1. Concepts from optimization theory

    • Optimization: In general terms, optimization means to search for the best item out of many alternatives, or in other words, an optimal solution out of many candidate solutions.

    • Search space: The set of all candidate solutions is called the search space. It can have a finite or infinite number of elements and there can be mathematical relationships between them, which can be used to efficiently sample the space.

    • Search goal: Having a best or optimal item in the space requires that there is an order on the items. Such an order can be established in different ways, but the most common one is to define an objective function, which takes an item as input and returns a number as output. This number represents a quality score or fitness of the item and puts a weak order on all items, which means that some of them can be equally good and therefore multiple optimal items may exist that share the best quality score.

    • Search method: There are many optimization algorithms that work very efficiently on specific search spaces with certain mathematical structure. Methods that are more general and can act on a large variety of search spaces are sometimes called metaheuristics. They are usually stochastic optimization algorithms, which means they have a random element to them, which in turn implies that running them several times can produce different results, i.e. find candidate solutions of different quality. They don’t come with any guarantee to find a good or optimal solution, but often can successfully do so in hard optimization problems and are therefore frequently used in practice when the search space cannot be tackled with exact methods. An evolutionary algorithm is a prominent example of a population-based metaheuristic, which was originally inspired by the process of natural evolution. Genetic programming is one subdiscipline in the field of evolutionary computation that deals with optimization in spaces of programs. Grammar-guided genetic programming (G3P) is one flavor of it that allows the search space to be defined with a context-free grammar, a formalism very often used in computer science to define various kinds of formal languages, including modern programming languages like Python or Rust.

  2. Concepts from grammar-guided genetic programming

    • A grammar can be used to define a formal language.

    • A formal language is a finite or infinite set of strings.

    • A search algorithm can try to find the best item in a set according to a given search goal.

    • An objective function can define a search goal by assigning a score to each item in a set, so that the items with the minimum or maximum score are considered the best ones and need to be found.

This class provides a metaheuristic search method known as an evolutionary algorithm, which was tailored for use with different grammar-guided genetic programming systems such as Grammatical Evolution (GE) and Context-Free Grammar Genetic Programming (CFG-GP). It uses the following concepts, which are directly relevant for the usage of this class:

  • Individual: An individual represents a single candidate solution and comes with a genotype, phenotype and fitness. The genotype can be modified with variation operators to give rise to new, closely related genotypes and thereby move through the search space. It also allows a phenotype to be derived, which is an element of the search space and, in the case of grammar-guided genetic programming, simply a string of the grammar’s language. The fitness is a number that indicates the quality of a candidate solution and is calculated by a user-provided objective function.

  • Population: A population is a collection of individuals.

  • Generation: A generation is one of several consecutive populations used throughout a search with an evolutionary algorithm. The search begins with the creation of a random initial population and proceeds by generating one population after the other, each based on the previous one. The idea is that the average fitness of the individuals in the generations can be increased by using variation operators to generate new, closely related individuals together with selection operators that introduce a preference for individuals with higher fitness. This is done by

    1. Parent selection to choose individuals that are allowed to create offspring.

    2. Crossover to create new individuals by mixing the genotypes of parent individuals.

    3. Mutation to introduce slight random variations into the offspring individuals.

    4. Survivor selection to choose individuals that are allowed to continue into the next generation.

    All of these steps can be performed by different operators and this class provides several well-known ones to modify how the search is performed.
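
    To make these four steps concrete, the following minimal sketch shows one generational cycle in plain Python, independent of this library. All names and operator choices (tournament selection, one-point crossover, per-gene mutation on a list-of-int genotype) are illustrative stand-ins for the operators that this class actually provides.

    import random

    def one_generation(population, fitness, mutation_rate=0.1):
        # Illustrative sketch of a single generational cycle, not this library's internals.
        # Genotypes are assumed to be lists of int with length >= 2.

        # 1. Parent selection: tournament selection prefers fitter individuals.
        def select_parent(k=2):
            return max(random.sample(population, k), key=fitness)

        # 2. Crossover: mix two parent genotypes with one-point crossover.
        def crossover(a, b):
            point = random.randint(1, min(len(a), len(b)) - 1)
            return a[:point] + b[point:]

        # 3. Mutation: introduce slight random changes into an offspring genotype.
        def mutate(genotype):
            return [random.randint(0, 255) if random.random() < mutation_rate else gene
                    for gene in genotype]

        offspring = [mutate(crossover(select_parent(), select_parent()))
                     for _ in range(len(population))]

        # 4. Survivor selection: keep the fittest individuals from parents and offspring
        #    (assuming maximization of the fitness value).
        return sorted(population + offspring, key=fitness, reverse=True)[:len(population)]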

__init__(self, grammar, objective_function, objective, system='cfggpst', evaluator=None, **kwargs)[source]

Create and parameterize an evolutionary algorithm.

Parameters:
  • grammar (Grammar) – Context-free grammar that defines the search space. A grammar specifies a formal language, which is a finite or infinite set of strings. Each item of the search space is simply a string of the grammar’s language. The aim is to find an optimal string according to a user-provided objective.

  • objective_function (Callable) – A function that gets an item of the search space as input and needs to return a numerical value for it as output. In other words, it gets a phenotype and returns a fitness value for it that represents how good it is.

    Given that the search space is a grammar’s language, each candidate solution or phenotype is a string. Therefore a very simple objective function could just measure the length of each string and return that as fitness value:

    def obj_fun(string):
        fitness = len(string)
        return fitness
    

    More realistic objective functions will usually evaluate the string in some way and measure how well it performs on a task.
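
    As a slightly more realistic sketch, the following objective function assumes that the grammar generates arithmetic expressions in a variable x and scores each string by how closely it reproduces some target data. The target data, the use of Python's eval and the error measure are illustrative assumptions, not something prescribed by this class:

    # Hypothetical target data for the assumed expression grammar: y = x**2 + 1
    data = [(x, x**2 + 1.0) for x in range(-5, 6)]

    def obj_fun(string):
        error = 0.0
        for x, target in data:
            try:
                # Evaluate the candidate expression for the current value of x.
                value = eval(string, {"__builtins__": {}}, {"x": x})
            except Exception:
                return float("inf")  # invalid expressions get the worst possible score
            error += (value - target) ** 2
        return error  # sum of squared errors, to be used with objective="min"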

  • objective (str) – Goal of the optimization, which is to find a candidate solution that has either the minimum or maximum value assigned to it by the objective function.

    Possible values:

    • "min": Minimization, i.e. looking for a candidate solution with the smallest possible fitness value.

    • "max": Maximization, i.e. looking for a candidate solution with the highest possible fitness value.

  • system (str, optional) – Grammar-guided genetic programming system used by the evolutionary algorithm.

    Possible values:

    • "cfggp": CFG-GP, see cfggp.

    • "cfggpst": CFG-GP-ST, see cfggpst.

    • "dsge": DSGE, see dsge.

    • "ge": GE, see ge.

    • "pige": piGE, see pige.

    • "whge": WHGE, see whge.

    The system provides the following components to the search algorithm:

    • Genotype representation: This is a system-specific, indirect representation of a candidate solution, e.g. a list of int, a bitarray, or a tree.

    • Genotype-phenotype mapping: This is used to translate a genotype to a phenotype. Note that a genotype is system-specific, while a phenotype is a string of the grammar’s language.

    • Random variation operators: These are used to move through the search space by generating new genotypes that are close or related to the original ones. This is achieved by randomly introducing slight changes into a genotype (mutation) or recombining multiple genotypes into new ones (crossover).

  • evaluator (Callable, optional) – A function that gets a function together with a list of items as input and needs to return a list of new items as output. The task of this function is to evaluate the given function for each item in the list, and collect the return values in a new list. All function calls are assumed to be independent and may be computed serially, in parallel or in a distributed fashion.
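
    For illustration, here are two callables that match this contract: a trivial serial evaluator and a process-based parallel one. This is only a sketch; whether the parallel variant pays off depends on how expensive the objective function is, and it requires the function and items to be picklable.

    import multiprocessing

    def serial_evaluator(function, items):
        # Evaluate the function once per item and collect the results in order.
        return [function(item) for item in items]

    def parallel_evaluator(function, items):
        # Distribute the independent calls over a pool of worker processes.
        with multiprocessing.Pool() as pool:
            return pool.map(function, items)

    Either callable could then be passed via the evaluator argument when the evolutionary algorithm is created.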

  • **kwargs (dict, optional) – Further keyword arguments are forwarded either to the evolutionary algorithm or to the chosen system. This allows the search process to be controlled in great detail. Every parameter can also be changed later on via the attribute parameters of the generated instance. This can be done before starting a run with the run method, but also throughout a run when it is performed one step at a time by repeatedly calling the step method.

    For a description of available keyword arguments, see

    For example, when the chosen system is GE, the creation and parametrization of the evolutionary algorithm could look like this:

    import alogos as al
    
    grammar = al.Grammar("<S> ::= 1 | 22 | 333")
    
    def obj_fun(string):
        return len(string)
    
    ea = al.EvolutionaryAlgorithm(
        grammar, obj_fun, "max", system="ge",
        max_generations=10, max_wraps=2)
    ea.run()
    

    Note that in this example max_generations is a parameter of the evolutionary algorithm itself and therefore it can be used no matter which system is chosen. In contrast, max_wraps is a parameter specific to the system GE and would not be accepted if for example the system CFG-GP were chosen by passing system="cfggp" instead of system="ge".
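
    For comparison, the same search with CFG-GP simply omits the GE-specific parameter. The snippet below reuses grammar and obj_fun from the example above:

    # CFG-GP does not know max_wraps, so it is omitted here; passing it anyway
    # would raise a ParameterError that lists the parameters that are accepted.
    ea = al.EvolutionaryAlgorithm(
        grammar, obj_fun, "max", system="cfggp",
        max_generations=10)
    ea.run()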

Raises:

ParameterError – If a parameter is provided that is not known to the evolutionary algorithm or to the chosen system. A list of available parameters and their default values will be printed.

step(self)[source]

Perform a single step of the evolutionary search.

Taking a single step means that one generation is added, which can happen in one of two ways:

  • If the run has just started, the first generation is created by initializing a population.

  • If the run is already ongoing, a new generation is derived from the current generation by a process that involves 1) parent selection that prefers fitter individuals, 2) generating an offspring population by applying random variation operators (mutation, crossover) to the parents and 3) survivor selection that again prefers fitter individuals.

Two important notes about the behavior of this method:

  • In contrast to the run method, it does not check whether any stop criterion has been met! For example, when the number of generations reaches the value provided by the max_generations parameter, calling this method will still add another generation.

  • Since this method has to be called repeatedly to perform the individual steps of a run, it makes it possible to change the parameters of the evolutionary algorithm on the fly by modifying the parameters attribute between calls (see the sketch below). This means the search can be adjusted in much greater detail than with the run method. For example, the mutation and crossover rates can be gradually modified during a run to reduce the amount of variation when getting closer to the goal. It is also possible to change the population size or even the objective function. Fixed throughout a run, however, are the grammar (=search space) and the G3P system (=internal representation).

Returns:

best_individual (Individual of the chosen G3P system) – The individual with the best fitness found so far in the entire search. This means the individual may have been discovered not in the current step but in an earlier one.
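
The note above about changing parameters between calls can be illustrated with a small sketch. The parameter name population_size and the attribute-style access on parameters are assumptions for illustration, not names guaranteed by this class:

import alogos as al

grammar = al.Grammar("<S> ::= 1 | 22 | 333")

def obj_fun(string):
    return len(string)

ea = al.EvolutionaryAlgorithm(grammar, obj_fun, "max", max_generations=10)

# Perform half of the generations, adjust a parameter on the fly, then finish.
for _ in range(5):
    best = ea.step()
ea.parameters.population_size = 200  # assumed parameter name and access style
for _ in range(5):
    best = ea.step()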

run(self)[source]

Perform one search step after another until a stop criterion is met.

A user can set one or multiple stop criteria. The algorithm will halt as soon as one of them is fulfilled. Caution: If the criteria are set in such a way that they can never be met, e.g. because an unreachable fitness threshold was chosen, then the run will not halt until it is interrupted from the outside.

Note that after a search has halted, it is possible to change the stop criteria and call run again to continue the search. This can also be done multiple times.

Returns:

best_individual (Individual of the chosen G3P system) – The individual with the best fitness found so far in the entire search.

Raises:

ParameterError – If no stop criterion was set. This prevents starting a run that clearly would not halt.

Warns:

OptimizationWarning – If not a single step was taken because some stop criterion was already fulfilled by the current search state.
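
A small sketch of halting and resuming, again assuming attribute-style access on the parameters attribute:

import alogos as al

grammar = al.Grammar("<S> ::= 1 | 22 | 333")

def obj_fun(string):
    return len(string)

ea = al.EvolutionaryAlgorithm(grammar, obj_fun, "max", max_generations=10)
best = ea.run()                     # halts once 10 generations are reached

ea.parameters.max_generations = 20  # raise the stop criterion (access style assumed)
best = ea.run()                     # continues the same search until generation 20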

is_stop_criterion_met(self)[source]

Check if the current search state meets any stop criterion.

This method is used internally by the run method. It can also be useful when constructing a specialized search that repeatedly calls the step method, where it is the responsibility of the user to ensure a stop.

Returns:

stop (bool) – True if a stop criterion is met, False otherwise.
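
Given an instance ea created and parameterized as in the examples above, a specialized search loop of the kind mentioned here could look like this minimal sketch:

# Custom loop: the caller decides when to step and checks the stop criteria itself,
# which mirrors what the run method does internally.
while not ea.is_stop_criterion_met():
    best = ea.step()
    # Arbitrary per-generation logic can go here, e.g. logging progress or
    # adjusting parameters between calls.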

reset(self)[source]

Reset the current search state to enable a fresh start.

Not all aspects of the algorithm object are simply set back to their initial state:

  • Parameters are not reset to their default values. If any non-default parameters were passed during object-creation or if they were set to user-provided values later via the parameters attribute, then these values are kept and not overwritten by default values. The purpose of this behavior is to enable repeated runs that are directly comparable.

  • If a database was used, it is reset, but what that means depends on its location:

    • If the database location is in memory, the content is entirely lost during a reset.

    • If the database location is a file, it is renamed during a reset and a new file is created. This means the content of a run is not lost but relocated.

  • Cached values are fully deleted. This means previous phenotype-to-fitness evaluations will not be reused in a follow-up run.
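
Because user-set parameters survive a reset, repeated and directly comparable runs of an already created instance ea can be sketched like this; the attribute name fitness on the returned individual is assumed for illustration:

# Run the same parameterized algorithm several times, each time from a fresh state.
best_fitnesses = []
for _ in range(3):
    ea.reset()   # clears the search state but keeps the user-provided parameters
    best = ea.run()
    best_fitnesses.append(best.fitness)  # attribute name assumed for illustration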