alogos
¶
A library of grammatical optimization systems.
This package implements several methods from the field of Grammar-Guided Genetic Programming (G3P).
Subpackages¶
Submodules¶
Classes¶
Context-free grammar (CFG) for defining a formal language. |
|
Evolutionary algorithm for searching an optimal string in a grammar's language. |
Detailed object descriptions¶
- class alogos.Grammar(bnf_text=None, bnf_file=None, ebnf_text=None, ebnf_file=None, **kwargs)[source]¶
Context-free grammar (CFG) for defining a formal language.
The role of this class is explained by first introducing basic concepts from formal language theory and then describing the main tasks a grammar can be used for.
Technical terms from formal language theory
A grammar is a mathematical device to define a formal language.
A language is a finite or infinite set of strings.
A string is a finite sequence of symbols that all belong to a single alphabet.
An alphabet is a finite set of symbols.
In mathematical terms, a grammar is usually defined as a tuple
(N, T, P, S)
whereN
is a set of nonterminal symbols, also known as variables.T
is a set of terminal symbols, which has no overlap withN
.S
is the start symbol, which is a nontermminal symbol and therefore an element ofN
.P
is a set of production rules, also known as rewrite rules. Beginning from the start symbol, these rules can be applied until only terminal symbols are left, which then form a string of the grammar’s language. Different kinds of grammars can be distinguished by putting different restrictions on the form of these rules. This influences how expressive the grammars are and hence what kind of languages can be defined by them.
A context-free grammar (CFG) is a type of formal grammar that is frequently used in computer science and programming language design. The production rules need to fulfill following criteria:
The left-hand side of each rule is a single nonterminal symbol. This means a nonterminal symbol can be rewritten into the right-hand side of the rule, no matter which context the nonterminal is surrounded with.
The right-hand side is a sequence of symbols that can consist of a combination of terminal symbols, nonterminal symbols and the empty string
""
(often denoted by the greek letterɛ
or less oftenλ
).
Tasks a grammar can be used for
Define a grammar with a text in a suitable format such as Backus-Naur form (BNF) or Extended Backus-Naur form (EBNF). Both of these formats can be recognized by this class’s initialization method
__init__
.Visualize a grammar in form of a syntax diagram with the method
plot
.Recognize whether a given string belongs to the grammar’s language or not with the method
recognize_string
.Parse a given string to see how it can be derived from the start symbol by a specific sequence of rule applications with the method
parse_string
. The result is a parse tree or derivation tree, an instance of the classDerivationTree
. A derivation tree represents a single string by reading its leaf nodes from left to right, but many possible derivations that lead to it, since the sequence of rule applications is not fixed in the tree.Generate a random derivation, derivation tree or string with the methods
generate_derivation
,generate_derivation_tree
,generate_string
.Generate the grammar’s language or a finite subset of it with the method
generate_language
.Convert a grammar into a specific normal form or check if it is in one with the methods
to_cnf
,is_cnf
,to_gnf
,is_gnf
,to_bnf
,is_bnf
. Normal forms put further restrictions on the form of the production rules, but importantly, converting a grammar into it does not change its language.
Notes
The concepts and notation mentioned here are based on classical textbooks in formal language theory such as [1]. Similar discussions can be found on the web [2].
References
- __init__(self, bnf_text=None, bnf_file=None, ebnf_text=None, ebnf_file=None, **kwargs)[source]¶
Create a grammar from a string or file in BNF or EBNF notation.
Backus-Naur form (BNF) and Extended Backus-Naur form (EBNF) [3] are two well-known and often used formats for defining context-free grammars.
- Parameters:
bnf_text (
str
, optional) – String that contains a grammar in BNF.bnf_file (
str
, optional) – Filepath of a text file that contains a grammar in BNF.ebnf_text (
str
, optional) – String that contains a grammar in EBNF.ebnf_file (
str
, optional) – Filepath of a text file that contains a grammar in EBNF.**kwargs (
dict
, optional) – Further keyword arguments are forwarded to the function that reads and creates a grammar from text in BNF or EBNF notation. The available arguments and further details can be found in following methods:BNF:
from_bnf_text
,from_bnf_file
EBNF:
from_ebnf_text
,from_ebnf_file
- Raises:
TypeError – If an argument gets a value of unexpected type.
FileNotFoundError – If a filepath does not point to an existing file.
GrammarError – If a newly generated grammar does not pass all validity checks. For example, each nonterminal that appears on the right-hand side must also appear on the left-hand side.
- Warns:
GrammarWarning – If a newly generated grammar contains a production rule more than once.
References
Examples
Use text in Backus-Naur form (BNF) to define a grammar:
>>> import alogos as al >>> bnf_text = """ <S> ::= <Greeting> _ <Object> ! <Greeting> ::= Hello | Hi | Hola <Object> ::= World | Universe | Cosmos """ >>> grammar = al.Grammar(bnf_text=bnf_text) >>> grammar.generate_string() Hello_World!
Use text in or Extended Backus-Naur form (EBNF) to define a grammar:
>>> import alogos as al >>> ebnf_text = """ S = Greeting " " Object "!" Greeting = "Hello" | "Hi" | "Hola" Object = "World" | "Universe" | "Cosmos" """ >>> grammar = al.Grammar(ebnf_text=ebnf_text) >>> grammar.generate_string() Hola Universe!
Use text in an unusual form to define a grammar by passing custom symbol definitions as keyword arguments:
>>> import alogos as al >>> bnf_text = """ [English Sentence] = [Simple Sentence] [Simple Sentence] = [Declarative Sentence] [Declarative Sentence] = [subject] [predicate] [subject] = [simple subject] [simple subject] = [nominative personal pronoun] [nominative personal pronoun] = "I" | "you" | "he" | "she" | "it" | "we" | "they" [predicate] = [verb] [verb] = [linking verb] [linking verb] = "am" |"are" |"is" | "was"| "were" """ >>> grammar = al.Grammar( bnf_text=bnf_text, defining_symbol="=", start_nonterminal_symbol="[", end_nonterminal_symbol="]", start_terminal_symbol='"', end_terminal_symbol='"', ) >>> grammar.generate_string(separator=' ') I am
- copy(self)[source]¶
Create a deep copy of the grammar.
The new object is entirely independent of the original object. No parts, such as symbols or rules, are shared between the objects.
- from_bnf_text(self, bnf_text, defining_symbol='::=', rule_separator_symbol='|', start_nonterminal_symbol='<', end_nonterminal_symbol='>', start_terminal_symbol='', end_terminal_symbol='', start_terminal_symbol2='', end_terminal_symbol2='', verbose=False)[source]¶
Read a grammar specification in BNF notation from a string.
This method resets the grammar object and then uses the provided information.
- Parameters:
bnf_text (
str
) – String with grammar specification in BNF notation.- Other Parameters:
defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.verbose (
bool
, optional) – IfTrue
, messages will be printed during processing the input text that show which rules and symbols are found one after another. This can be useful to see what went wrong when the generated grammar does not look or behave as expected.
- from_bnf_file(self, filepath, defining_symbol='::=', rule_separator_symbol='|', start_nonterminal_symbol='<', end_nonterminal_symbol='>', start_terminal_symbol='', end_terminal_symbol='', start_terminal_symbol2='', end_terminal_symbol2='', verbose=False)[source]¶
Read a grammar specification in BNF notation from a file.
This method resets the grammar object and then uses the provided information.
- Parameters:
filepath (
str
) – Text file with grammar specification in BNF notation.- Other Parameters:
defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.verbose (
bool
, optional) – IfTrue
, messages will be printed during processing the input text that show which rules and symbols are found one after another. This can be useful to see what went wrong when the generated grammar does not look or behave as expected.
- from_ebnf_text(self, ebnf_text, defining_symbol='=', rule_separator_symbol='|', start_nonterminal_symbol='', end_nonterminal_symbol='', start_terminal_symbol='"', end_terminal_symbol='"', start_terminal_symbol2="'", end_terminal_symbol2="'", verbose=False)[source]¶
Read a grammar specification in EBNF notation from a string.
This method resets the grammar object and then uses the provided information.
- Parameters:
ebnf_text (
str
) – String with grammar specification in EBNF notation.- Other Parameters:
defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.verbose (
bool
, optional) – IfTrue
, messages will be printed during processing the input text that show which rules and symbols are found one after another. This can be useful to see what went wrong when the generated grammar does not look or behave as expected.
- from_ebnf_file(self, filepath, defining_symbol='=', rule_separator_symbol='|', start_nonterminal_symbol='', end_nonterminal_symbol='', start_terminal_symbol='"', end_terminal_symbol='"', start_terminal_symbol2="'", end_terminal_symbol2="'", verbose=False)[source]¶
Read a grammar specification in EBNF notation from a file.
This method resets the grammar object and then uses the provided information.
- Parameters:
filepath (
str
) – Text file with grammar specification in EBNF notation.- Other Parameters:
defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.verbose (
bool
, optional) – IfTrue
, messages will be printed during processing the input text that show which rules and symbols are found one after another. This can be useful to see what went wrong when the generated grammar does not look or behave as expected.
- to_bnf_text(self, rules_on_separate_lines=True, defining_symbol='::=', rule_separator_symbol='|', start_nonterminal_symbol='<', end_nonterminal_symbol='>', start_terminal_symbol='', end_terminal_symbol='', start_terminal_symbol2='', end_terminal_symbol2='')[source]¶
Write the grammar in BNF notation to a string.
- Parameters:
rule_on_separate_lines (
bool
, optional) – IfTrue
, each rule for a nonterminal is put onto a separate line. IfFalse
, all rules for a nonterminal are grouped onto one line.defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.
- to_bnf_file(self, filepath, rules_on_separate_lines=True, defining_symbol='::=', rule_separator_symbol='|', start_nonterminal_symbol='<', end_nonterminal_symbol='>', start_terminal_symbol='', end_terminal_symbol='', start_terminal_symbol2='', end_terminal_symbol2='')[source]¶
Write the grammar in BNF notation to a text file.
- Parameters:
filepath (
str
) – Filepath of the text file that shall be generated.- Other Parameters:
rule_on_separate_lines (
bool
, optional) – IfTrue
, each rule for a nonterminal is put onto a separate line. IfFalse
, all rules for a nonterminal are grouped onto one line.defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
) – Alternative symbol indicating the end of a terminal.
- to_ebnf_text(self, rules_on_separate_lines=True, defining_symbol='=', rule_separator_symbol='|', start_nonterminal_symbol='', end_nonterminal_symbol='', start_terminal_symbol='"', end_terminal_symbol='"', start_terminal_symbol2='"', end_terminal_symbol2='"')[source]¶
Write the grammar in EBNF notation to a string.
- Parameters:
rule_on_separate_lines (
bool
) – IfTrue
, each rule for a nonterminal is put onto a separate line. IfFalse
, all rules for a nonterminal are grouped onto one line.defining_symbol (
str
) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
) – Alternative symbol indicating the end of a terminal.
- to_ebnf_file(self, filepath, rules_on_separate_lines=True, defining_symbol='=', rule_separator_symbol='|', start_nonterminal_symbol='', end_nonterminal_symbol='', start_terminal_symbol='"', end_terminal_symbol='"', start_terminal_symbol2='"', end_terminal_symbol2='"')[source]¶
Write the grammar in EBNF notation to a text file.
- Parameters:
filepath (
str
) – Filepath of the text file that shall be generated.- Other Parameters:
rule_on_separate_lines (
bool
, optional) – IfTrue
, each rule for a nonterminal is put onto a separate line. IfFalse
, all rules for a nonterminal are grouped onto one line.defining_symbol (
str
, optional) – Symbol between left-hand side and right-hand side of a rule.rule_separator_symbol (
str
, optional) – Symbol between alternative productions of a rule.start_nonterminal_symbol (
str
, optional) – Symbol indicating the start of a nonterminal.end_nonterminal_symbol (
str
, optional) – Symbol indicating the end of a nonterminal.start_terminal_symbol (
str
, optional) – Symbol indicating the start of a terminal.end_terminal_symbol (
str
, optional) – Symbol indicating the end of a terminal.start_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the start of a terminal.end_terminal_symbol2 (
str
, optional) – Alternative symbol indicating the end of a terminal.
- generate_derivation_tree(self, method='weighted', *args, **kwargs)[source]¶
Generate a derivation tree with a chosen method.
One derivation tree represents one string but many derivations [4].
The leaf nodes of the tree read from left to right form a string of the language defined by the grammar.
A depth-first walk over the tree that always chooses the leftmost item gives the leftmost derivation, while one that always chooses the rightmost item gives the rightmost derivation. In most cases, there can be many more ways to expand one nonterminal after another, leading to a plethora of possible derivations, all represented by the same derivation tree.
- Parameters:
method (
str
, optional) – Name of the method that is used for creating a derivation tree.Possible values:
Deterministic methods that require the keyword argument
genotype
as input data:Probabilistic methods that generate a random tree and require no input data:
"uniform"
: Usesuniform
."weighted"
: Usesweighted
."ptc2"
: Usesptc2
."grow_one_branch_to_max_depth"
: Usesgrow_one_branch_to_max_depth
."grow_all_branches_within_max_depth"
: Usesgrow_all_branches_within_max_depth
"grow_all_branches_to_max_depth"
: Usesgrow_all_branches_to_max_depth
.
**kwargs (
dict
, optional) – Further keyword arguments are forwarded to the chosen method. This is especially relevant for controlling the behavior of the probabilistic methods.
- Returns:
derivation_tree (
DerivationTree
)- Raises:
MappingError – If the method fails to generate a derivation tree, e.g. because a limit like maximum number of expansions was reached before all leaves contained terminal symbols
References
- generate_derivation(self, method='weighted', *args, derivation_order='leftmost', newline=True, **kwargs)[source]¶
Generate a derivation with a chosen method.
- Parameters:
method (
str
, optional) – Name of the method that is used for creating a derivation.Possible values: See
generate_derivation_tree
.derivation_order (
str
, optional) – Order in which nonterminals are expanded during the step-by-step derivation.Possible values:
"leftmost"
: Expand the leftmost nonterminal in each step."rightmost"
: Expand the rightmost nonterminal in each step."random"
: Expand a random nonterminal in each step.
newline (
bool
, optional) – IfTrue
, the derivation steps are placed on separate lines by adding newline characters between them.**kwargs (
dict
, optional) – Further keyword arguments are forwarded to the chosen method.
- Returns:
derivation (
str
) – The derivation generated with the chosen method. It is a text that shows each step of the derivation as it is commonly represented in textbooks.
Notes
This method uses
generate_derivation_tree
to create a derivation tree and then reads the leftmost derivation contained in it withderivation
.
- generate_string(self, method='weighted', *args, separator='', **kwargs)[source]¶
Generate a string with a chosen method.
Each string [5] is part of the language L(G) defined by grammar G.
- Parameters:
method (
str
, optional) – Name of the method that is used for creating a string.Possible values: See
generate_derivation_tree
.separator (
str
, optional) – A short string that is inserted between each of the terminal symbols that make up the string of the grammar’s language.**kwargs (
dict
, optional) – Further keyword arguments are forwarded to the chosen method.
- Returns:
string (
str
) – The string generated with the chosen method. It is a text that consists only of terminal symbols and optionally a separator symbol between them.
Notes
This method uses
generate_derivation_tree
to create a derivation tree and then reads the string contained in it from the leaf nodes withstring
.References
- generate_language(self, max_steps=None, sort_order='discovered', verbose=None, return_details=None)[source]¶
Generate the formal language defined by the grammar.
This algorithm recursively constructs the formal language [6] defined by the grammar, i.e. the set of strings it can generate or recognize.
The algorithm can be stopped prematurely to get only a subset of the entire language. By default, only the language of the start symbol is returned, which is the language of the grammar, but it is also possible to return the languages of each other nonterminal symbol, which can be thought of as sublanguages. For example, many programming languages are defined by a CFG and have a nonterminal symbol that represents all valid integer literals in that language.
- Parameters:
max_steps (
int
, optional) – The maximum number of recursive steps during language generation. It can be used to stop the algorithm before it can construct all strings of the language. Instead a list of valid strings found so far will be returned, which is a subset of the entire language. This is necessary to get a result if the grammar defines a very large or infinite language.Note that each recursive step uses the strings known so far and inserts them into the right-hand sides of production rules to see if any new strings can be discovered. Therefore simpler strings are found before more complex ones that require more expansions. If the number of steps is too little to form a single string belonging to the language of the start symbol, the result will be an empty list.
sort_order (
str
, optional) – The language is returned as a list of strings, which can be sorted in different ways.Possible values:
"discovered"
: Strings are returned in the order they were discovered."lex"
: Lexicographic, i.e. the order used in lexicons, which means the alphabetic order extended to non-alphabet characters like numbers. Python’s built-insort()
function delivers it by default."shortlex"
: Strings are sorted primarily by their length. Those with the same length are further sorted in lexicographic order.
verbose (
bool
, optional) – IfTrue
, detailed messages are printed during language generation. IfFalse
, no output is generated.return_details (
bool
, optional) – IfTrue
, the return value is adict
with nonterminals as keys and their languages as values. The language of the start symbol is the language of the grammar, but each nonterminal has its own sub-language that can be of interest too.
- Returns:
language (
list
ofstr
, ordict
withlist
ofstr
as values) – The formal language L(G) defined by the grammar G. If the argumentreturn_details
is set toTrue
, the return value is adict
where each key is a nonterminal of the grammar and each value the language (set of strings) of the nonterminal.- Warns:
GrammarWarning – If no value is provided for the argument
max_steps
, internally an unrealistically large value of is assigned to it. In the unlikely case this is ever reached, a warning will be raised if the language generation did not generate all strings of the language.
References
- recognize_string(self, string, parser='earley')[source]¶
Test if a string belongs to the language of the grammar.
- Parameters:
string (
str
) – Candidate string which can be recognized to be a member of the grammar’s language.parser (
str
, optional) – Parsing algorithm used to analyze the string.Possible values: See
parse_string
- Returns:
recognized (
bool
) –True
if the string belongs to the grammar’s language,False
if it does not.
- parse_string(self, string, parser='earley', get_multiple_trees=False, max_num_trees=None)[source]¶
Try to parse a given string with a chosen parsing algorithm.
Parsing means the analysis of a string into its parts or constituents [7].
- Parameters:
string (
str
) – Candidate string which can only be parsed successfully if it is a member of the grammar’s language.parser (
str
, optional) – Parsing algorithm used to analyze the string. This package uses parsers from the package Lark [8].Possible values:
"earley"
: Can parse any context-free grammar.Performance: The algorithm has a time complexity of
O(n^3)
, but if the grammar is unambiguous it is reduced toO(n^2)
and for most LR grammars it isO(n)
."lalr"
: Can parse only a subset of context-free grammars, which have a form that allows very efficient parsing with a LALR(1) parser.Performance: The algorithm has a time complexity of
O(n)
.
get_multiple_trees (
bool
, optional) – IfTrue
, a list of parse trees, also known as parse forest, is returned instead of a single parse tree object.max_num_trees (
int
, optional) – An upper limit on how many parse trees will be returned at maximum.
- Returns:
parse_tree (
DerivationTree
, orlist
ofDerivationTree
) – If the argumentget_multiple_trees
is set toTrue
, a list of derivation trees is returned instead of a single derivation tree object. The list can contain one or more trees, dependening on how many ways there are to parse the given string. If a grammar is unambiguous, there is only one way. If it is ambuguous, there can be multiple ways to derive the same string in a leftmost derivation, which is captured by different derivation trees.- Raises:
ParserError – If the string does not belong to the language and therefore no parse tree exists for it.
Notes
If the grammar is ambiguous, there can be more than one way to parse a given string, which means that there are multiple parse trees for it. By default, only one of these trees is returned, but the argument
get_multiple_trees
allows to get all of them. Caution: This feature is currently only supported with Lark’s Earley parser.References
- plot(self)[source]¶
Generate a figure containing a syntax diagram of the grammar.
- Returns:
fig (
Figure
) – Figure object containing the plot, allowing to display or export it.
Notes
Syntax diagrams (a.k.a. railroad diagrams) [9] are especially useful for representing EBNF specifications of a grammar, because they capture nicely the extended notations that are introduced by EBNF, e.g. optional or repeated items.
This package supports reading a grammar specification from EBNF text. Internally, however, EBNF is automatically converted to a simpler form during the reading process, which is done by removing any occurrence of extended notation and expressing it with newly introduced symbols and rules instead. Only the final version of the grammar can be visualized, which is essentially BNF with new helper rules and nonterminals. Therefore the expressive power of syntax diagrams is unfortunately not fully used here.
There are many websites with explanations [10] or examples [11] of syntax diagrams. Likewise, there are many libraries [12] for generating syntax diagrams. This package uses the library Railroad-Diagram Generator [13].
References
- class alogos.EvolutionaryAlgorithm(grammar, objective_function, objective, system='cfggpst', evaluator=None, **kwargs)[source]¶
Evolutionary algorithm for searching an optimal string in a grammar’s language.
The role of this class is explained by first introducing basic concepts from optimization theory and the field of grammar-guided genetic programming. Then the usage of this evolutionary algorithm is explained.
Concepts from optimization theory
Optimization: In general terms, optimization means to search for the best item out of many alternatives, or in other words, an optimal solution out of many candidate solutions.
Search space: The set of all candidate solutions is called the search space. It can have a finite or infinite number of elements and there can be mathematical relationships between them, which can be used to efficiently sample the space.
Search goal: Having a best or optimal item in the space requires that there is an order on the items. Such an order can be established in different ways, but the most common one is to define an objective function, which takes an item as input and returns a number as output. This number represents a quality score or fitness of the item and puts a weak order on all items, which means that some of them can be equally good and therefore multiple optimal items may exist that share the best quality score.
Search method: There are many optimization algorithms that work very efficiently on specific search spaces with certain mathematical structure. Methods that are more general and can act on a large variety of search spaces are sometimes called metaheuristics. They are usually stochastic optimization algorithms, which means they have a random element to them and that implies that running them several times can produce different results, i.e. find candidate solutions of different quality. They don’t come with any guarantee to find a good or optimal solution, but often can successfully do so in hard optimization problems and are therefore often used in practice when the search space can not be tackled with exact methods. An evolutionary algorithm is a prominent example of a population-based metaheuristic, which was originally inspired by the process of natural evolution. Genetic programming is one subdiscipline in the field of evolutionary computation that deals with optimization in spaces of programs. Grammar-guided genetic programming (G3P) is one flavor of it that allows to define the search space with a context-free grammar, which is very often used in computer science to define various kinds of formal languages, including modern programming languages like Python or Rust.
Concepts from grammar-guided genetic programming
A grammar can be used to define a formal language.
A formal language is a finite or infinite set of string.
A search algorithm can try to find the best item in a set according to a given search goal.
An objective function can define a search goal by assigning a score to each item in a set, so that the item with minimum or maximum score are considered the best ones and need to be found.
This class provides a metaheuristic search method known as evolutionary algorithm, which was tailored for use with different grammar-guided genetic programming systems such as Grammatical Evolution (GE) and Context-Free Grammar Genetic Programming (CFG-GP). It uses following concepts that are directly relevant for the usage of this class:
Individual: An individual represents a single candidate solution and comes with a genotype, phenotype and fitness. The genotype can be modified with variation operators to give rise to new closely related genotypes and thereby move through the search space. It also allows to derive a phenotype, which is an element of the search space that in the case of grammar-guided genetic programming is simply a string of the grammar’s language. The fitness is a number that indicates the quality of a candidate solution and is calculated by a user-provided objective function.
Population: A population is a collection of individuals.
Generation: A generation is one of several consecutive populations used throughout a search with an evolutionary algorithm. The search begins with the creation of a random initial population and proceeds by generating one population after the other, each based on the previous one. The idea is that the average fitness of the individuals in the generations can be increased by using variation operators to generate new, closely related individuals together with selection operators that introduce a preference for individuals with higher fitness. This is done by
Parent selection to choose individuals that are allowed to create offspring.
Crossover to create new individuals by mixing the genotypes of parent individuals.
Mutation to introduce slight random variations into the offspring individuals.
Survivor selection to choose individuals that are allowed to continue into the next generation.
All of these steps can be performed by different operators and this class provides several well-known ones to modify how the search is performed.
- __init__(self, grammar, objective_function, objective, system='cfggpst', evaluator=None, **kwargs)[source]¶
Create and parameterize an evolutionary algorithm.
- Parameters:
grammar (
Grammar
) – Context-free grammar that defines the search space. A grammar specifies a formal language, which is a finite or infinite set of strings. Each item of the search space is simply a string of the grammar’s language. The aim is to find an optimal string according to a user-provided objective.objective_function (
Callable
) – A function that gets an item of the search space as input and needs to return a numerical value for it as output. In other words, it gets a phenotype and returns a fitness value for it that represents how good it is.Given that the search space is a grammar’s language, each candidate solution or phenotype is a string. Therefore a very simple objective function could just measure the length of each string and return that as fitness value:
def obj_fun(string): fitness = len(string) return fitness
More realistic objective functions will usually evaluate the string in some way and measure how well it performs on a task.
objective (
str
) – Goal of the optimization, which is to find a candidate solution that has either the minimum or maximum value assigned to it by the objective function.Possible values:
"min"
: Minimization, i.e. looking for a candidate solution with the smallest possible fitness value."max"
: Maximization, i.e. looking for a candidate solution with the highest possible fitness value.
system (
str
, optional) – Grammar-guided gentic programming system used by the evolutionary algorithm.Possible values:
"cfggp"
: CFG-GP, seecfggp
."cfggpst"
: CFG-GP-ST, seecfggpst
."dsge"
: DSGE, seedsge
."ge"
: GE, seege
."pige"
: piGE, seepige
."whge"
: WHGE, seewhge
.
The system provides following components to the search algorithm:
Genotype representation: This is a system-specific, indirect representation of a candidate solution, e.g. a list of int, a bitarray, or a tree.
Genotype-phenotype mapping: This is used to translate a genotype to a phenotype. Note that a genotype is system-specific, while a phenotype is a string of the grammar’s language.
Random variation operators: These are used to move through the search space by generating new genotypes that are close or related to the original ones. This is achieved by randomly introducing slight changes into a genotype (mutation) or recombining multiple genotypes into new ones (crossover).
evaluator (
Callable
, optional) – A function that gets a function together with a list of items as input and needs to return a list of new items as output. The task of this function is to evaluate the given function for each item in the list, and collect the return values in a new list. All function calls are assumed to be independent and may be computed serially, in parallel or in a distributed fashion.**kwargs (
dict
, optional) – Further keyword arguments are forwarded either to the evolutionary algorithm or to the chosensystem
. This allows to control the search process in great detail. Every parameter can also be changed later on via the attributeparameters
of the generated instance. This can be done before starting a run with therun
method, but also throughout a run when it is performed one step at a time by repeatedly calling thestep
method.For a description of available keyword arguments, see
Parameters for the evolutionary algorithm itself:
default_parameters
Parameters of the chosen G3P system:
CFG-GP:
default_parameters
CFG-GP-ST:
default_parameters
DSGE:
default_parameters
piGE:
default_parameters
WHGE:
default_parameters
For example, when the chosen system is GE, the creation and parametrization of the evolutionary algorithm could look like this:
import alogos as al grammar = al.Grammar("<S> ::= 1 | 22 | 333") def obj_fun(string): return len(string) ea = al.EvolutionaryAlgorithm( grammar, obj_fun, "max", system="ge", max_generations=10, max_wraps=2) ea.run()
Note that in this example
max_generations
is a parameter of the evolutionary algorithm itself and therefore it can be used no matter which system is chosen. In contrast,max_wraps
is a parameter specific to the system GE and would not be accepted if for example the system CFG-GP were chosen by passingsystem="cfggp"
instead ofsystem="ge"
.
- Raises:
ParameterError – If a parameter is provided that is not known to the evolutionary algorithm or to the chosen system. A list of available parameters and their default values will be printed.
- step(self)[source]¶
Perform a single step of the evolutionary search.
Taking a single step means that one generation is added, which can happen in one of two ways:
If the run has just started, the first generation is created by initializating a population.
If the run is already ongoing, a new generation is derived from the current generation by a process that involves 1) parent selection that prefers fitter individuals, 2) generating an offspring population by applying random variation operators (mutation, crossover) to the parents and 3) survivor selection that again prefers fitter individuals.
Two important notes about the behavior of this method:
It contrast to the
run
method, it does not check if any stop criterion has been met! For example, when the number of generations reaches the value provided by themax_generations
parameter, calling this method will regardless add another generation.Since this method has to be called repeatedly to perform the individual steps of a run, it enables to change the parameters of the evolutionary algorithm on the fly by modifying the
parameters
attribute between calls. This means the search can be adjusted in much greater detail than with therun
method. For example, the mutation and crossover rates can be gradually modified during a run to reduce the amount of variation when getting closer to the goal. It is also possible to change the population size or even the objective function. Fixed throughout a run, however, are the grammar (=search space) and the G3P system (=internal representation).
- Returns:
best_individual (
Individual
of the chosen G3P system) – The individual with the best fitness found so far in the entire search. This means the individual may have been discovered not in the current step but in an earlier one.
- run(self)[source]¶
Perform one search step after another until a stop criterion is met.
A user can set a single or multiple stop criteria. The algorithm will halt when one of them is fulfilled. Caution: If the criteria are set in such a way that they can never be realized, e.g. because an unreachable fitness threshold was chosen, then the run will not halt until it is interrupted from the outside.
Note that after a search has halted, it is possible to change the stop criteria and call
run
again to continue the search. This can also be done multiple times.- Returns:
best_individual (
Individual
of the chosen G3P system) – The individual with the best fitness found so far in the entire search.- Raises:
ParameterError – If no stop criterion was set. This prevents starting a run that clearly would not halt.
- Warns:
OptimizationWarning – If not a single step was taken, because some stop criterion was already fulfilled by the current search state.
- is_stop_criterion_met(self)[source]¶
Check if the current search state meets any stop criterion.
This method is used internally by the
run
method. It can also be useful when constructing a specialized search that repeatedly calls thestep
method, where it is the responsibility of the user to ensure a stop.- Returns:
stop (bool) –
True
if a stop criterion is met,False
otherwise.
- reset(self)[source]¶
Reset the current search state to enable a fresh start.
Not all aspects of the algorithm object are simply set back to their initial state:
Parameters are not reset to their default values. If any non-default parameters were passed during object-creation or if they were set to user-provided values later via the
parameters
attribute, then these values are kept and not overwritten by default values. The purpose of this behavior is to enable repeated runs that are directly comparable.If a database was used, it is reset, but what that means depends on its location:
If the database location is in memory, the content is entirely lost during a reset.
If the database location is a file, it is renamed during a reset and a new file is created. This means the content of a run is not lost but relocated.
Cached values are fully deleted. This means previous phenotype-to-fitness evaluations will not be reused in a follow-up run.