This notebook is an example of symbolic regression, i.e. the search for an algebraic expression that fits well to given data points. It evolves a Python expression in order to implement a function, which takes two values as input (x, y) and returns one value (z) as output. The goal is to minimize the deviation from given data points (x, y, z).
import time
import warnings
import alogos as al
import numpy as np
import unified_map as um
import unified_plotting as up
from numpy import sin, cos, exp, log, sqrt
warnings.filterwarnings('ignore') # ignore numerical warnings like division by zero
A function f
with two input variables x
and y
and one output variable z
is used to generate a list of data points (x, y, z)
. This is achieved by evaluating the function z = f(x, y)
on a list of (x, y)
pairs, which are chosen to lie on a regular grid.
As function was chosen the Pagie-1 polynomial, because it was suggested for use in benchmarking in the article Better GP benchmarks: community survey results and proposals.
def pagie1_polynomial(x, y):
return 1.0/(1.0+x**-4) + 1.0/(1.0+y**-4)
n = 30
a, b = -5, +5
xy_grid = np.array([(xi, yj) for xi in np.linspace(a, b, n) for yj in np.linspace(a, b, n)])
xa = np.array([el[0] for el in xy_grid])
ya = np.array([el[1] for el in xy_grid])
za = np.array([pagie1_polynomial(xi, yi) for xi, yi in xy_grid])
print('Generated {} data points in the form of (x, y, z) triplets.'.format(len(za)))
up.plotly.scatter_3d(xa, ya, za, marker_color='black', marker_size=3)
This grammar defines the search space: an Atomese program that creates an atom that represents a function in form of a LambdaLink and that can be evaluated with a given (x, y) input to give a z output.
ebnf_text_1 = """
EXPR = "(" EXPR OP EXPR ")" | PREOP "(" EXPR ")" | VAR | CONST
OP = "+" | "-" | "*" | "/"
PREOP = "sin" | "cos" | "exp" | "log" | "sqrt"
VAR = "x" | "y"
CONST = DIGIT "." DIGIT
DIGIT = "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
"""
ebnf_text_2 = """
EXPR = "(" EXPR OP EXPR ")" | VAR | CONST
OP = "+" | "-" | "*" | "/"
VAR = "x" | "y"
CONST = "1.0"
"""
grammar = al.Grammar(ebnf_text=ebnf_text_2)
The objective function gets a candidate solution (=a string of the grammar's language) and returns a fitness value for it. This is done by 1) evaluating the string as Python expression within a function definition and then 2) evaluating that function for each (x, y) pair in the given grid and compare the resulting z values with the target z values: the smaller the deviation, the better is the candidate.
def string_to_function(string):
def func(x, y):
return eval(string)
return func
def objective_function(string):
if len(string) > 500:
return float('inf')
func = string_to_function(string)
zf = func(xa, ya) # problem: sqrt(-1) leads to complex number and potentially a negative fitness
sse = np.sum((zf - za)**2)
return sse
Check if grammar and objective function work as intended.
random_string = grammar.generate_string()
print(random_string)
objective_function(random_string)
report_time = True
display_results = False
num_gen = 300
num_ind = 600
ea = al.EvolutionaryAlgorithm(
grammar, objective_function, 'min',
max_generations=num_gen,
max_or_min_fitness=0.001,
max_runtime_in_seconds=300,
population_size=num_ind, offspring_size=num_ind,
evaluator=um.univariate.parallel.futures,
verbose=True,
max_nodes=500,
)
The search is performed one generation after another and some intermediate results are reported to see how the solutions improve gradually.
best_ind = ea.run()
Show the phenotype of the best individual found so far. If more computing time is provided, increasingly better solutions may be discovered.
string = best_ind.phenotype
func = string_to_function(string)
print(string)
zf = func(xa, ya)
up.plotly.scatter_3d(xa, ya, za, marker_color='black', marker_size=3) + \
up.plotly.surface(xa, ya, zf, show_colormap=False)