This notebook is an example of symbolic regression, i.e. the search for an algebraic expression that fits given data points well. It evolves an Atomese program that implements a function in the form of a LambdaLink, which takes two values (x, y) as input and returns one value (z) as output. The goal is to minimize the deviation from the given data points (x, y, z).
References: the Wikipedia article on symbolic regression, and the OpenCog Wiki, in particular the entry on cog-execute!, whose counterpart in the Python bindings is called execute_atom.
import time
import warnings
import alogos as al
import mevis as mv
import numpy as np
import unified_map as um
import unified_plotting as up
from opencog.bindlink import execute_atom
from opencog.type_constructors import *
from opencog.utilities import set_default_atomspace
A function f with two input variables x and y and one output variable z is used to generate a list of data points (x, y, z). This is achieved by evaluating the function z = f(x, y) on a list of (x, y) pairs, which are chosen to lie on a regular grid.
The Pagie-1 polynomial was chosen as the target function, because it was suggested for benchmarking purposes in the article Better GP benchmarks: community survey results and proposals.
def pagie1_polynomial(x, y):
return 1.0/(1.0+x**-4) + 1.0/(1.0+y**-4)
n = 24  # number of grid points per dimension
a, b = -5, +5  # grid limits
xy_grid = np.array([(xi, yj) for xi in np.linspace(a, b, n) for yj in np.linspace(a, b, n)])
xa = np.array([el[0] for el in xy_grid])  # x values
ya = np.array([el[1] for el in xy_grid])  # y values
za = np.array([pagie1_polynomial(xi, yi) for xi, yi in xy_grid])  # target z values
print('Generated {} data points in the form of (x, y, z) triplets.'.format(len(za)))
up.plotly.scatter_3d(xa, ya, za, marker_color='black', marker_size=3)
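A small sanity check, not part of the original notebook: the Pagie-1 polynomial contains x**-4 and y**-4 terms, which are undefined at 0, but with n = 24 points on [-5, 5] the grid does not contain 0.
assert 0.0 not in xa and 0.0 not in ya  # x**-4 and y**-4 stay well-defined on this grid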
This grammar defines the search space: Atomese programs that create an atom representing a function in the form of a LambdaLink, which can be evaluated with a given (x, y) input to produce a z output.
ebnf_text = """
ATOMESE_PROGRAM = L0 NL L1 NL L2 NL L3
L0 = "func_atom = LambdaLink("
L1 = " VariableList(VariableNode('$x'), VariableNode('$y')),"
L2 = " " EXPR
L3 = ")"
NL = "\n"
EXPR = OP "(" EXPR ", " EXPR ")" | VAR | CONST
OP = "PlusLink" | "MinusLink" | "TimesLink" | "DivideLink"
VAR = "VariableNode('$x')" | "VariableNode('$y')"
CONST = "NumberNode('1.0')"
"""
grammar = al.Grammar(ebnf_text=ebnf_text)
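For illustration, here is one member of the grammar's language, written out by hand rather than generated (indentation may differ from the whitespace in the L1/L2 terminals). It encodes the function z = (x + y) * 1.0.
func_atom = LambdaLink(
    VariableList(VariableNode('$x'), VariableNode('$y')),
    TimesLink(PlusLink(VariableNode('$x'), VariableNode('$y')), NumberNode('1.0'))
)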
The objective function gets a candidate solution (= a string of the grammar's language) and returns a fitness value for it. This is done by 1) executing the string as an Atomese program with OpenCog, so that it creates a function in the form of a LambdaLink atom, and then 2) evaluating the lambda function for each (x, y) pair in the given grid and comparing the resulting z values with the target z values: the smaller the deviation, the better the candidate.
def string_to_func_atom(atomspace, string):
    # Run the Atomese program; it assigns the created LambdaLink to the name 'func_atom'
    local_vars = dict(atomspace=atomspace)
    exec(string, None, local_vars)
    func_atom = local_vars['func_atom']
    return func_atom
def evaluate_func_atom_scalar(atomspace, func_atom, x, y):
    # Apply the lambda function to a single (x, y) pair
    x_atom = NumberNode(str(x))
    y_atom = NumberNode(str(y))
    input_atom = ListLink(x_atom, y_atom)
    exec_atom = ExecutionOutputLink(func_atom, input_atom)
    result_atom = execute_atom(atomspace, exec_atom)
    result_float = float(result_atom.name)
    return result_float

def evaluate_func_atom_vectorial(atomspace, func_atom, x_vec, y_vec):
    # A NumberNode can hold a vector of numbers, so all (x, y) pairs are evaluated in one call
    x_atom = NumberNode(' '.join(str(xi) for xi in x_vec))
    y_atom = NumberNode(' '.join(str(yi) for yi in y_vec))
    input_atom = ListLink(x_atom, y_atom)
    exec_atom = ExecutionOutputLink(func_atom, input_atom)
    result_atom = execute_atom(atomspace, exec_atom)
    result_floats = [float(el) for el in result_atom.name.split()]
    return result_floats
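A quick hand-written check of these helpers (assuming the OpenCog bindings imported above): a LambdaLink for f(x, y) = x + y should evaluate to 5.0 at the point (2, 3).
atomspace = AtomSpace()
set_default_atomspace(atomspace)
add_atom = LambdaLink(
    VariableList(VariableNode('$x'), VariableNode('$y')),
    PlusLink(VariableNode('$x'), VariableNode('$y')))
print(evaluate_func_atom_scalar(atomspace, add_atom, 2.0, 3.0))  # expected: 5.0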
def objective_function_scalar(string):
    # Discard overly long programs, whose execution would be too costly
    if len(string) > 5000:
        return float('inf')
    atomspace = AtomSpace()
    set_default_atomspace(atomspace)
    func_atom = string_to_func_atom(atomspace, string)
    # Evaluate the function once per grid point
    zf = [evaluate_func_atom_scalar(atomspace, func_atom, x, y) for x, y in xy_grid]
    sse = np.sum((zf - za)**2)  # sum of squared errors
    return sse

def objective_function_vectorial(string):
    # Discard overly long programs, whose execution would be too costly
    if len(string) > 5000:
        return float('inf')
    atomspace = AtomSpace()
    set_default_atomspace(atomspace)
    func_atom = string_to_func_atom(atomspace, string)
    # Evaluate the function for all grid points in a single call
    zf = evaluate_func_atom_vectorial(atomspace, func_atom, xa, ya)
    sse = np.sum((zf - za)**2)  # sum of squared errors
    return sse
Check whether the grammar and the objective function work as intended.
random_string = grammar.generate_string()
print(random_string)
objective_function_vectorial(random_string)
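Both objective function variants compute the same sum of squared errors, so their results on the random string should match; a small consistency check added here:
print(objective_function_scalar(random_string))
print(objective_function_vectorial(random_string))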
report_time = True
display_results = False
num_gen = 300  # maximum number of generations
num_ind = 600  # population size and offspring size
ea = al.EvolutionaryAlgorithm(
grammar, objective_function_vectorial, 'min',
max_generations=num_gen,
max_or_min_fitness=0.001,
max_runtime_in_seconds=300,
population_size=num_ind, offspring_size=num_ind,
evaluator=um.univariate.parallel.futures,
verbose=1,
max_nodes=500,
)
The search is performed one generation after another, and some intermediate results are reported to show how the solutions gradually improve.
best_ind = ea.run()
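The fitness of the returned individual is the remaining sum of squared errors; inspecting it indicates how close the best program comes to the data (attribute name as provided by alogos):
print('Best fitness (sum of squared errors):', best_ind.fitness)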
Show the phenotype of the best individual found so far. If more computing time is provided, increasingly better solutions may be discovered.
string = best_ind.phenotype
print(string)
atomspace = AtomSpace()
set_default_atomspace(atomspace)
func_atom = string_to_func_atom(atomspace, string)
mv.plot(atomspace, layout_method='dot')  # visualize the Atomese program as a graph
zf = evaluate_func_atom_vectorial(atomspace, func_atom, xa, ya)
up.plotly.scatter_3d(xa, ya, za, marker_color='black', marker_size=3) + \
up.plotly.surface(xa, ya, zf, show_colormap=False)
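For a numeric counterpart to the visual comparison, the deviation of the best candidate from the target data can be recomputed from the plotted values:
sse = np.sum((np.array(zf) - za)**2)
print('SSE of the best candidate:', sse)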