Numeric literals in Python 3

[1]:
import alogos as al

1) Create the grammar

Grammar for Python 3 integer literals

Ellipses in the original text need to be replaced by the productions they stand for.

Example: "1"..."9" becomes "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"

[2]:
ebnf = """
integer      ::=  decinteger | bininteger | octinteger | hexinteger
decinteger   ::=  nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*
bininteger   ::=  "0" ("b" | "B") (["_"] bindigit)+
octinteger   ::=  "0" ("o" | "O") (["_"] octdigit)+
hexinteger   ::=  "0" ("x" | "X") (["_"] hexdigit)+
nonzerodigit ::=  "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
digit        ::=  "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
bindigit     ::=  "0" | "1"
octdigit     ::=  "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7"
hexdigit     ::=  digit | "a" | "b" | "c" | "d" | "e" | "f" | "A" | "B" | "C" | "D" | "E" | "F"
"""

grammar_int = al.Grammar(ebnf_text=ebnf, defining_symbol='::=')

Grammar for Python 3 floating point literals

The rule for the digit nonterminal needs to be copied from the definition of integer literals

[3]:
ebnf = """
floatnumber   ::=  pointfloat | exponentfloat
pointfloat    ::=  [digitpart] fraction | digitpart "."
exponentfloat ::=  (digitpart | pointfloat) exponent
digitpart     ::=  digit (["_"] digit)*
fraction      ::=  "." digitpart
exponent      ::=  ("e" | "E") ["+" | "-"] digitpart

digit         ::=  "0" | "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9"
"""

grammar_float = al.Grammar(ebnf_text=ebnf, defining_symbol='::=')

2) Use the grammar to generate strings

a) Some random strings

[4]:
n = 15
left = 'Examples of integer literals'
right = 'Number when evaluated (with this Python implementation)'
print('{:30} {}'.format(left, right))
print('{:30} {}'.format('-'*len(left), '-'*len(right)))

for i in range(n):
    string = grammar_int.generate_string()
    number = eval(string)
    print('{:30} {}'.format(string, number))
Examples of integer literals   Number when evaluated (with this Python implementation)
----------------------------   -------------------------------------------------------
0B1_1_1_0                      14
0o_5_3                         43
0B1_0                          2
0B_0                           0
0b_1_1                         3
00_0                           0
0                              0
0x_c_bcf_B                     834811
0B_0                           0
0_00                           0
0O3_6_1                        241
0                              0
0b1_00                         4
0x_a_E                         174
0o_4_4                         36
[5]:
left = 'Examples of float literals'
right = 'Number when evaluated (with this Python implementation)'
print('{:30} {}'.format(left, right))
print('{:30} {}'.format('-'*len(left), '-'*len(right)))

for i in range(n):
    string = grammar_float.generate_string()
    number = eval(string)
    print('{:30} {}'.format(string, number))
Examples of float literals     Number when evaluated (with this Python implementation)
--------------------------     -------------------------------------------------------
7.                             7.0
9.                             9.0
5.8e+7                         58000000.0
6_8.3e1                        683.0
9e1                            90.0
3_7.                           37.0
10597.e8                       1059700000000.0
6.                             6.0
9.                             9.0
0E+6                           0.0
6e-6                           6e-06
51_7.E9                        517000000000.0
1_5e-3_9                       1.5e-38
3.2_6e-7                       3.26e-07
5.                             5.0

b) A subset of the grammar’s infinite language

  • For a finite language it is possible to generate all strings.

  • For an infinite language, as it is the case here, the construction process needs to be limited with the parameter max_steps to only get simple strings that can be generated with a few derivation steps from the start symbol.

[6]:
language = grammar_int.generate_language(max_steps=6, sort_order='shortlex') #, verbose=True)

print('{} strings were generated, a subset of the inifinite language of integer literals.'.format(len(language)))
print()
print('Here are some of these strings:')
for i in [0, 1, 10, 11, 100, 101, 1000, 1001, 4000, 4001, 5000, 5001, 6000, 6001, -2, -1]:
    string = language[i]
    print(string)
6592 strings were generated, a subset of the inifinite language of integer literals.

Here are some of these strings:
0
1
00
10
99
000
863
864
0X_Bd
0X_Be
0xf_2
0xf_3
0X_A_8
0X_A_9
000_0_0
0000_0_0
[7]:
language = grammar_float.generate_language(max_steps=6, sort_order='shortlex')

print('{} strings were generated, a subset of the inifinite language of float literals.'.format(len(language)))
print()
print('Here are some of these strings:')
for i in [0, 1, 10, 11, 100, 101, 1000, 1001, 1201, 1202, -2, -1]:
    string = language[i]
    print(string)
1520 strings were generated, a subset of the inifinite language of float literals.

Here are some of these strings:
.0
.1
0.
1.
2.0
2.1
8E+0
8E+1
2.E+1
2.E+2
9.e-8
9.e-9

3) Use the grammar to parse strings

a) Recognize if a string belongs to the language defined by a grammar

[8]:
# Examples from website
examples_int = [
    '7', '2147483647', '0o177', '0b100110111', '3', '79228162514264337593543950336',
    '0o377', '0xdeadbeef', '100_000_000_000', '0b_1110_0101']

# Counterexamples by myself
counterexamples_int = ['007', '1__0', '0o8', '0xEFG']
[9]:
left = 'Candidate string'
right = 'Is it recognized as valid Python integer literal?'
print('{:30} {}'.format(left, right))
print('{:30} {}'.format('-'*len(left), '-'*len(right)))

for string in examples_int + counterexamples_int:
    is_recognized = grammar_int.recognize_string(string)
    print('{:30} {}'.format(string, is_recognized))
Candidate string               Is it recognized as valid Python integer literal?
----------------               -------------------------------------------------
7                              True
2147483647                     True
0o177                          True
0b100110111                    True
3                              True
79228162514264337593543950336  True
0o377                          True
0xdeadbeef                     True
100_000_000_000                True
0b_1110_0101                   True
007                            False
1__0                           False
0o8                            False
0xEFG                          False
[10]:
# Examples from website
examples_float = ['3.14', '10.', '.001', '1e100', '3.14e-10', '0e0', '3.14_15_93']

# Counterexamples by myself
counterexamples_float = ['3.1.4', '10.0_', '1e_100']
[11]:
left = 'Candidate string'
right = 'Is it recognized as valid Python floating-point literal?'
print('{:30} {}'.format(left, right))
print('{:30} {}'.format('-'*len(left), '-'*len(right)))

for string in examples_float + counterexamples_float:
    is_recognized = grammar_float.recognize_string(string)
    print('{:30} {}'.format(string, is_recognized))
Candidate string               Is it recognized as valid Python floating-point literal?
----------------               --------------------------------------------------------
3.14                           True
10.                            True
.001                           True
1e100                          True
3.14e-10                       True
0e0                            True
3.14_15_93                     True
3.1.4                          False
10.0_                          False
1e_100                         False

b) Analyze the syntactic structure of a string with its parse tree

[12]:
parse_tree = grammar_int.parse_string('0xC3')
parse_tree
[12]:
%30integer1hexinteger0->1201->23hexinteger_§01->34hexinteger_§31->45x3->56hexinteger_§34->67hexinteger_§24->78hexinteger_§26->813decinteger_§07->1314hexdigit7->149decinteger_§08->910hexdigit8->1011ɛ9->1112C10->1215ɛ13->1516digit14->1617316->17
[13]:
parse_tree = grammar_float.parse_string('3.14')
parse_tree.plot(shape_nt='hexagon', fillcolor_nt='lightblue', shape_t='circle', fillcolor_t='black')
[13]:
%30floatnumber1pointfloat0->12pointfloat_§01->23fraction1->34digitpart2->49.3->910digitpart3->105digit4->56digitpart_§24->6735->78ɛ6->811digit10->1112digitpart_§210->1213111->1314digitpart_§212->1415digitpart_§112->1516ɛ14->1617digitpart_§015->1718digit15->1819ɛ17->1920418->20

Optimization

Grammar-guided genetic programming allows to search for optimal strings within a finite or infinite language. An objective function defines what is optimal. It takes a string as input (a member of the language) and returns a number as output (the objective value or fitness value of that string).

Aim here: Try to find a float literal of length 4 that evaluates to the highest possible number.

[14]:
def objective_function(string):
    if len(string) != 4:
        raise ValueError('Undesired string, wrong length.')
    number = eval(string)
    return number
[15]:
# Test whether the function creates the expected numerical output
print(objective_function('1234'))
print(objective_function('1_01'))
print(objective_function('3.e7'))
1234
101
30000000.0
[16]:
ea = al.EvolutionaryAlgorithm(grammar_float, objective_function, 'max', max_generations=50)
best_individual = ea.run()
best_individual.phenotype
[16]:
'9E99'