Palindromes

This notebook shows how a grammar can be used to specify the language of all palindromes over the alphabet {a, b}. It includes strings that are identical to their reverse, e.g. a, bb, aba, abba, or babab.

References

  • Wikipedia

    • Palindrome: “a word, number, phrase, or other sequence of characters which reads the same backward as forward, such as madam or racecar or the number 10801”.

    • CFG examples: Words concatenated with their reverse

[1]:
import alogos as al

Specify the grammar

Use a text in Backus-Naur form (BNF) to capture the production rules shown on the website. This generates the set of all palindromes over the alphabet {a, b}.

[2]:
bnf_text = """
<S> ::= a<S>a
<S> ::= b<S>b
<S> ::= a | b
"""

grammar = al.Grammar(bnf_text=bnf_text)

Inspect how the grammar was recognized

[3]:
grammar
[3]:
S:
a SS a b SS b a b
[4]:
print(grammar)
Nonterminal symbols:
  0: NT('S')

Terminal symbols:
  0: T('a')
  1: T('b')

Start symbol:
  NT('S')

Production rules:
  0: NT('S') -> T('a') NT('S') T('a')
  1: NT('S') -> T('b') NT('S') T('b')
  2: NT('S') -> T('a')
  3: NT('S') -> T('b')

Use the grammar generatively

a) Generate random strings of the grammar’s language

[5]:
for _ in range(15):
    string = grammar.generate_string()
    print('Random string with {} characters: {}'.format(len(string), string))
Random string with 3 characters: aaa
Random string with 5 characters: aaaaa
Random string with 1 characters: a
Random string with 11 characters: aabaabaabaa
Random string with 5 characters: bbabb
Random string with 1 characters: b
Random string with 3 characters: aaa
Random string with 3 characters: aba
Random string with 1 characters: b
Random string with 1 characters: b
Random string with 1 characters: b
Random string with 1 characters: b
Random string with 3 characters: aaa
Random string with 1 characters: b
Random string with 13 characters: aaabbbabbbaaa

b) Generate a finite subset of the grammar’s infinite language

  • For a finite language it is possible to generate all strings, at least in theory.

  • For an infinite language, as it is the case here, the construction process needs to be limited with max_steps to only get simple strings that can be generated with a few derivation steps from the start symbol.

[6]:
language = grammar.generate_language(max_steps=10, sort_order='shortlex')

shortest_string = min(language, key=len)
longest_string = max(language, key=len)
print('The grammar describes an infinite formal language, '
      'of which {} simple strings were generated.'.format(len(language)))

print()
print('Shortest string with {} characters:'.format(len(shortest_string)))
print(shortest_string)

print()
print('Longest string with {} characters:'.format(len(longest_string)))
print(longest_string)

print()
idx = 42
print('String nr. {} with {} characters:'.format(idx, len(language[idx])))
print(language[idx])
The grammar describes an infinite formal language, of which 2046 simple strings were generated.

Shortest string with 1 characters:
a

Longest string with 19 characters:
aaaaaaaaaaaaaaaaaaa

String nr. 42 with 9 characters:
abbaaabba

c) Search for a long string with an evolutionary algorithm

Grammatical evolution allows to search for optimal strings within a finite or infinite language. An objective function defines what is optimal. It takes a string as input (a member of the language) and returns a number as output (the objective value or fitness value of that string).

[7]:
def objective_function(string):
    if '_' in string:
        # give strings containing a '_' character a low objective value
        # to prevent them from dominating the search
        return 0
    return len(string)


ea = al.EvolutionaryAlgorithm(grammar, objective_function, 'max', max_generations=5)
best_individual = ea.run()

string = best_individual.phenotype
print('A long string with {} characters:'.format(len(string)))
print(string)
A long string with 809 characters:
aaaaaaaaaabbbbaabbbbabbaabbaaabbaabbaaaaababbaaabaaaaabbbbaabbabaabbabaababababbabaaaabaaabbaabaaabbaaaabbabbabbbabaabbaabbbbbbaaabbbbbabaabbaabaaaaaaababbbaabaabbabbaaaaaabaaaaaaabbabbbbaababaaabbbaaabaababbabababbbaaaababbbabbbbbababababaaabbbbbbbbaaabababaaabbbbbbbaaaababababaabaaaabbbababbbbaaaabaabaabbbbaaabbababbbbaaabbbabbababaaaaaabbbbbbbbbaabbabaaaaaaabbbbabbababbbbaaaabaabaaabbbabbbbbabbbbaaaaabbbbabbbbbabbbaaabaabaaaabbbbababbabbbbaaaaaaababbaabbbbbbbbbaaaaaabababbabbbaaabbbbababbaaabbbbaabaabaaaabbbbababbbaaaabaababababaaaabbbbbbbaaabababaaabbbbbbbbaaabababababbbbbabbbabaaaabbbabababbabaabaaabbbaaababaabbbbabbaaaaaaabaaaaaabbabbaabaabbbabaaaaaaabaabbaababbbbbaaabbbbbbaabbaababbbabbabbaaaabbaaabaabbaaabaaaababbabababaababbaababbaabbbbaaaaabaaabbabaaaaabbaabbaaabbaabbabbbbaabbbbaaaaaaaaaa

Use the grammar for parsing

a) Check if a string is part of the language

[8]:
strings = ['a', 'aa', 'aaa', 'aba', 'ab']

for string in strings:
    result = grammar.recognize_string(string)
    print('Is string "{}" part of the language? {}'.format(string, result))
Is string "a" part of the language? True
Is string "aa" part of the language? False
Is string "aaa" part of the language? True
Is string "aba" part of the language? True
Is string "ab" part of the language? False

b) Parse a string

[9]:
parse_tree = grammar.parse_string('baabaab')

Inspect the parse tree

[10]:
print(parse_tree)
(<S>(b<S>(a<S>(a<S>(b)a)a)b))
[11]:
print(parse_tree.to_tree_notation())
<S>
 b
 <S>
  a
  <S>
   a
   <S>
    b
   a
  a
 b
[12]:
parse_tree.plot()
[12]:
%30S1b0->12S0->23b0->34a2->45S2->56a2->67a5->78S5->89a5->910b8->10