Calendar facts by xkcd

This notebook implements a neat example of a context-free grammar defined in Extended Backus-Naur Form (EBNF) and shows how it can be used to generate strings. The webcomic https://xkcd.com/1930 contains a syntax diagram (railroad diagram) that describes a grammar for generating “calendar facts”. Hovering over the image on the website also shows a tooltip, which adds another sentence at the end of the grammar.

[1]:
import alogos as al

Specify the grammar

Use a text in Extended Backus-Naur Form to capture the syntax diagram.

[2]:
ebnf_text = """
START = "Did you know that " SUBJECT PREDOBJ "because of " REASON "? Apparently " EXPLAN CONSEQ

SUBJECT = "the " ("fall " | "spring ") "equinox "
        | "the " ("winter " | "summer ") ("solstice " | "olympics ")
        | "the " ("earliest " | "latest ") ("sunrise " | "sunset ")
        | "daylight " ("saving " | "savings ") "time "
        | "leap " ("day " | "year ")
        | "easter "
        | "the " ("harvest " | "super " | "blood ") "moon "
        | "Toyota truck month "
        | "shark week "

PREDOBJ = "happens " ("earlier " | "later " | "at the wrong time ") "every year "
        | "drifts out of sync with the " H1
        | "might " ("not happen " | "happen twice ") "this year "
H1 = "sun "
   | "moon "
   | "zodiac "
   | ("gregorian " | "mayan " | "lunar " | "iPhone ") "calendar "
   | "atomic clock in Colorado "

REASON = "time zone legislation in " ("Indiana" | "Arizona" | "Russia")
       | "a decree by the pope in the 1500s"
       | H2 "of the " H3
       | "magnetic field reversal"
       | "an arbitrary decision by " ("Benjamin Franklin" | "Isaac Newton" | "FDR")
H2 = "precession " | "libration " | "nutation " | "libation " | "eccentricity " | "obliquity "
H3 = "moon" | "sun" | "earth's axis" | "equator" | "prime meridian"
   | ("international date " | "Mason-Dixon ") "line"

EXPLAN = "it causes a predictable increase in car accidents. "
       | "that's why we have leap seconds. "
       | "scientists are really worried. "
       | "it was even more extreme during the " ("bronze age. " | "ice age. " | "cretaceous. " | "1990s. ")
       | "there's a proposal to fix it, but it " H4
       | "it's getting worse and no one knows why. "
H4 = "will never happen. "
   | "actually makes things worse. "
   | "is stalled in congress. "
   | "might be unconstitutional. "

CONSEQ = "While it may seem like trivia, it " H5
H5 = "causes huge headaches for software developers."
   | "is taken advantage of by high-speed traders."
   | "tiggered the 2003 Northeast Blackout."
   | "has to be corrected for by GPS satellites."
   | "is now recognized as a major cause of World War I."
"""

grammar = al.Grammar(ebnf_text=ebnf_text)

Use the grammar generatively

a) Generate random strings of the grammar’s language

[3]:
print('Some random strings and their length:')
print()
for _ in range(5):
    string = grammar.generate_string()
    print(len(string))
    print(string)
    print()
Some random strings and their length:

249
Did you know that the latest sunset drifts out of sync with the mayan calendar because of an arbitrary decision by Isaac Newton? Apparently scientists are really worried. While it may seem like trivia, it is taken advantage of by high-speed traders.

235
Did you know that easter happens earlier every year because of time zone legislation in Indiana? Apparently it causes a predictable increase in car accidents. While it may seem like trivia, it has to be corrected for by GPS satellites.

266
Did you know that shark week drifts out of sync with the atomic clock in Colorado because of time zone legislation in Arizona? Apparently it was even more extreme during the bronze age. While it may seem like trivia, it causes huge headaches for software developers.

261
Did you know that shark week happens earlier every year because of an arbitrary decision by Isaac Newton? Apparently there's a proposal to fix it, but it might be unconstitutional. While it may seem like trivia, it causes huge headaches for software developers.

236
Did you know that the fall equinox drifts out of sync with the zodiac because of a decree by the pope in the 1500s? Apparently scientists are really worried. While it may seem like trivia, it is taken advantage of by high-speed traders.
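
To illustrate how a random string like the ones above can come about, here is a minimal pure-Python sketch of a random derivation. It does not use alogos and is not meant to reflect how the library works internally. The `RULES` dictionary is a hypothetical, much smaller grammar written for this illustration, and `derive` is an assumed helper name.

```python
import random

# Hypothetical miniature grammar (not the full one above): each nonterminal
# maps to a list of alternatives, each alternative is a sequence of symbols.
RULES = {
    "START": [["Did you know that ", "SUBJECT", "PREDOBJ", "?"]],
    "SUBJECT": [["the ", "SEASON", "equinox "], ["easter "], ["shark week "]],
    "SEASON": [["fall "], ["spring "]],
    "PREDOBJ": [["happens ", "WHEN", "every year"]],
    "WHEN": [["earlier "], ["later "], ["at the wrong time "]],
}


def derive(symbol, rng):
    """Expand a symbol into a string by choosing alternatives at random."""
    if symbol not in RULES:
        return symbol  # terminal: emit the text as-is
    alternative = rng.choice(RULES[symbol])
    return "".join(derive(s, rng) for s in alternative)


rng = random.Random(0)  # fixed seed only to make the sketch repeatable
for _ in range(3):
    print(derive("START", rng))
```

Every call starts at the start symbol and replaces nonterminals until only terminal text remains, which is the general idea behind generating a string from a context-free grammar.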

b) Generate all strings of the grammar’s language

  • For a finite language, as is the case here, it is possible to generate all strings.

  • For an infinite language, the construction process needs to be limited with max_steps, so that only simple strings are produced that can be generated with a few derivation steps from the start symbol.
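
The second point can be sketched without alogos. The following pure-Python example uses a hypothetical recursive grammar, `LIST = "fact" | "fact, " LIST`, whose language is infinite, and enumerates only the strings reachable within a bounded number of rule applications. The function name `enumerate_strings` and its `max_steps` argument are assumptions made for this illustration; the exact meaning of max_steps in alogos may differ.

```python
# Hypothetical recursive grammar: LIST = "fact" | "fact, " LIST
RULES = {"LIST": [["fact"], ["fact, ", "LIST"]]}


def enumerate_strings(start, max_steps):
    """Return all strings derivable from start with at most max_steps rule applications."""
    finished = set()
    frontier = [(start,)]  # sentential forms that still contain a nonterminal
    for _ in range(max_steps):
        next_frontier = []
        for form in frontier:
            # Expand the leftmost nonterminal with every alternative.
            idx = next(i for i, s in enumerate(form) if s in RULES)
            for alternative in RULES[form[idx]]:
                new_form = form[:idx] + tuple(alternative) + form[idx + 1:]
                if any(s in RULES for s in new_form):
                    next_frontier.append(new_form)
                else:
                    finished.add("".join(new_form))
        frontier = next_frontier
    return finished


print(sorted(enumerate_strings("LIST", 3), key=len))
# ['fact', 'fact, fact', 'fact, fact, fact']
```

Without the bound the loop would never terminate for this grammar, because each step leaves one form that still contains `LIST`.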

[4]:
language = grammar.generate_language()
shortest_string = min(language, key=len)
longest_string = max(language, key=len)
print('The grammar describes a formal language consisting of {} strings.'.format(len(language)))

print()
print('Shortest string with {} characters:'.format(len(shortest_string)))
print(shortest_string)

print()
print('Longest string with {} characters:'.format(len(longest_string)))
print(longest_string)
The grammar describes a formal language consisting of 780000 strings.

Shortest string with 195 characters:
Did you know that easter happens later every year because of nutation of the sun? Apparently scientists are really worried. While it may seem like trivia, it tiggered the 2003 Northeast Blackout.

Longest string with 310 characters:
Did you know that daylight savings time drifts out of sync with the atomic clock in Colorado because of eccentricity of the international date line? Apparently there's a proposal to fix it, but it actually makes things worse. While it may seem like trivia, it is now recognized as a major cause of World War I.
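
The size of the language can be cross-checked by hand. Since the grammar is not recursive, the number of derivations is a product over concatenated parts and a sum over alternatives. The sketch below reads these counts off the EBNF text above; the variable names are chosen for this illustration only.

```python
# Count alternatives per rule: "|" adds, concatenation multiplies.
subject = 2 + 2 * 2 + 2 * 2 + 2 + 2 + 1 + 3 + 1 + 1
h1 = 3 + 4 + 1               # sun, moon, zodiac, four calendars, atomic clock
predobj = 3 + h1 + 2
h2 = 6
h3 = 5 + 2                   # five plain options plus two kinds of "line"
reason = 3 + 1 + h2 * h3 + 1 + 3
h4 = 4
explan = 1 + 1 + 1 + 4 + h4 + 1
conseq = 5                   # CONSEQ has one fixed prefix followed by H5

total = subject * predobj * reason * explan * conseq
print(subject, predobj, reason, explan, conseq)  # 20 13 50 12 5
print(total)  # 780000
```

The product agrees with the number of strings reported above, which suggests that no two different derivations produce the same string.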

c) Search for certain strings with an evolutionary algorithm

Grammar-guided genetic programming makes it possible to search for optimal strings within a finite or infinite language. An objective function defines what is optimal. It takes a string as input (a member of the language) and returns a number as output (the objective value or fitness value of that string).

[5]:
def objective_function(string):
    return len(string)
[6]:
ea = al.EvolutionaryAlgorithm(grammar, objective_function, 'min', max_generations=50)
best_individual = ea.run()

string = best_individual.phenotype
print('A short string with {} characters:'.format(len(string)))
print(string)
A short string with 195 characters:
Did you know that easter happens later every year because of nutation of the sun? Apparently scientists are really worried. While it may seem like trivia, it tiggered the 2003 Northeast Blackout.
[7]:
ea = al.EvolutionaryAlgorithm(grammar, objective_function, 'max', max_generations=50)
best_individual = ea.run()

string = best_individual.phenotype
print('A long string with {} characters:'.format(len(string)))
print(string)
A long string with 310 characters:
Did you know that daylight savings time drifts out of sync with the atomic clock in Colorado because of eccentricity of the international date line? Apparently there's a proposal to fix it, but it actually makes things worse. While it may seem like trivia, it is now recognized as a major cause of World War I.
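
Other objectives follow the same contract of taking a string and returning a number. As a hedged sketch, the following objective measures how far a string's length is from a target length. The helper `make_length_objective` and the target of 250 characters are assumptions introduced for this illustration, not part of the notebook.

```python
def make_length_objective(target_length):
    """Build an objective that is 0 for strings of exactly target_length characters."""
    def objective_function(string):
        return abs(len(string) - target_length)
    return objective_function


objective = make_length_objective(250)  # hypothetical target length
print(objective("x" * 249))  # 1
print(objective("x" * 250))  # 0
```

Such a function could presumably be passed to al.EvolutionaryAlgorithm together with 'min' in the same way as in the cells above, so that the search favors strings close to the target length.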