alogos._grammar.normalization.chomsky

Functions

is_cnf(grammar)

Check if the grammar is in Chomsky Normal Form (CNF).

to_cnf(grammar)

Convert the grammar G into Chomsky Normal Form (CNF).


Detailed object descriptions

alogos._grammar.normalization.chomsky.is_cnf(grammar)[source]

Check if the grammar is in Chomsky Normal Form (CNF).

alogos._grammar.normalization.chomsky.to_cnf(grammar)[source]

Convert the grammar G into Chomsky Normal Form (CNF).

Notes

CNF is defined unambiguously, but there are different algorithms to transform a grammar into a form that meets those criteria. These algorithms have varying time complexities and also lead to different growth of the grammar. Here a version is implemented where the new grammar’s size is bounded by O(|G|^2) instead of O(2^2^|G|), adhering mainly to the presentation in a textbook by Elaine Rich.

References

  • Websites

  • Papers

    • Chomsky - On Certain Formal Properties of Grammars (1959)

      • An implication on pp. 149-150: “Theorem 5 asserts in particular that all type 2 languages can be generated by grammars which yield only trees with no more than two branches from each node.”

    • Lange, Leiß - To CNF or not to CNF? An efficient yet presentable version of the CYK algorithm (2009)

      • pp. 5-6: “the order in which the single transformation steps are carried out should be: BIN→DEL→UNIT. This will yield a grammar of size O(|G|^2). Nevertheless, many textbooks choose a different order, namely DEL→UNIT→BIN which yields agrammar of size O(2^2^|G|).”

      • Table 3 on p. 7 gives an overview of textbooks that show how to transform a grammar into CNF. It indicates whether the transformation results in a grammar with size bound by O(|G|^2) or O(2^2^|G|). Based on this table, the textbook by Elaine Rich was chosen here as reference. The textbook by Hopcroft and Ullman is also mentioned because it is very well known and provides a detailed discussion of CNF, yet not a transformation with O(|G|^2) guarantee.

  • Books

    • Rich - Automata, Computability and Complexity (2007): pp. 171-175

      • p. 171: “There exists a straightforward four-step algorithm that converts a grammar […] into a new grammar […] in Chomsky normal form and L(G_C) = L(G) - {ε}”

        1. removeEps: removing from G all ε-rules

        2. removeUnits: removing from G all unit productions

        3. removeMixed: removing from G all rules whose right-hand sides have length greater than 1 and include a terminal

        4. removeLong: removing from G all rules whose right-hand sides have length greater than 2

      • p. 175: “We will run removeLong as step 1 rather than as step 4. […] With this change, removeEps runs in linear time. […] So, if we change converttoChomsky so that it does step 4 first, its time complexity is O(n^2) and the size of the grammar that it produces is also O(n^2).”

    • Hopcroft, Ullman - Automata theory, language and computation (2004)

      • Preliminary simplications: pp. 261-272

        • Eliminate ε-productions

        • Eliminate unit productions

        • Eliminate useless symbols

      • Transformation to Chomsky Normal Form pp. 272-275

        • Arrange that all bodies of length 2 or more consist only of variables

        • Break bodies of length 3 or more into a cascade of production