alogos._grammar.normalization.chomsky
¶
Functions¶
|
Check if the grammar is in Chomsky Normal Form (CNF). |
|
Convert the grammar G into Chomsky Normal Form (CNF). |
Detailed object descriptions¶
- alogos._grammar.normalization.chomsky.is_cnf(grammar)[source]¶
Check if the grammar is in Chomsky Normal Form (CNF).
- alogos._grammar.normalization.chomsky.to_cnf(grammar)[source]¶
Convert the grammar G into Chomsky Normal Form (CNF).
Notes
CNF is defined unambiguously, but there are different algorithms to transform a grammar into a form that meets those criteria. These algorithms have varying time complexities and also lead to different growth of the grammar. Here a version is implemented where the new grammar’s size is bounded by O(|G|^2) instead of O(2^2^|G|), adhering mainly to the presentation in a textbook by Elaine Rich.
References
Websites
Papers
Chomsky - On Certain Formal Properties of Grammars (1959)
An implication on pp. 149-150: “Theorem 5 asserts in particular that all type 2 languages can be generated by grammars which yield only trees with no more than two branches from each node.”
Lange, Leiß - To CNF or not to CNF? An efficient yet presentable version of the CYK algorithm (2009)
pp. 5-6: “the order in which the single transformation steps are carried out should be: BIN→DEL→UNIT. This will yield a grammar of size O(|G|^2). Nevertheless, many textbooks choose a different order, namely DEL→UNIT→BIN which yields agrammar of size O(2^2^|G|).”
Table 3 on p. 7 gives an overview of textbooks that show how to transform a grammar into CNF. It indicates whether the transformation results in a grammar with size bound by O(|G|^2) or O(2^2^|G|). Based on this table, the textbook by Elaine Rich was chosen here as reference. The textbook by Hopcroft and Ullman is also mentioned because it is very well known and provides a detailed discussion of CNF, yet not a transformation with O(|G|^2) guarantee.
Books
Rich - Automata, Computability and Complexity (2007): pp. 171-175
p. 171: “There exists a straightforward four-step algorithm that converts a grammar […] into a new grammar […] in Chomsky normal form and L(G_C) = L(G) - {ε}”
removeEps: removing from G all ε-rules
removeUnits: removing from G all unit productions
removeMixed: removing from G all rules whose right-hand sides have length greater than 1 and include a terminal
removeLong: removing from G all rules whose right-hand sides have length greater than 2
p. 175: “We will run removeLong as step 1 rather than as step 4. […] With this change, removeEps runs in linear time. […] So, if we change converttoChomsky so that it does step 4 first, its time complexity is O(n^2) and the size of the grammar that it produces is also O(n^2).”
Hopcroft, Ullman - Automata theory, language and computation (2004)
Preliminary simplications: pp. 261-272
Eliminate ε-productions
Eliminate unit productions
Eliminate useless symbols
Transformation to Chomsky Normal Form pp. 272-275
Arrange that all bodies of length 2 or more consist only of variables
Break bodies of length 3 or more into a cascade of production