1 Introduction

Yapps (Yet Another Python Parser System) is an easy-to-use parser generator that is written in Python and generates Python code. There are several parser generator systems already available for Python, including PyLR, kjParsing, PyBison, and mcf.pars, but I had different goals for my parser. Yapps is simple, is easy to use, and produces human-readable parsers. It is not the fastest or most powerful parser generator. Yapps is designed to be used when regular expressions are not enough and other parser systems are too much: situations where you might otherwise write your own recursive descent parser.

Some unusual features of Yapps that may be of interest are:

  1. Yapps produces recursive descent parsers that are readable by humans, as opposed to table-driven parsers that are difficult to read. A Yapps parser for a simple calculator looks similar to the one that Mark Lutz wrote by hand for Programming Python.

  2. Yapps also allows for rules that accept parameters and pass arguments to be used while parsing subexpressions. Grammars that allow for arguments to be passed to subrules and for values to be passed back are often called attribute grammars. In many cases parameterized rules can be used to perform actions at “parse time” that are usually delayed until later. For example, information about variable declarations can be passed into the rules that parse a procedure body, so that undefined variables can be detected at parse time. The types of defined variables can be used in parsing as well—for example, if the type of X is known, we can determine whether X(1) is an array reference or a function call.

  3. Yapps grammars are fairly easy to write, although there are some inconveniences having to do with ELL(1) parsing that have to be worked around. For example, rules have to be left-factored and may not be left-recursive. However, neither limitation seems to be a problem in practice.

    Yapps grammars look similar to the notation used in the Python reference manual, with operators like *, +, |, [], and () for patterns, names (tim) for rules, regular expressions ("[a-z]+") for tokens, and # for comments.

  4. The Yapps parser generator is written as a single Python module with no C extensions. Yapps produces parsers that are written entirely in Python, and require only the Yapps run-time module (5k) for support.

  5. Yapps’s scanner is context-sensitive, picking tokens based on the types of the tokens accepted by the parser. This can be helpful when implementing certain kinds of parsers, such as for a preprocessor.
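
To make items 1 and 2 concrete, here is a minimal hand-written sketch of the style of parser being described: one method per grammar rule, with a parameter (env) passed down through the rules so that undefined variables are detected while parsing. This is not Yapps output and does not use the Yapps run-time; the class name, the token set, and the "$" end marker are all assumptions made for illustration.

```python
import re

# Hypothetical token set: integers, identifiers, and the operators + - * / ( ).
TOKEN_RE = re.compile(r"\s*(\d+|[A-Za-z_]\w*|[-+*/()])")
END = "$"  # end-of-input marker; cannot collide with a real token


def tokenize(text):
    """Return the list of token strings in text, followed by END."""
    text = text.rstrip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    tokens.append(END)
    return tokens


class Calculator:
    """Recursive descent parser: each grammar rule is one method."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos]

    def take(self, expected=None):
        token = self.tokens[self.pos]
        if expected is not None and token != expected:
            raise SyntaxError(f"expected {expected!r}, found {token!r}")
        self.pos += 1
        return token

    def goal(self, env):
        # goal: expr END
        value = self.expr(env)
        self.take(END)
        return value

    def expr(self, env):
        # expr: term (("+" | "-") term)*
        value = self.term(env)
        while self.peek() in ("+", "-"):
            if self.take() == "+":
                value += self.term(env)
            else:
                value -= self.term(env)
        return value

    def term(self, env):
        # term: factor (("*" | "/") factor)*
        value = self.factor(env)
        while self.peek() in ("*", "/"):
            if self.take() == "*":
                value *= self.factor(env)
            else:
                value /= self.factor(env)
        return value

    def factor(self, env):
        # factor: NUM | ID | "(" expr ")"
        token = self.take()
        if token == "(":
            value = self.expr(env)
            self.take(")")
            return value
        if token.isdigit():
            return int(token)
        if token[0].isalpha() or token[0] == "_":
            # env is the argument passed down from the caller, so an
            # undefined variable is reported while parsing.
            if token not in env:
                raise NameError(f"undefined variable {token!r}")
            return env[token]
        raise SyntaxError(f"unexpected token {token!r}")
```

For example, Calculator("1 + 2 * (x - 1)").goal({"x": 3}) evaluates to 5, while parsing "y + 1" with an empty env raises NameError as soon as y is read.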

There are several disadvantages of using Yapps over another parser system:

  1. Yapps parsers are ELL(1) (Extended LL(1)), which is less powerful than LALR (used by PyLR) or SLR (used by kjParsing), so Yapps would not be a good choice for parsing complex languages. For example, allowing both x := 5; and x; as statements is difficult because we must distinguish based on only one token of lookahead. Seeing only x, we cannot decide whether we have an assignment statement or an expression statement. (Note however that this kind of grammar can be matched with backtracking; see section [*].)

  2. The scanner that Yapps provides can only read from strings, not files, so an entire file has to be read in before scanning can begin. When stream input is needed (for example, from the console, a network, or a large file), the Yapps parser can instead be given a custom scanner that reads from a stream rather than a string.

  3. Yapps is not designed with efficiency in mind.
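
The lookahead problem in item 1 can sometimes be worked around by left factoring: the common prefix x is consumed first, and only then does a single token of lookahead choose between the two kinds of statement. The sketch below shows the idea in plain Python. It is not Yapps output, and it assumes a deliberately simplified grammar in which an expression statement is a single identifier and the right-hand side of an assignment is a single identifier or number. The function name and the result tuples are hypothetical.

```python
def parse_statement(tokens):
    """Parse one statement from the start of a list of token strings.

    Left-factored grammar (a simplification assumed for this sketch):
        statement: ID (":=" value)? ";"
        value:     ID | NUM
    """

    def expect(index, description, accepts):
        # Return tokens[index] if it exists and is acceptable, else fail.
        if index >= len(tokens) or not accepts(tokens[index]):
            found = tokens[index] if index < len(tokens) else "end of input"
            raise SyntaxError(f"expected {description}, found {found!r}")
        return tokens[index]

    # Both alternatives start with an identifier, so consume it first.
    name = expect(0, "identifier", str.isidentifier)

    # One token of lookahead is now enough to pick the alternative.
    if len(tokens) > 1 and tokens[1] == ":=":
        value = expect(2, "identifier or number",
                       lambda t: t.isidentifier() or t.isdigit())
        expect(3, "';'", lambda t: t == ";")
        return ("assign", name, value)

    expect(1, "';'", lambda t: t == ";")
    return ("expr", name)
```

With this factoring, parse_statement(["x", ":=", "5", ";"]) returns ("assign", "x", "5") and parse_statement(["x", ";"]) returns ("expr", "x"). When expression statements can be arbitrary expressions, the shared prefix is no longer a single token and this simple factoring does not apply, which is the difficulty that item 1 describes.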

Yapps provides an easy-to-use parser generator that produces parsers similar to what you might write by hand. It is not meant to be a solution for all parsing problems, but rather an aid for those times when you would write a parser by hand instead of using one of the more powerful parsing packages available.

Yapps 2.0 is easier to use than Yapps 1.0. New features include a less restrictive input syntax, which allows mixing of sequences, choices, terminals, and nonterminals; optional matching; the ability to insert single-line statements into the generated parser; and looping constructs * and + similar to the repetitive matching constructs in regular expressions. Unfortunately, the addition of these constructs has made Yapps 2.0 incompatible with Yapps 1.0, so grammars will have to be rewritten. See section [*] for tips on changing Yapps 1.0 grammars for use with Yapps 2.0.

Amit J Patel, amitp@cs.stanford.edu