Yapps (Yet Another Python
Parser System) is an easy-to-use parser
generator that is written in Python and generates Python code. There
are several parser generator systems already available for Python,
including PyLR, kjParsing, PyBison, and mcf.pars,
but I had different goals for my parser. Yapps is simple, is easy to
use, and produces human-readable parsers. It is not the fastest or
most powerful parser. Yapps is designed to be used when regular
expressions are not enough and other parser systems are too much:
situations where you might otherwise write your own recursive descent parser by hand.
Some unusual features of Yapps that may be of interest are:
Yapps produces recursive descent parsers that are readable by
humans, as opposed to table-driven parsers that are difficult to
read. A Yapps parser for a simple calculator looks similar to the
one that Mark Lutz wrote by hand for Programming Python.
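
To give a sense of what "readable by humans" means here, the following is a minimal hand-written sketch of a recursive descent calculator in Python. It is not Yapps output and not the code from Programming Python; the class name `Calculator`, the helper `tokenize`, and the token names are all assumptions made for illustration. Each grammar rule (goal, expr, term, factor) becomes one method.

```python
import re

# Hypothetical token pattern for this sketch: integers or any single character.
TOKEN = re.compile(r"\s*(?:(\d+)|(.))")

def tokenize(text):
    """Return a list of (kind, value) pairs that ends with an END marker."""
    tokens = []
    for number, op in TOKEN.findall(text):
        if number:
            tokens.append(("NUM", int(number)))
        elif op.strip():          # skip stray whitespace matches
            tokens.append((op, op))
    tokens.append(("END", None))
    return tokens

class Calculator:
    """One method per grammar rule; each method returns the computed value."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos][0]

    def take(self, kind):
        actual, value = self.tokens[self.pos]
        if actual != kind:
            raise SyntaxError(f"expected {kind}, found {actual}")
        self.pos += 1
        return value

    def goal(self):               # goal: expr END
        value = self.expr()
        self.take("END")
        return value

    def expr(self):               # expr: term (("+" | "-") term)*
        value = self.term()
        while self.peek() in ("+", "-"):
            op = self.take(self.peek())
            value = value + self.term() if op == "+" else value - self.term()
        return value

    def term(self):               # term: factor (("*" | "/") factor)*
        value = self.factor()
        while self.peek() in ("*", "/"):
            op = self.take(self.peek())
            value = value * self.factor() if op == "*" else value / self.factor()
        return value

    def factor(self):             # factor: NUM | "(" expr ")"
        if self.peek() == "(":
            self.take("(")
            value = self.expr()
            self.take(")")
            return value
        return self.take("NUM")
```

Calling `Calculator("1 + 2 * (3 - 1)").goal()` walks the rules top-down, one token of lookahead at a time, which is the style of parser described above.
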
Yapps also allows for rules that accept parameters and pass
arguments to be used while parsing subexpressions. Grammars that
allow for arguments to be passed to subrules and for values to be
passed back are often called attribute grammars. In many
cases parameterized rules can be used to perform actions at “parse
time” that are usually delayed until later. For example,
information about variable declarations can be passed into the
rules that parse a procedure body, so that undefined variables can
be detected at parse time. The types of defined variables can be
used in parsing as well—for example, if the type of X is
known, we can determine whether X(1) is an array reference or
a function call.
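
The idea of passing information into a rule can be sketched in plain Python. The example below is an illustration under stated assumptions, not Yapps syntax or generated code: the toy language, the function names `parse_program` and `parse_body`, and the pre-split token list are all hypothetical. The `types` argument plays the role of a rule parameter, and it lets the body rule both reject undefined names and decide whether `X(1)` is an array reference or a function call.

```python
def parse_program(tokens):
    """program: (("array" | "func") NAME)* body

    Collects declarations, then passes them to the body rule as an argument.
    """
    types = {}
    pos = 0
    while pos < len(tokens) and tokens[pos] in ("array", "func"):
        if pos + 1 >= len(tokens):
            raise SyntaxError("expected a name after a declaration keyword")
        types[tokens[pos + 1]] = tokens[pos]
        pos += 2
    return parse_body(tokens, pos, types)

def parse_body(tokens, pos, types):
    """body: (NAME "(" ARG ")")*   with 'types' supplied by the caller."""
    nodes = []
    while pos < len(tokens):
        if len(tokens) - pos < 4 or tokens[pos + 1] != "(" or tokens[pos + 3] != ")":
            raise SyntaxError(f"expected NAME ( ARG ) at token {pos}")
        name, arg = tokens[pos], tokens[pos + 2]
        if name not in types:
            # Detected at parse time, because the declarations were passed in.
            raise SyntaxError(f"undefined name {name!r} at token {pos}")
        kind = "index" if types[name] == "array" else "call"
        nodes.append((kind, name, arg))
        pos += 4
    return nodes
```

With the input `"array a func f a ( 1 ) f ( 2 )".split()`, the first use is classified as an array reference and the second as a function call, while a use of an undeclared name raises `SyntaxError` during parsing.
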
Yapps grammars are fairly easy to write, although ELL(1) parsing
imposes some restrictions that have to be worked around. For example,
rules have to be left-factored and may not be left-recursive.
However, neither limitation seems
to be a problem in practice.
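
As an illustration of why the ban on left recursion is usually easy to live with, consider a left-recursive rule written informally as `expr: expr "-" NUM | NUM`. It can be rewritten with a loop as `expr: NUM ("-" NUM)*`, and the loop still groups to the left. The Python sketch below is a hand-written assumption of how such a loop behaves; the function name `parse_subtraction` and the pre-split token list are hypothetical.

```python
def parse_subtraction(tokens):
    """expr: NUM ("-" NUM)*

    Loop form of the left-recursive rule  expr: expr "-" NUM | NUM.
    Accumulating into 'value' keeps subtraction left-associative.
    """
    if not tokens:
        raise SyntaxError("expected a number")
    value = int(tokens[0])
    pos = 1
    while pos < len(tokens):
        if tokens[pos] != "-":
            raise SyntaxError(f"expected '-' at token {pos}")
        if pos + 1 >= len(tokens):
            raise SyntaxError("expected a number after '-'")
        value = value - int(tokens[pos + 1])
        pos += 2
    return value
```

For the tokens `10 - 3 - 2` this computes (10 - 3) - 2, the same grouping the left-recursive rule would have produced.
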
Yapps grammars look similar to the notation used in the Python
reference manual, with operators like *, +, |,
[], and () for patterns, names (tim) for rules,
regular expressions ("[a-z]+") for tokens, and # for
comments.
The Yapps parser generator is written as a single Python module
with no C extensions. Yapps produces parsers that are written
entirely in Python and that require only the Yapps run-time module
(about 5 KB) for support.
Yapps’s scanner is context-sensitive, picking tokens based on
the types of the tokens accepted by the parser. This can be
helpful when implementing certain kinds of parsers, such as for a
preprocessor.
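
A context-sensitive scanner of this kind can be sketched as a function that receives the set of token types the parser is currently willing to accept. The code below is a simplified assumption, not the Yapps run-time scanner: the token table `PATTERNS`, the function `scan`, and the longest-match tie-breaking are choices made for this example only.

```python
import re

# Hypothetical token table. Note that IF and NAME both match the text "if".
PATTERNS = [
    ("IF", re.compile(r"if")),
    ("NAME", re.compile(r"[a-z]+")),
    ("NUM", re.compile(r"[0-9]+")),
]

def scan(text, pos, accepted):
    """Return (kind, lexeme) for the longest match among the accepted kinds.

    Token kinds outside 'accepted' are never tried, so the same text can
    yield different tokens depending on what the parser expects. On a tie,
    the kind listed first in PATTERNS wins.
    """
    best = None
    for kind, pattern in PATTERNS:
        if kind not in accepted:
            continue
        match = pattern.match(text, pos)
        if match and (best is None or len(match.group()) > len(best[1])):
            best = (kind, match.group())
    if best is None:
        raise SyntaxError(f"no token in {sorted(accepted)} at position {pos}")
    return best
```

Here `scan("if", 0, {"IF", "NAME"})` yields an `IF` token, while `scan("if", 0, {"NAME"})` yields a `NAME` token for the same text. A preprocessor can use this to treat a word as a keyword inside a directive and as ordinary text elsewhere.
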
Using Yapps instead of another parser system has several disadvantages:
Yapps parsers are ELL(1) (Extended LL(1)), which is
less powerful than LALR (used by PyLR) or
SLR (used by kjParsing), so Yapps would not be a
good choice for parsing complex languages. For example, allowing
both x := 5; and x; as statements is difficult
because we must distinguish based on only one token of lookahead.
Seeing only x, we cannot decide whether we have an
assignment statement or an expression statement. (Note, however,
that this kind of grammar can be matched with backtracking, which is
covered in a separate section.)
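
The lookahead problem and the backtracking remedy can be illustrated with a small hand-written Python sketch. This is a generic backtracking scheme under assumed names (`expect`, `parse_assignment`, `parse_expression_statement`, `parse_statement`), not a description of how Yapps implements backtracking. Each alternative is tried from the same starting position, and a failure simply moves on to the next one.

```python
def expect(tokens, pos, predicate, what):
    """Consume one token that satisfies 'predicate', or raise SyntaxError."""
    if pos < len(tokens) and predicate(tokens[pos]):
        return tokens[pos], pos + 1
    raise SyntaxError(f"expected {what} at token {pos}")

def parse_assignment(tokens, pos):
    """assignment: NAME ":=" NUM ";" """
    name, pos = expect(tokens, pos, str.isalpha, "a name")
    _, pos = expect(tokens, pos, lambda t: t == ":=", "':='")
    value, pos = expect(tokens, pos, str.isdigit, "a number")
    _, pos = expect(tokens, pos, lambda t: t == ";", "';'")
    return ("assign", name, int(value)), pos

def parse_expression_statement(tokens, pos):
    """expression statement: NAME ";" """
    name, pos = expect(tokens, pos, str.isalpha, "a name")
    _, pos = expect(tokens, pos, lambda t: t == ";", "';'")
    return ("expr", name), pos

def parse_statement(tokens, pos=0):
    """Try each alternative from the same position; the first success wins."""
    for alternative in (parse_assignment, parse_expression_statement):
        try:
            return alternative(tokens, pos)
        except SyntaxError:
            continue  # backtrack: 'pos' is unchanged, so the next try restarts here
    raise SyntaxError(f"no statement matches at token {pos}")
```

Both alternatives begin with a name, so one token of lookahead cannot choose between them; the sketch instead commits only after a whole alternative has matched. The cost is that input may be scanned more than once.
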
The scanner that Yapps provides can only read from strings, not
files, so an entire file has to be read in before scanning can
begin. It is possible to build a custom scanner, though, so in
cases where stream input is needed (for example, from the console, a
network, or a large file), the Yapps parser can be given a custom
scanner that reads from a stream instead of a string.
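
A stream-based scanner might look roughly like the sketch below, which pulls one line at a time from a file-like object instead of requiring the whole input up front. The class `StreamScanner`, its `next_token` method, and the whitespace-separated token pattern are assumptions for illustration; they do not reproduce the interface that a Yapps parser expects from its scanner. The sketch also assumes that no token spans a line break.

```python
import io
import re

# Hypothetical token pattern: tokens are runs of non-whitespace characters.
TOKEN = re.compile(r"\s*(\S+)")

class StreamScanner:
    """Yields tokens from a file-like object, reading one line at a time."""

    def __init__(self, stream):
        self.stream = stream
        self.buffer = ""   # the line currently being scanned
        self.pos = 0       # scan position within 'buffer'

    def next_token(self):
        """Return the next token, or None at end of input."""
        while True:
            match = TOKEN.match(self.buffer, self.pos)
            if match:
                self.pos = match.end()
                return match.group(1)
            # Current line is exhausted: fetch the next one from the stream.
            line = self.stream.readline()
            if not line:
                return None
            self.buffer, self.pos = line, 0
```

Any object with a `readline` method works as the stream, for example `io.StringIO("a b\n c\n")`, an open file, or `sys.stdin`. Only one line is held in memory at a time.
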
Yapps is not designed with efficiency in mind.
Yapps provides an easy-to-use parser generator that produces parsers
similar to what you might write by hand. It is not meant to be a
solution for all parsing problems, but instead an aid for those times
you would write a parser by hand rather than using one of the more
powerful parsing packages available.
Yapps 2.0 is easier to use than Yapps 1.0. New features include a
less restrictive input syntax, which allows mixing of sequences,
choices, terminals, and nonterminals; optional matching; the ability
to insert single-line statements into the generated parser; and
looping constructs * and + similar to the repetitive
matching constructs in regular expressions. Unfortunately, the
addition of these constructs has made Yapps 2.0 incompatible with
Yapps 1.0, so grammars will have to be rewritten. A separate section
gives tips on changing Yapps 1.0 grammars for use
with Yapps 2.0.