Parsing with Yapps

1997-1999

License: MIT

Yapps (Yet Another Python Parser System) is an easy-to-use parser generator that is written in Python and generates Python code. Although there are several parser generators already available for Python, I had different goals, including learning about recursive descent parsers^[1] and exploring new features: back in the 1990s my gut feeling was that parsing was not the solved problem it was widely believed to be^[2]. Yapps is simple, is easy to use, and produces human-readable parsers. It is not the fastest, most powerful, or most flexible parser. Yapps is designed to be used when regular expressions are not enough and other parser systems are too much: situations where you might otherwise write your own recursive descent parser. On this page you can find both Yapps 1 and Yapps 2. Yapps 1 is more like a functional language (concise grammars of the form “when you see this, return this”), while Yapps 2 is more like an imperative language (more verbose grammars of the form “if/while you see this, do this”). Both are completely free (MIT license).

For a quick demonstration of how easy it is to write a Yapps grammar, take a look at the 15-line expression evaluation example, which produces code to parse and evaluate expressions like 13-(1+5)*8/7. You may also be interested in Michael Breen’s annotated calculator^[3].
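To give a feel for what a recursive descent parser for such expressions looks like, here is a hand-written Python sketch. It is not Yapps output, and the names `tokenize`, `Calc`, `peek`, and `eat` are our own. Each grammar rule becomes one method, and operator precedence comes from layering the rules (`expr` calls `term`, which calls `factor`) rather than from precedence declarations. Using `fractions.Fraction` to keep division exact is also our choice, not necessarily what the Yapps example does.

```python
import re
from fractions import Fraction

# One token: optional leading whitespace, then a number or an operator/paren.
TOKEN = re.compile(r"\s*([0-9]+|[-+*/()])")


def tokenize(text):
    """Split text into tokens, rejecting any character that is not recognized."""
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class Calc:
    """Recursive descent evaluator, one method per rule:

        goal:   expr END
        expr:   term (("+" | "-") term)*
        term:   factor (("*" | "/") factor)*
        factor: NUM | "(" expr ")"
    """

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.pos = 0

    def peek(self):
        """Return the next token without consuming it (None at end of input)."""
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def eat(self, expected=None):
        """Consume the next token, optionally checking that it is `expected`."""
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def goal(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"unexpected {self.peek()!r}")
        return value

    def expr(self):
        value = self.term()
        while self.peek() in ("+", "-"):  # a loop keeps "-" left-associative
            if self.eat() == "+":
                value += self.term()
            else:
                value -= self.term()
        return value

    def term(self):
        value = self.factor()
        while self.peek() in ("*", "/"):
            if self.eat() == "*":
                value *= self.factor()
            else:
                value /= self.factor()
        return value

    def factor(self):
        if self.peek() == "(":
            self.eat("(")
            value = self.expr()
            self.eat(")")
            return value
        tok = self.eat()
        if not tok.isdigit():
            raise SyntaxError(f"expected a number, got {tok!r}")
        return Fraction(int(tok))
```

With this sketch, `Calc("13-(1+5)*8/7").goal()` returns `Fraction(43, 7)`.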

Features

Some unusual features of Yapps that may be of interest are:

Yapps produces human-readable recursive descent parsers. There are several heuristics used to keep the generated code simple.
Yapps produces context-sensitive scanners that pick tokens based on the type of tokens accepted by the parser. In some situations, token matching is ambiguous unless the context is taken into account.
Yapps rules can pass arguments down to subrules, so subrules can use information (such as declarations) that was parsed at higher levels in the parsing process. These are sometimes called attribute grammars.
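The context-sensitive scanner and the passing of arguments down to subrules can both be illustrated with a small Python sketch. It is not Yapps’s own runtime: `ContextScanner`, `PATTERNS`, and the `parse_*` functions are hypothetical names, and picking the longest match among the allowed token types is an assumption made for the example. The parser tells the scanner which token types are legal at each point, so in `parse_range` the text `1..5` cannot be misread as the float `1.` followed by `.5`, while `parse_number` still reads `1.5` as a float. `parse_uses` receives the set of declared names from its caller.

```python
import re


class ContextScanner:
    """Hypothetical scanner: the parser says which token types are legal next."""

    def __init__(self, patterns, text, ignore=r"\s+"):
        self.patterns = {name: re.compile(p) for name, p in patterns.items()}
        self.ignore = re.compile(ignore)
        self.text = text
        self.pos = 0

    def token(self, *allowed):
        """Consume and return (type, text) for the longest match among `allowed`."""
        skip = self.ignore.match(self.text, self.pos)
        if skip:
            self.pos = skip.end()
        best = None
        for name in allowed:  # token types the parser cannot accept are never tried
            m = self.patterns[name].match(self.text, self.pos)
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise SyntaxError(f"expected one of {allowed} at offset {self.pos}")
        self.pos = best[1].end()
        return best[0], best[1].group()

    def peek(self, *allowed):
        """Return the type of the next token without consuming it."""
        saved = self.pos
        try:
            return self.token(*allowed)[0]
        finally:
            self.pos = saved


PATTERNS = {
    "NUM": r"[0-9]+", "FLOAT": r"[0-9]+\.[0-9]*", "DOTDOT": r"\.\.",
    "VAR": r"var", "ID": r"[a-z]+", "COMMA": r",", "SEMI": r";", "END": r"$",
}


def parse_range(text):
    # range: NUM ".." NUM END -- FLOAT is not legal here, so "1." is never a token
    s = ContextScanner(PATTERNS, text)
    lo = int(s.token("NUM")[1])
    s.token("DOTDOT")
    hi = int(s.token("NUM")[1])
    s.token("END")
    return lo, hi


def parse_number(text):
    # number: (FLOAT | NUM) END -- here "1.5" is legitimately a FLOAT
    s = ContextScanner(PATTERNS, text)
    value = float(s.token("FLOAT", "NUM")[1])
    s.token("END")
    return value


def parse_program(text):
    # program: decl uses(declared) END
    s = ContextScanner(PATTERNS, text)
    declared = parse_decl(s)
    used = parse_uses(s, declared)  # information from a higher rule passed down
    s.token("END")
    return used


def parse_decl(s):
    # decl: "var" ID ("," ID)* ";"
    s.token("VAR")
    declared = {s.token("ID")[1]}
    while s.token("COMMA", "SEMI")[0] == "COMMA":
        declared.add(s.token("ID")[1])
    return declared


def parse_uses(s, declared):
    # uses(declared): ID* -- each ID is checked against the set it was given
    used = []
    while s.peek("ID", "END") == "ID":
        name = s.token("ID")[1]
        if name not in declared:
            raise NameError(f"{name!r} used but not declared")
        used.append(name)
    return used
```

Here `parse_range("1..5")` returns `(1, 5)`, `parse_program("var x, y; x y x")` returns `['x', 'y', 'x']`, and `parse_program("var x; z")` raises `NameError`.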

There are several disadvantages of using Yapps over another parser system:

Yapps parsers are LL(1), which is less powerful than LALR or SLR. Some inconveniences of LL(1), such as the lack of operator precedence declarations, lead to more explicit grammars.
The Yapps scanner can only read from strings, not from files, so it may not be useful if your input is large. However, it is possible to write a custom scanner for your application. (Note: the enhanced version of Yapps in Debian can read from files.)
Yapps is not designed with efficiency in mind. It’s not the fastest parser available (nor is it the slowest).

The Python.org site has a paper comparing Yapps to Spark and BisonGen^[4]. The Python Wiki has a list of other Python parsers^[5] you may want to try if Yapps isn’t right for your needs.

Download

I have not worked on Yapps since 2003. Yapps is open source, and several other people have contributed or continue to work on it.

Recommended: Yapps is part of the Debian repository, available as packages yapps2^[6] and yapps-runtime^[7]. The Debian package has improvements from Matthias Urlichs^[8] that are not backported into my version. These improvements include reading from files, reading input incrementally (instead of requiring the entire input to be in memory at once), the ability to parse multi-line comments, including one input file from another, better error messages, and support for Python 3.

Jiří Pinkava has made some patches available if you want to use the Debian package in Python 3. Download them here.

Recommended: Matthias Urlichs has a Yapps repository on GitHub^[9] with Python 3 compatibility (see the deb branch).

Documentation

Other versions

The last stable version I released, Yapps 2.0.4 (20 Jul 2003), is available in ZIP format. It includes the parser generator, a runtime library, the grammar used to build Yapps itself, some very simple examples, and documentation.

The last development version, Yapps 2.1.1 (27 Aug 2003), is also available in ZIP format, or on GitHub^[10]. See the ChangeLog for a list of changes. This version was a step along the way towards Yapps 3, but I abandoned that project.

Users

I used Yapps mostly for small languages. I have not released any Yapps grammars for large languages like HTML or Java. You might find some more extensive grammars here:

PyXPath 1.0^[11] - XPath (read more here^[12]) and PyXPath 1.1^[13]
Py-DOM-XPath^[14]
PyCifRW^[15] - CIF
selectorp.g^[16] - CSS selectors
rdfn3.g^[17] - RDF N3 (“notation 3”), grammar by Dan Connolly^[18]
IDL^[19] part of FnOrb, a CORBA ORB written in Python
Otterlang^[20] - first order logic
RQL^[21], a relationship query language, and fyzz^[22], a Sparql parser
CS410^[23] - grammars for the Programming Languages course, Western Washington University
Kayali^[24] - computer algebra system
COMP 304B^[25] - class at McGill
XBGP-MAN: an XML management architecture for BGP^[26] (subscription required)
Indent PHP parser^[27]
Query parser^[28] in the Chandler^[29] project
pyScss^[30] uses Yapps for parsing and a custom scanner written in C for performance
(others)^[31]
Noweb literate programming^[32]

One of the more interesting parsers I’ve written in Yapps is for Yapps itself. I started out with a hand-written parser, and then looked at how I had implemented it. I made Yapps output something very similar to what I had written by hand. I then rewrote the Yapps parser in Yapps, generated its output, and compared it to my hand-written parser. Once I was happy with the output, I threw out my original parser and used the Yapps-generated one.

Yapps 2 vs. Yapps 1

Yapps 2 is more flexible than Yapps 1 and is not backwards-compatible with Yapps 1. The main changes are:

It allows statements and not only expressions to be embedded into the parser.
This is wonderful for setting up loops. Here’s an example for parsing Lisp lists in Yapps 1:
```
rule list:   "(" seq ")"    ->  << seq >>
rule seq:                   ->  << [] >>
           | expr seq       ->  << [expr] + seq >>
```
And here’s what is used in Yapps 2:
```
rule list:
  r"\("  {{ e = [] }}        # initialize the list
  (                          # begin a loop
   expr {{ e.append(expr) }} # add each expr
  ) *                        # repeat as necessary
  r"\)"  {{ return e }}      # return the list
```
Note that where Yapps 1 required you to set up a new recursive subrule, Yapps 2 allows you to express loops in a natural iterative style. The Yapps 2 style is also more efficient, since it appends to a single list instead of building a new list at each level of recursion. The Yapps 1 grammar is shorter, but harder to write.
It has convenience constructs like optional matching as well as the + and * operators from regular expressions.
In the example above you can see that the ( expr ... ) * construct is used to repeat zero or more expressions. You can also mix choices and sequences together. For example, you can write A (B | C) D. These changes eliminate the need to use extra rules.
It uses Python 1.5-style (Perl-compatible) regular expressions.
These regular expressions are now standard in the Python community, and are more powerful than the old-style regular expressions. In addition, you can use the Python r"string" syntax to avoid endless strings of backslashes.
You can choose to use the Yapps 2 run time library to make the generated parser smaller, or to make a standalone parser file.
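The difference between the two Lisp list rules shows up clearly in hand-written recursive descent code. The Python sketch below is our own approximation, not what either Yapps version generates; `LispParser`, its `iterative` flag, and the tokenizer are hypothetical. `list_recursive` mirrors the Yapps 1 rules (`list` is `"(" seq ")"`, and `seq` is either empty or `expr seq`), while `list_iterative` mirrors the Yapps 2 loop.

```python
import re

# A token is "(", ")", or an atom (a run of non-space, non-paren characters).
TOKEN = re.compile(r"\s*([()]|[^\s()]+)")


def tokenize(text):
    # After rstrip, every position starts with optional whitespace followed by
    # a paren or an atom character, so the match always succeeds.
    tokens, pos, text = [], 0, text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


class LispParser:
    def __init__(self, text, iterative=True):
        self.tokens = tokenize(text)
        self.pos = 0
        self.iterative = iterative

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def advance(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.pos += 1
        return tok

    def parse(self):
        value = self.expr()
        if self.peek() is not None:
            raise SyntaxError(f"trailing input: {self.peek()!r}")
        return value

    def expr(self):
        # expr: ATOM | list
        if self.peek() == "(":
            return self.list_iterative() if self.iterative else self.list_recursive()
        tok = self.advance()
        if tok == ")":
            raise SyntaxError("unexpected ')'")
        return tok

    # Yapps 1 style: list: "(" seq ")"   seq: (empty) | expr seq
    def list_recursive(self):
        self.advance("(")
        items = self.seq()
        self.advance(")")
        return items

    def seq(self):
        if self.peek() == ")":  # choose the empty alternative
            return []
        first = self.expr()
        return [first] + self.seq()  # a new list is built at every level

    # Yapps 2 style: list: "(" ( expr )* ")"
    def list_iterative(self):
        self.advance("(")
        items = []
        while self.peek() != ")":  # repeat expr until the closing paren
            items.append(self.expr())
        self.advance(")")
        return items
```

Both modes return `['a', ['b', 'c'], []]` for `(a (b c) ())`. The recursive mode copies the partial list at every level, which is quadratic in the list length, and it uses Python stack frames for every element, so with the default recursion limit of 1000 a flat list of a few thousand atoms raises `RecursionError`. The iterative mode appends to one list, and its stack depth depends only on nesting, not on list length.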

Yapps 1: How it started

My desire for a parser generator began in late 1997 with a school project for which I used my own hand-written parser. At the same time, there was a thread on comp.lang.python about parsing e-mail addresses with nested <>’s. I was reading Stroustrup’s “The C++ Programming Language” and came across an expression parser. I was also reading Mark Lutz’s “Programming Python” and came across another expression parser. The combination of these four events inspired me to write a parser generator that could produce parsers similar to Lutz’s hand-written parser. Many of the features of Yapps are inspired by the ANTLR/PCCTS system written by Terence Parr. ANTLR combines flexibility, speed, and parsing power into a system that produces readable recursive descent LL(k) parsers for C, C++, or Java.

Download

Yapps 1 is a single Python file, approximately 30 KB in size. You will probably also want to see the documentation, available in these formats:

There are three tiny examples available:

Name	Grammar	Parser
Calculator	expr.g	expr.py
Lisp Expressions	lisp.g	lisp.py
Yapps Grammar	parsedesc.g	parsedesc.py

Future

Yapps 2 was quite good for my needs. Its weakest point is error handling. Yapps detects errors in the input and complains, and it attempts to display the portion of the input that was bad, but its explanatory abilities are limited. It also makes no attempt at error recovery.
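Displaying the bad portion of the input usually comes down to turning a character offset into a line, a column, and a marker under the offending character. The Python sketch below shows one common way to do that; `error_context` is a hypothetical helper, not part of Yapps.

```python
def error_context(text, offset):
    """Return the line containing `offset` with a caret under the offending column."""
    line_start = text.rfind("\n", 0, offset) + 1  # 0 when offset is on the first line
    line_end = text.find("\n", offset)
    if line_end == -1:  # offending line is the last one
        line_end = len(text)
    line_no = text.count("\n", 0, offset) + 1
    column = offset - line_start
    return (f"line {line_no}, column {column + 1}:\n"
            f"{text[line_start:line_end]}\n"
            f"{' ' * column}^")
```

For example, `error_context("a = 1\nb = * 2\n", 10)` points at the `*` on line 2, column 5.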

I often needed parsers in C++. It would be neat if Yapps could produce a C++ or Ruby parser instead of a Python parser. However, since Yapps mixes the grammar with Python code, a single grammar couldn’t be used to build both a parser module in Python and another in C++.

I wrote Yapps back in the Python 1.5 days. I wanted to write a version that worked better with Python 2 or 3. I’ve put more thoughts on my blog^[33]. If I were to write a new parser project today, it’d have lots of automated testing. However, I have no plans to work on Yapps in the foreseeable future. Since the 1990s there’s been quite a lot of development on parsers, with parser combinators, PEGs, and other developments, and I no longer feel the need to write my own.