The parser class contains one method for each rule in the grammar. These methods return a pair, (value,index) where the value is a user-defined value and the index points to the next token that has not been used. The indices are the same as those passed to the token method in the scanner.
Yapps produces as output two exception classes, a scanner class, a parser class, and a function parse that puts everything together. The parse function does not have to be used; instead, one can create a parser and scanner object and use them together for parsing.
def parse(rule, input): parser = X(XScanner(input)) try: return getattr(parser, rule)()[0] except SyntaxError, s: try: import yapps yapps.print_error(input, s, parser.scanner) except ImportError: print "Syntax Error",s.msg,"on line",1+count(input[:s.pos], "\n") except NoMoreTokens: print "Ran out of input"
The parse function takes a name of a rule and an input string
as input. It creates a scanner and parser object, then tries to
execute the method in the parser object named rule. (The
parser object contains one method for each of the rules in the
grammar.) The parser method returns a pair: the value specified in
the <<...>>
part of the grammar, and the number of tokens that
were read from the string. (The latter can be used if we wanted to
parse something else from the string.) The parse function
throws away the token count and just returns the first element of the
tuple.
There are several situations in which the parse function would not be useful. If a different parser or scanner is being used, the token count is needed, or a exceptions are to be handled differently, a new parse function would be required. The supplied parse function can be used as a template for writing a function for your own needs. An example of a custom parse function is the generate function in Yapps.py.
Unlike most scanners, the scanner produced by Yapps can take into account the context in which tokens are needed, and try to match only good tokens. For example, in the grammar:
parser IniFile: token ID: "[a-zA-Z_0-9]+" token VAL: ".*" rule pair: ID "[ \t]*=[ \t]*" VAL "\n"
we would like to scan lines of text and pick out a name/value pair. In a conventional scanner, the input string shell=progman.exe would be turned into a single token of type VAL. The Yapps scanner, however, knows that at the beginning of the line, an ID is expected, so it will return "shell" as a token of type ID. Later, it will return "progman.exe" as a token of type VAL.
Context sensitivity decreases the separation between scanner and parser, but it is useful in parsers like IniFile, where the tokens themselves are not unambiguous, but are unambiguous given a particular stage in the parsing process.
Unfortunately, context sensitivity can make it more difficult to detect errors in the input. For example, in parsing a Pascal-like language with “begin” and “end” as keywords, a context sensitive scanner would only match “end” as the END token if the parser is in a place that will accept the END token. If not, then the scanner would match “end” as an identifier. To disable the context sensitive scanner in Yapps, use the context-insensitive-scanner option, either from the command line or in the parser:
Parser X: option: "context-insensitive-scanner"
There are two internal variables that may be of use. The current rule
begins scanning the input string at character position _start_
.
The index of the current token being matched is _pos_
. The
token can be retrieved by accessing the scanner object,
self.scanner. For example, to retrieve the four-element
tuple representing the first token of the current rule, use
self.scanner.token(_pos_). A potentially useful combination
of these two variables is to extract the portion of the input matched
by the current rule. The second element of a token tuple is the end
character position, so we want to extract the string from
_start_
to self.scanner.token(_pos_)[1]:
rule R: a b c -> << self.scanner.input[_start_:self.scanner.token(_pos_)[1]] >>
Sometimes the parser code needs to rely on helper variables, functions, and classes. A Yapps grammar can optionally be surrounded by double percent signs, to separate the grammar from Python code.
... Python code ... %% ... Yapps grammar ... %% ... Python code ...
The second %%
can be omitted if there is no Python code at the
end, and the first %%
can be omitted if there is no extra
Python code at all. (To have code only at the end, both separators
are required.)
If you want to use “new style” Python 1.5 regular expressions add an option to your grammar before you define tokens:
parser X: option: "use-new-regexps" token: ...
Yapps will then generate code to use the new regular expression module (re) rather than the old one (regex).
Yapps does several things that simplify the generated parser code, to
make it faster, shorter, and more readable. Known changes to the
internal variable _pos_
are propagated to future statements
instead of performing assignments to modify the variable. Also, some
variables that would be assigned to, but never used, are not assigned.
(The analysis of this situation is not as comprehensive as it could
be.)
Yapps looks for a special case that commonly occurs when matching lists of patterns. Lists are commonly expressed with one of these two constructs:
rule Xlist: X Xlist -> << [X] + Xlist >> | -> << [] >> rule X: ... rule Ylist<<r>>: Y Ylist<<r+[Y]>> -> << Ylist >> | -> << r >> rule Y: ...
Both constructs produce the resulting list using recursion instead of looping (since looping is not available in Yapps). The second version, however, is called tail-recursive. To be a tail call, the return value in the recursive clause has to be simply the name of the last subrule matched; no computation can be performed on it before it is returned. The first clause of Ylist is a tail call because it returns Ylist, and the last subrule matched was Ylist<<...>>. To be tail recursive, it must be a tail call, and it must also be the same as the rule being defined. Since we are defining Ylist and returning Ylist, the clause is tail recursive.
Yapps will detect tail calls and make a minor optimization for them. The real win is in tail recursion: Yapps will turn the recursive function into a while loop.