Several weeks ago, I was inspired by Corey Donohoe’s dot_xen project to look into the Treetop parsing library. After a few hours, I had the initial groundwork laid for a simpler way of defining Treetop nodes, building them from the grammar itself, and defining visitors for the resulting [AST](Abstract Syntax Tree). After a few knots and some climbing, I’d hung a TireSwing from Treetop.
In order to demonstrate what TireSwing can do, I’ll build up an example, starting the hard way and then simplifying. Skip to the end to see the final result
Simple Assignments #
I’ll start with an simple input file:
name = nathan awesome = true treetop = neato
A basic grammar to parse it, defined in Treetop:
Treetop provides a parser,
AssignmentsParser.new.parse(File.read("assignments.txt")) provides a tree of
Treetop::Runtime::SyntaxNodes, which we can then traverse to build up a simpler external AST.
I’m explicitly building a separate AST rather than converting the output directly to a hash or the like, since I’m assuming that I’ll want to manipulate it in an abstract fashion – an example from a project I worked on recently is parsing and translating boolean queries from one dialect to another – so I’ll skip that for now.
Building an AST #
To build an AST, I can define methods inline in the grammar to create the nodes.
And the accompanying ruby file, which defines an AST:
Let’s see what happens:
$ ruby assignments.rb #<struct AST::Assignments assignments= [#<struct AST::Assignment lhs="name", rhs="nathan">, #<struct AST::Assignment lhs="awesome", rhs="true">, #<struct AST::Assignment lhs="treetop", rhs="neato">]>
Cool. That wasn’t too hard, actually. However, my introduction to Treehouse via dot_xen made it clear that when you start getting lots of different node types and lots of inline code in the grammar, it makes the grammar and parsing code both complex and repetitive. Let’s simplify.
Treetop provides a
<ClassName> syntax for grammars which lets you create instances of subclasses (or duck-types) of
SyntaxNode instead of the default bare
SyntaxNode. You could use that mechanism to define an AST of subclasses, but I wanted “clean” AST nodes, ones with only the data in them rather than the code that comes along with
Here’s the grammar. Turns out the Treetop meta-grammar allows for anything between those
<>‘s, as long as it’s not a
>, so I’ll use that to my advantage. A helper from TireSwing:
And the accompanying ruby file:
And the output:
#<AST::Assignments:0x1195108 @assignments= [#<AST::Assignment:0x1194514 @lhs="name", @rhs="nathan">, #<AST::Assignment:0x1194398 @lhs="awesome", @rhs="true">, #<AST::Assignment:0x1194230 @lhs="treetop", @rhs="neato">]>
The end result isn’t all that different, but there’s some simplifying magic behind it. Not only did TireSwing tease out the recursive and nested nodes in the assignments rule, it converted the lhs and rhs to the string literals I was looking for, and I end up with a nice clean AST.
You can look in the TireSwing example code for more complex examples.