zerowidth positive lookahead

Introducing TireSwing

Several weeks ago, I was inspired by Corey Donohoe’s dot_xen project to look into the Treetop parsing library. After a few hours, I had the initial groundwork laid for a simpler way of defining Treetop nodes, building them from the grammar itself, and defining visitors for the resulting AST. After a few knots and some climbing, I’d hung a TireSwing from Treetop.

In order to demonstrate what TireSwing can do, I’ll build up an example, starting the hard way and then simplifying. Skip to the end to see the final result

Simple Assignments

I’ll start with an simple input file:

name = nathan
awesome = true
treetop = neato

A basic grammar to parse it, defined in Treetop:

grammar Assignments

  rule assignments
    assignment rest:("\n" assignment)* "\n"?
  end

  rule assignment
    lhs:literal whitespace "=" whitespace rhs:literal
  end

  rule literal
    [a-z]+
  end

  rule whitespace
    [ \t]**
  end

end

Treetop provides a parser, AssignmentsParser. AssignmentsParser.new.parse(File.read("assignments.txt")) provides a tree of Treetop::Runtime::SyntaxNodes, which we can then traverse to build up a simpler external AST.

I’m explicitly building a separate AST rather than converting the output directly to a hash or the like, since I’m assuming that I’ll want to manipulate it in an abstract fashion – an example from a project I worked on recently is parsing and translating boolean queries from one dialect to another – so I’ll skip that for now.

Building an AST

To build an AST, I can define methods inline in the grammar to create the nodes.

grammar Assignments

  rule assignments
    assignment rest:("\n" assignment)* "\n"? {
      def build
        assignments = [assignment.build]
        rest.elements.each do |child|
          if child.respond_to?(:assignment)
            assignments << child.assignment.build
          end
        end
        AST::Assignments.new(assignments)
      end
    }
  end

  rule assignment
    lhs:literal whitespace "=" whitespace rhs:literal {
      def build
        AST::Assignment.new(lhs.text_value, rhs.text_value)
      end
    }
  end

  rule literal
    [a-z]+
  end

  rule whitespace
    [ \t]*
  end

end

And the accompanying ruby file, which defines an AST:

require "pp"
require "rubygems"
require "treetop"

Treetop.load("assignments.treetop")

module AST
  Assignments = Struct.new(:assignments)
  Assignment = Struct.new(:lhs, :rhs)
end

pp AssignmentsParser.new.parse(File.read("assignments.txt")).build

Let’s see what happens:

$ ruby assignments.rb
#<struct AST::Assignments
 assignments=
  [#<struct AST::Assignment lhs="name", rhs="nathan">,
   #<struct AST::Assignment lhs="awesome", rhs="true">,
   #<struct AST::Assignment lhs="treetop", rhs="neato">]>

Cool. That wasn’t too hard, actually. However, my introduction to Treehouse via dot_xen made it clear that when you start getting lots of different node types and lots of inline code in the grammar, it makes the grammar and parsing code both complex and repetitive. Let’s simplify.

TireSwing

Treetop provides a <ClassName> syntax for grammars which lets you create instances of subclasses (or duck-types) of SyntaxNode instead of the default bare SyntaxNode. You could use that mechanism to define an AST of subclasses, but I wanted “clean” AST nodes, ones with only the data in them rather than the code that comes along with SyntaxNode.

Here’s the grammar. Turns out the Treetop meta-grammar allows for anything between those <>’s, as long as it’s not a >, so I’ll use that to my advantage. A helper from TireSwing: <node(...)>.

grammar Assignments

  rule assignments
    assignment rest:("\n" assignment)* "\n"? <node(:assignments)>
  end

  rule assignment
    lhs:literal whitespace "=" whitespace rhs:literal <node(:assignment)>
  end

  rule literal
    [a-z]+
  end

  rule whitespace
    [ \t]*
  end

end

And the accompanying ruby file:

require "rubygems"
gem "tire_swing" # whoops, named this file the same as the gem!
require "tire_swing"

module AST
  include TireSwing::NodeDefinition
  node :assignments, :assignments => array_of(:assignment)
  node :assignment, :lhs, :rhs
end

Treetop.load("tire_swing.treetop")
TireSwing.parses_grammar(Assignments, AST)

require "pp"
pp AssignmentsParser.ast(File.read("assignments.txt"))

And the output:

#<AST::Assignments:0x1195108
 @assignments=
  [#<AST::Assignment:0x1194514 @lhs="name", @rhs="nathan">,
   #<AST::Assignment:0x1194398 @lhs="awesome", @rhs="true">,
   #<AST::Assignment:0x1194230 @lhs="treetop", @rhs="neato">]>

The end result isn’t all that different, but there’s some simplifying magic behind it. Not only did TireSwing tease out the recursive and nested nodes in the assignments rule, it converted the lhs and rhs to the string literals I was looking for, and I end up with a nice clean AST.

You can look in the TireSwing example code for more complex examples.

You can download the the example files here and get TireSwing at GitHub.