zerowidth positive lookahead

Annotating a TOML Parse Tree

This is part 2 of 4.

In the previous article, we built a TOML parser. Now that we can parse a document, we need to convert the parse tree into something we can use.

Named Nodes

As defined, the parser simply returns the string that it parsed. This isn’t helpful, so we need to annotate matches during parsing to build a structured representation of a TOML document. Once we’ve created this tree representation, we’ll be able to transform it into the final data structure that we’re looking for.

We’ll do this using Parslet’s .as method to name matched parts of rules. When Parslet matches something that has been named, it converts it to a hash of {:name => "matched value"}. These can be nested along with the rules, resulting in a tree of hashes and arrays.


For values, we capture the type of the value along with its contents.

rule(:integer) do
  (str("-").maybe >> match["1-9"] >> digit.repeat).as(:integer)
end

rule(:float) do
  (str("-").maybe >> digit.repeat(1) >>
   str(".") >> digit.repeat(1)).as(:float)
end

rule(:boolean) do
  (str("true") | str("false")).as(:boolean)
end

rule(:datetime) do
  (digit.repeat(4) >> str("-") >>
   digit.repeat(2) >> str("-") >>
   digit.repeat(2) >> str("T") >>
   digit.repeat(2) >> str(":") >>
   digit.repeat(2) >> str(":") >>
   digit.repeat(2) >> str("Z")).as(:datetime)
end

rule(:string) do
  str('"') >>
  ((escaped_special | string_special.absent? >> any).repeat).as(:string) >>
  str('"')
end

And the tests:

it "parses integers into {:integer => 'digits'}" do
  expect(value_parser.parse("1234")).to eq :integer => "1234"
end

it "parses floats into {:float => 'digits'}" do
  expect(value_parser.parse("-0.123")).to eq :float => "-0.123"
end

it "parses booleans into {:boolean => 'value'}" do
  expect(value_parser.parse("true")).to eq :boolean => "true"
end

it "parses datetimes into hashes of date/time data" do
  expect(value_parser.parse("1979-05-27T07:32:00Z")).to eq(
    :datetime => "1979-05-27T07:32:00Z")
end

it "parses strings into {:string => 'string contents'}" do
  expect(value_parser.parse('"hello world"')).to eq(
    :string => "hello world")
end


Parslet handles repeated elements “magically”. If there is a sequence of matched values, it will automatically combine them into an array of elements. From the docs, capturing basic repeats can work either way we need it to:

str('a').repeat.as(:b) # "aaa" => {:b=>"aaa"@0}
str('a').as(:b).repeat # "aaa" => [{:b=>"a"@0}, {:b=>"a"@1}, {:b=>"a"@2}]

For arrays, we’ll want to capture the outer array as :array => ... and the contents as an array of values, [ {:integer => "1"}, {:integer => "2"}, ...].

If we weren’t parsing nested arrays, we could leave off the .as(:array) and Parslet would automatically give us bare arrays of values. However, it’s a little too smart about merging the results of parsed sub-trees and it flattens nested arrays, so we’ll be explicit.

rule :array do
  str("[") >> array_space >>
  array_contents.repeat(1).as(:array) >>
  array_space >> str("]")
end

it "captures arrays as :array => [ value, value, ... ]" do
  expect(array_parser.parse("[1,2]")).to eq(
    :array => [ {:integer => "1"}, {:integer => "2"}])
end

it "captures nested arrays" do
  expect(array_parser.parse("[ [1,2] ]")).to eq(
    :array => [
      {:array => [ {:integer => "1"}, {:integer => "2"}]}
    ])
end


We’d like individual assignments to look like {:key => "key", :value => value}.

The initial version of the parser was a little loose about when and where it would match whitespace, so we’ll refactor the parser rules a bit too.

First, a couple of the helper rules, changing whitespace to space? and comment to comment?:

rule(:space?) { space.repeat }

rule(:comment?) do
  (str("#") >> (newline.absent? >> any).repeat).maybe
end

Next, we’ll look at series of assignments. Originally the assignment rule handled whitespace within itself, but we’ll move that elsewhere. We capture the key and the value:

rule :assignment do
  key.as(:key) >>
  space? >> str("=") >> space? >>
  value.as(:value)
end

it "captures the key and the value" do
  expect(ap.parse("thing = 1")).to eq(
    :key => "thing", :value => {:integer => "1"})
end

A sequence of assignments can have several forms: nothing (no assignments), a single assignment, a series of comments and whitespace, a series of “bare” assignments, or a series of assignments with comments and whitespace interspersed. To handle this well, we’ll start with a single line:

rule :assignment_line do
  space? >> assignment.maybe >> space? >> comment?
end

Now we can easily combine these and capture the overall results as :assignments => ...:

rule :assignments do
  (assignment_line >> (newline >> assignment_line).repeat).as(:assignments)
end

And test these with a variety of inputs:

let(:ap) { parser.assignments }

it "captures a list of assignments" do
  expect(ap.parse("a=1\nb=2")).to eq(
    :assignments => [
      {:key => "a", :value => {:integer => "1"}},
      {:key => "b", :value => {:integer => "2"}},
    ])
end

it "captures an empty string" do
  expect(ap.parse("")).to eq(:assignments => "")
end

it "captures just comments as a string" do
  expect(ap.parse("#comment\n")).to eq(
    :assignments => "#comment\n")
end

A list of assignments containing just a comment is matched as a string. This is because we’ve defined a capture, and even if it doesn’t match any structured {:key => ..., :value => ...} pairs, it still matches and captures the string itself. This is fine; we’ll just have to handle this case when we transform the tree later on.

But now it looks like we have a problem. If we try and parse the following string using assignments:

#comment
a = 1

It’s parsed as :assignments => [{:key => "#comment\na", :value => {:integer => "1"}}]. The key has somehow managed to capture the preceding comment and newline.

If we look at the rule for an assignment again, it starts by matching a key. The assignment_line rule invokes it as space? >> assignment.maybe. When presented with "#comment\nkey=value", the parser sees the '#' and interprets it as the start of a key. Because a key is just “not whitespace”, the remainder of the comment and the newline are accepted as part of the key.

To fix this, we need to restrict the definition of a key to make sure that it doesn’t begin with either a # or a newline:

rule :key do
  str("#").absent? >> newline.absent? >>
  (match["\\[\\]="].absent? >> space.absent? >> any).repeat(1)
end

And that solves it:

it "captures an assignment after a comment and newlines" do
  expect(ap.parse("#comment\na=1")).to eq(
    :assignments => [{:key => "a", :value => {:integer => "1"}}])
  expect(ap.parse("#comment\n\t\n\na=1")).to eq(
    :assignments => [{:key => "a", :value => {:integer => "1"}}])
end

Key Groups

Finally, key groups. We’ll capture key group names as :group_name => "name":

rule :group_name do
  space? >> str("[") >>
  (str("]").absent? >> any).repeat(1).as(:group_name) >>
  str("]") >> space? >> comment?
end

A key group must have a group name, but after that it can be empty or have a series of assignments. We’ve already written that rule, so:

rule :key_group do
  (group_name >>
   (newline >> assignments).maybe).as(:key_group)
end

let(:kgp) { parser.key_group }

it "captures the group name and assignments" do
  expect(kgp.parse("[kg]\na=1\nb=2")).to eq(
    :key_group =>
      {:group_name => "kg",
       :assignments => [
         {:key => "a", :value => {:integer => "1"}},
         {:key => "b", :value => {:integer => "2"}}]})
end

it "captures empty assignments as a string" do
  expect(kgp.parse("[kg]\n#comment\n\t\n")).to eq(
    :key_group =>
      {:group_name => "kg",
       :assignments => "#comment\n\t\n"})
end

it "captures a single assignment in a key group" do
  expect(kgp.parse("[kg]\na=1")).to eq(
    :key_group => {
      :group_name => "kg",
      :assignments => {:key => "a", :value => {:integer => "1"}}})
end

Note the final test, where a single assignment is captured. It’s captured as a single hash rather than an array with one item, because only one assignment matched during parsing. Parslet will only create an array of matched items if there is more than one.
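The :assignments capture can therefore arrive in three shapes: an array of hashes, a single hash, or a bare string when nothing structured matched. One way to cope with that later is a small normalising helper. The sketch below is plain Ruby and is not part of the parser; the method name assignment_list is hypothetical.

```ruby
# Hypothetical helper: turn whatever :assignments captured into an array
# of {:key => ..., :value => ...} hashes.
def assignment_list(captured)
  case captured
  when Array then captured      # several assignments matched
  when Hash  then [captured]    # exactly one assignment matched
  else []                       # a string (or nil): only comments/whitespace
  end
end

several = assignment_list([
  {:key => "a", :value => {:integer => "1"}},
  {:key => "b", :value => {:integer => "2"}}
])
single  = assignment_list({:key => "a", :value => {:integer => "1"}})
nothing = assignment_list("#comment\n\t\n")
```

The explicit case is deliberate: Ruby's Array() conversion would turn a single hash into an array of key/value pairs rather than wrapping it, which is not what we want here.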


A document, as before, is an optional series of assignments, possibly followed by one or more key groups. We’ll capture the whole thing as :document.

rule :document do
  ((key_group | assignments) >>
   key_group.repeat >>

And now, this TOML document:

title = "global title"
a = 1
b = 2
c = [ 3, 4 ]

is captured as:

  [{:assignments=>{:key=>"title", :value=>{:string=>"global title"}}},
       [{:key=>"a", :value=>{:integer=>"1"}},
        {:key=>"b", :value=>{:integer=>"2"}}]}},
      :assignments=>{:key=>"c", :value=>{:array=>[{:integer=>"3"}, {:integer=>"4"}]

In part 3 of this series, we’ll transform this tree of captured values into a usable hash using Parslet’s transformation engine.

Next: Transforming a TOML Parse Tree