zerowidth positive lookahead

Transforming a TOML Parse Tree

This is part 3 of 4.

Transforms

Now that our parser is giving us an annotated parse tree, we need to transform it into something usable.

Parslet’s Transform class lets us define rules that match parts of a captured tree and convert them into whatever structure we need. That could be an abstract syntax tree; for a TOML document, the result should be a hash.

The transformer walks the tree depth-first, starting at the leaves and working its way up. At each node it tries the rules in the order they were defined and applies the first one that matches.
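The following is a toy, pure-Ruby imitation of that traversal order. It is not Parslet’s implementation. The `toy_transform` method and the `[matcher, action]` rule pairs are hypothetical and exist only to show two things: children are rewritten before their parents, and the first matching rule wins.

```ruby
# Toy imitation of a bottom-up, first-match-wins tree rewrite.
# This is NOT Parslet's implementation; it only mimics the traversal order.
# Each rule is a hypothetical [matcher, action] pair of lambdas.
def toy_transform(node, rules)
  # Rewrite children first, so a parent only ever sees transformed children.
  node =
    case node
    when Hash  then node.transform_values { |child| toy_transform(child, rules) }
    when Array then node.map { |child| toy_transform(child, rules) }
    else node
    end

  # Try the rules in definition order; the first one that matches wins.
  rules.each do |matcher, action|
    return action.call(node) if matcher.call(node)
  end
  node # no rule matched, so the node is left as-is
end

RULES = [
  [->(n) { n.is_a?(Hash) && n.keys == [:integer] }, ->(n) { Integer(n[:integer]) }],
  [->(n) { n.is_a?(Hash) && n.keys == [:array] },   ->(n) { n[:array] }]
].freeze

p toy_transform({ array: [{ integer: "3" }, { integer: "4" }] }, RULES) # prints [3, 4]
```

The integer leaves are converted first. By the time the `:array` hash is examined it already holds `[3, 4]`, so the second rule only has to unwrap it.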

A transform rule takes a pattern (a hash whose keys and values must match exactly) and a block of code that executes on the matched part of the tree. For values, we’ll use the simple matcher, which matches only leaf values such as strings and numbers. simple will not match arrays or hashes.

We’ll start by looking at the overall structure of a simple parsed TOML document.

{:document=>
  [{:assignments=>{:key=>"title", :value=>{:string=>"global title"}}},
   {:key_group=>
     {:group_name=>"group1",
      :assignments=>
       [{:key=>"a", :value=>{:integer=>"1"}},
        {:key=>"b", :value=>{:integer=>"2"}}]}},
   {:key_group=>
     {:group_name=>"group2",
      :assignments=>
       {:key=>"c",
        :value=>{:array=>[{:integer=>"3"}, {:integer=>"4"}]}}}}]}

The leaves in this tree are values. We’ll start by defining transformation rules for them:

require "parslet"
require "time" # Time.parse is provided by the "time" standard library

module TOML
  class Transform < Parslet::Transform
    rule(:integer  => simple(:n))  { Integer(n) }
    rule(:float    => simple(:n))  { Float(n) }
    rule(:boolean  => simple(:b))  { b == "true" }
    rule(:datetime => simple(:dt)) { Time.parse(dt.to_s) }
  end
end
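These blocks rely on Ruby’s built-in conversion functions. Below is a quick plain-Ruby check of what each one computes on ordinary strings, with no Parslet involved. Note that Time.parse is only available after `require "time"`. Integer() and Float() are also strict: a malformed number raises an error instead of being silently truncated, which is what String#to_i does.

```ruby
require "time" # Time.parse comes from the "time" standard library

# The same conversions the value rules perform, applied to plain strings.
p Integer("1")     # prints 1
p Float("0.123")   # prints 0.123
p "false" == "true" # prints false
p Time.parse("1979-05-27T07:32:00Z") == Time.utc(1979, 5, 27, 7, 32, 0) # prints true

# String#to_i quietly stops at the first non-digit...
p "12abc".to_i # prints 12

# ...while Integer() raises ArgumentError for the same malformed input.
begin
  Integer("12abc")
rescue ArgumentError => e
  puts "rejected: #{e.message}"
end
```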

The symbol in the simple() match rule defines a local variable within the block for us to manipulate.

Similar to the parser, transforms can be tested on small pieces of the tree.

describe TOML::Transform do
  let(:xform) { TOML::Transform.new }

  context "values" do
    it "transforms an integer value" do
      expect(xform.apply(:integer => "1")).to eq(1)
    end

    it "transforms a float" do
      expect(xform.apply(:float => "0.123")).to eq(0.123)
    end

    it "transforms a boolean" do
      expect(xform.apply(:boolean => "true")).to eq(true)
      expect(xform.apply(:boolean => "false")).to eq(false)
    end

    it "transforms a datetime" do
      expect(xform.apply(:datetime => "1979-05-27T07:32:00Z")).to eq(
        Time.parse("1979-05-27T07:32:00Z"))
    end
  end

end

String transformations need an extra step, because strings may contain escape sequences that must be unescaped:

rule(:string   => simple(:s)) do
  s.to_s.gsub(/\\[0tnr]/,
              "\\0" => "\0",
              "\\t" => "\t",
              "\\n" => "\n",
              "\\r" => "\r")
end
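String#gsub accepts a Hash as its replacement argument and substitutes the hash’s value for each matched substring. That is what lets the rule above do all its unescaping in one call. Here is the same step as a standalone plain-Ruby sketch. `unescape` and `ESCAPES` are hypothetical names. The real rule calls `to_s` first because Parslet passes the block a Parslet::Slice rather than a String.

```ruby
# Each two-character escape sequence mapped to the character it stands for.
ESCAPES = {
  "\\0" => "\0",
  "\\t" => "\t",
  "\\n" => "\n",
  "\\r" => "\r"
}.freeze

# Hypothetical standalone helper doing what the :string rule's block does.
# With a Hash as the replacement, gsub substitutes ESCAPES[matched_text].
def unescape(str)
  str.gsub(/\\[0tnr]/, ESCAPES)
end

p unescape("a\\nb")       # prints "a\nb"
p unescape("col1\\tcol2") # prints "col1\tcol2"
```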

it "transforms a string" do
  expect(xform.apply(:string => "a string")).to eq("a string")
end

it "unescapes special characters in captured strings" do
  expect(xform.apply(:string => "a\\nb")).to eq("a\nb")
end

When we apply these rules to key/value pairs:

{:key=>"c", :value=>{:integer=>"3"}}

becomes

{:key=>"c", :value=>3}

And an array

:array => [{:integer => "3"}, {:integer => "4"}]

becomes

:array => [3, 4]

Now that we have {:array => [values]}, let’s transform it. simple(:a) won’t match an array, so we need a different matcher. The sequence matcher does match arrays, but only arrays of simple values: it matches [1, 2, 3] but not [{:a => 1}, {:b => 2}] or [[1,2], [3,4]]. Because arrays can be nested, we have to use the most generic matcher, subtree. Using subtree, we’ll simply pull the matched array out.

rule(:array => subtree(:a)) { a }

it "transforms an array of integers" do
  input = { :array => [ {:integer => "1"}, {:integer => "2"} ] }
  expect( xform.apply(input) ).to eq([1,2])
end

it "transforms nested arrays" do
  input = {
    :array => [
      { :array => [ {:integer => "1"}, {:integer => "2"} ] },
      { :array => [ {:float => "0.1"}, {:float => "0.2"} ] }
    ]
  }
  expect( xform.apply(input) ).to eq([[1,2], [0.1,0.2]])
end
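The transform works bottom-up, so by the time the outer :array hash is matched, its elements have already become Ruby arrays. sequence rejects arrays of arrays, which is why it falls short here. Below is a rough plain-Ruby model of what each matcher accepts. The predicate names are hypothetical stand-ins, not Parslet classes.

```ruby
# Plain-Ruby predicates mimicking which values each Parslet matcher accepts.
# Illustrative stand-ins only; the real matchers are Parslet's own classes.
def simple_ok?(value)
  # simple: anything that is not a Hash or an Array
  ![Hash, Array].include?(value.class)
end

def sequence_ok?(value)
  # sequence: an Array whose elements are all simple
  value.is_a?(Array) && value.all? { |element| simple_ok?(element) }
end

def subtree_ok?(_value)
  true # subtree: any value at all, however deeply nested
end

p sequence_ok?([1, 2, 3])            # prints true
p sequence_ok?([{ a: 1 }, { b: 2 }]) # prints false
p sequence_ok?([[1, 2], [3, 4]])     # prints false
p subtree_ok?([[1, 2], [3, 4]])      # prints true
```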

For single assignments, we can convert a {:key => key, :value => value} hash into a bare {key => value} hash. Again, we’ll use subtree, because the value may be either a simple value or an array.

rule(:key => simple(:key), :value => subtree(:value)) do
  {key.to_s => value}
end

it "converts a key/value pair into a bare hash" do
  input = {:key => "a key", :value => "a value"}
  expect( xform.apply(input) ).to eq("a key" => "a value")
end

it "converts a key/value pair with an array value" do
  input = {:key => "a key", :value => [[1,2],[3,4]]}
  expect( xform.apply(input) ).to eq("a key" => [[1,2],[3,4]])
end

Now we can transform :assignments => [...]. We’ll do this by merging each key/value hash, in turn, into a context hash. For bare assignments (that is, those not within a key group), the context is simply an empty hash, {}.

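First, though, :assignments can capture a simple string. This happens when a section contains only comments and whitespace. We can handle this case with the simple matcher. Because the first matching rule wins, this rule must be defined before the subtree rule below, which would also match a string.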

rule(:assignments => simple(:values)) do
  {}
end

For an array of assignments, we’ll merge them together, and for single assignments we can just use the assignment hash directly.

You might notice a different block structure in this rule: it takes an explicit dict block argument instead of relying on implicit local variables. When a rule’s block takes an argument, Parslet calls it with a hash of the captured values, and the block keeps the Transform class as self. That lets it call helper methods defined on the class, in this case combine_assignments.

rule(:assignments => subtree(:values)) do |dict|
  if dict[:values].kind_of? Array
    combine_assignments dict[:values]
  else
    dict[:values]
  end
end

def self.combine_assignments(assignments)
  {}.tap do |context|
    assignments.each do |assignment|
      key, value = assignment.first
      context[key.to_s] = value
    end
  end
end
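The block-argument behavior can be modeled in a few lines. As we understand Parslet, it handles the two block styles differently. A one-argument block is called with the hash of captures and keeps the self of the scope where the rule was written (here, the Transform class). A block with no arguments is `instance_eval`’d in a context object that exposes each capture by name. The sketch below is a toy model of that dispatch, not Parslet’s code. `Context`, `run_rule`, and `Demo` are hypothetical.

```ruby
# Toy version of how a rule block is invoked, depending on its arity.
# Illustrative only; Context and run_rule are hypothetical names.
class Context
  def initialize(bindings)
    # Expose each capture as a zero-argument method, so `n` reads like a local.
    bindings.each { |name, value| define_singleton_method(name) { value } }
  end
end

def run_rule(bindings, &block)
  if block.arity == 1
    block.call(bindings)                        # self stays the defining scope
  else
    Context.new(bindings).instance_eval(&block) # self becomes the Context
  end
end

class Demo
  def self.double(x)
    x * 2
  end

  # Explicit argument: self is still Demo, so the class-level helper is reachable.
  EXPLICIT = run_rule({ n: 21 }) { |dict| double(dict[:n]) }

  # No argument: captures are reachable by name, but self is the Context.
  IMPLICIT = run_rule({ n: 21 }) { n + 1 }
end

p Demo::EXPLICIT # prints 42
p Demo::IMPLICIT # prints 22
```

Calling `double` from a no-argument block would fail in this model, because the Context object has no such method.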

This works as expected:

it "converts a list of global assignments into a hash" do
  input = {:assignments =>
           [{:key => "c", :value => {:integer => "3"}},
            {:key => "d", :value => {:integer => "4"}}]}
  expect(xform.apply(input)).to eq("c" => 3, "d" => 4)
end

it "converts an empty (comments-only) assignments list" do
  input = {:assignments => "\n#comment"}
  expect(xform.apply(input)).to eq({})
end

it "converts an array assignment" do
  input = {:assignments => {:key => "a", :value => [1, 2]}}
  expect( xform.apply(input) ).to eq( "a" => [1,2] )
end
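Taken together, the two :assignments rules distinguish three shapes of input. Below is a plain-Ruby sketch of the same case analysis, using a hypothetical `assignments_to_hash` helper. The sketch models the comments-only capture as a String. In the real tree that capture is a Parslet::Slice, which is why the Transform relies on the simple matcher rather than a class check.

```ruby
# Plain-Ruby sketch of the three :assignments cases (hypothetical helper name).
#   String        -> only comments/whitespace were captured: empty hash
#   Array         -> several {key => value} hashes: fold them together
#   anything else -> a single {key => value} hash: use it directly
def assignments_to_hash(values)
  case values
  when String then {}
  when Array  then values.inject({}) { |context, pair| context.merge(pair) }
  else values
  end
end

p assignments_to_hash("\n# comment") # prints {}
p assignments_to_hash([{ "c" => 3 }, { "d" => 4 }]) == { "c" => 3, "d" => 4 } # prints true
p assignments_to_hash({ "a" => [1, 2] }) == { "a" => [1, 2] } # prints true
```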

Key groups are hashes with a group name and a list of assignments. The group name defines a potentially nested hash as the context into which the assignments are merged.

Because assignments can be a simple string (just comments), a single assignment, or an array of assignments, we’ll use the simple and subtree matchers again. This time, though, we have to also match against the group name. We’ll define a helper method for building nested hashes, and re-use the combine_assignments helper as well.

# simple case, the values matched were comments/whitespace only
rule(:group_name => simple(:key),
     :assignments => simple(:values)) do |dict|
  nested_hash_from_key dict[:key], {}
end

# the values are a single assignment or a list of assignments
rule(:group_name => simple(:key),
     :assignments => subtree(:values)) do |dict|

  values = if dict[:values].kind_of? Array
             combine_assignments dict[:values]
           else
             dict[:values]
           end
  nested_hash_from_key dict[:key], values
end

def self.nested_hash_from_key(key, values)
  {}.tap do |outer|
    current = outer
    key.to_s.split(".").each do |key_part|
      current[key_part] = {}
      current = current[key_part]
    end
    current.merge! values
  end
end
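Another way to build the same nested hash is to wrap the values from the innermost key part outward, instead of walking down from the outermost one. Below is a plain-Ruby sketch with a hypothetical method name. Like the helper above, it assumes the group name is a dotted string such as `servers.alpha`.

```ruby
# Alternative sketch of nested_hash_from_key (hypothetical name): fold the key
# parts from last to first, wrapping the values one level at a time.
def nested_hash_from_dotted_key(key, values)
  key.to_s.split(".").reverse.inject(values.dup) { |inner, part| { part => inner } }
end

nested = nested_hash_from_dotted_key("servers.alpha", { "ip" => "10.0.0.1" })
p nested == { "servers" => { "alpha" => { "ip" => "10.0.0.1" } } } # prints true
```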

By the time the :key_group rule is tried, its contents have already been transformed into a nested hash, so the rule just unwraps it:

rule(:key_group => subtree(:values)) { values }

Finally, a document is simply an array of transformed hashes. Merge them together:

rule(:document => subtree(:values)) do |dict|
  dict[:values].inject(&method(:merge_nested))
end

def self.merge_nested(existing, updates)
  updates.each do |key, value|
    if existing.has_key?(key)
      existing[key] = merge_nested(existing[key], value)
    else
      existing[key] = value
    end
  end
  existing
end

The recursive merge lets nested groups that share a parent end up under a single key. In the TOML example document, for instance, the alpha and beta groups both land inside “servers”.
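The recursion is needed because Ruby’s Hash#merge is shallow: when two hashes both have a "servers" key, the merged result keeps only the later value. A non-mutating alternative can use the block form of Hash#merge, which is called only for keys present in both hashes. In this sketch (`deep_merge` is a hypothetical name), a later non-hash value simply replaces an earlier one. merge_nested, by contrast, would try to recurse into it.

```ruby
# Non-mutating variant of merge_nested (hypothetical name). Hash#merge calls
# the block only for keys present in both hashes.
def deep_merge(existing, updates)
  existing.merge(updates) do |_key, old_value, new_value|
    if old_value.is_a?(Hash) && new_value.is_a?(Hash)
      deep_merge(old_value, new_value) # both sides are groups: merge recursively
    else
      new_value                        # otherwise the later value wins
    end
  end
end

first  = { "servers" => { "alpha" => { "ip" => "10.0.0.1" } } }
second = { "servers" => { "beta"  => { "ip" => "10.0.0.2" } } }

p first.merge(second)["servers"].keys       # prints ["beta"] (shallow merge drops alpha)
p deep_merge(first, second)["servers"].keys # prints ["alpha", "beta"]
```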

At last, a parsed TOML document is transformed into a useful hash:

it "converts a full TOML doc into a hash" do
  input = TOML::Parser.new.parse(fixture("example.toml"))
  expect(xform.apply(input)).to eq(
    "title" => "TOML Example",
    "owner" => {
      "name" => "Tom Preston-Werner",
      "organization" => "GitHub",
      "bio" => "GitHub Cofounder & CEO\nLikes tater tots and beer.",
      "dob" => Time.parse("1979-05-27 07:32:00 UTC")
    },
    "database" => {
      "server" => "192.168.1.1",
      "ports" => [8001, 8001, 8002],
      "connection_max" => 5000,
      "enabled" => true
    },
    "servers" => {
      "alpha" => {
        "ip" => "10.0.0.1",
        "dc" => "eqdc10"
      },
      "beta" => {
        "ip" => "10.0.0.2",
        "dc" => "eqdc10"
      }
    },
    "clients" => {
      "data" => [ ["gamma", "delta"], [1, 2] ],
      "hosts" => ["alpha", "omega"]
    }
  )
end

View the source: toml-parslet on GitHub.

Update, March 3

Fixed a bug in this example code where the full TOML doc didn’t correctly merge subsequent key groups.

Next: Displaying Errors in a TOML Document