Transforming a TOML Parse Tree
This is part 3 of 4.
- Part 1 - Parsing TOML in Ruby with Parslet
- Part 2 - Annotating a TOML Parse Tree
- Part 4 - Displaying Errors in a TOML Document
- toml-parslet on GitHub.
Transforms
Now that our parser is giving us an annotated parse tree, we need to transform it into something usable.
Parslet’s Transform class allows us to define rules to match parts of a captured tree and change them into whatever structure we need. That could be an abstract syntax tree, or, in the case of a TOML document, a hash.
The transformer matches and applies each rule in the order it was defined, starting at the leaves and then working its way up.
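To make the leaves-first order concrete, here is a toy rewriter in plain Ruby. It is only a sketch of the idea, not Parslet’s implementation: toy_rules and toy_apply are made-up names, and the rules are plain lambdas rather than Parslet patterns.

```ruby
# Hypothetical rule list: [predicate, action] pairs, tried in definition order.
def toy_rules
  [
    [->(node) { node.is_a?(Hash) && node.keys == [:integer] },
     ->(node) { Integer(node[:integer]) }],
    [->(node) { node.is_a?(Hash) && node.keys == [:array] },
     ->(node) { node[:array] }]
  ]
end

# Transform the children first, then try the rules on the rebuilt node.
def toy_apply(node)
  transformed =
    case node
    when Hash  then node.each_with_object({}) { |(k, v), h| h[k] = toy_apply(v) }
    when Array then node.map { |child| toy_apply(child) }
    else node
    end

  # The first rule whose predicate matches wins; unmatched nodes pass through.
  toy_rules.each do |matches, action|
    return action.call(transformed) if matches.call(transformed)
  end
  transformed
end

toy_apply({:array => [{:integer => "3"}, {:integer => "4"}]})  # => [3, 4]
```

By the time the :array rule runs, its children have already been turned into integers, which is the property the rest of this post relies on.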
A transform rule takes a pattern (an exact hash and value to match) and a block of code which executes on the matched pattern. For values, we’ll use the simple matcher, which only matches strings, numbers, and the like. simple will not match arrays or hashes.
We’ll start by looking at the overall structure of a simple parsed TOML document.
{:document=>
  [{:assignments=>{:key=>"title", :value=>{:string=>"global title"}}},
   {:key_group=>
     {:group_name=>"group1",
      :assignments=>
       [{:key=>"a", :value=>{:integer=>"1"}},
        {:key=>"b", :value=>{:integer=>"2"}}]}},
   {:key_group=>
     {:group_name=>"group2",
      :assignments=>
       {:key=>"c",
        :value=>{:array=>[{:integer=>"3"}, {:integer=>"4"}]}}}}]}
The leaves in this tree are values. We’ll start by defining transformation rules for them:
require 'parslet'
require 'time' # provides Time.parse

module TOML
  class Transform < Parslet::Transform
    rule(:integer => simple(:n)) { Integer(n) }
    rule(:float => simple(:n)) { Float(n) }
    rule(:boolean => simple(:b)) { b == "true" }
    rule(:datetime => simple(:dt)) { Time.parse dt }
  end
end
The symbol in the simple() match rule defines a local variable within the block for us to manipulate.
Similar to the parser, transforms can be tested on small pieces of the tree.
describe TOML::Transform do
  let(:xform) { TOML::Transform.new }

  context "values" do
    it "transforms an integer value" do
      expect(xform.apply(:integer => "1")).to eq(1)
    end

    it "transforms a float" do
      expect(xform.apply(:float => "0.123")).to eq(0.123)
    end

    it "transforms a boolean" do
      expect(xform.apply(:boolean => "true")).to eq(true)
      expect(xform.apply(:boolean => "false")).to eq(false)
    end

    it "transforms a datetime" do
      expect(xform.apply(:datetime => "1979-05-27T07:32:00Z")).to eq(
        Time.parse("1979-05-27T07:32:00Z"))
    end
  end
end
String transformations have an extra bit of work to do because they contain escaped values that must be unescaped:
rule(:string => simple(:s)) do
  s.to_s.gsub(/\\[0tnr]/,
              "\\0" => "\0",
              "\\t" => "\t",
              "\\n" => "\n",
              "\\r" => "\r")
end

it "transforms a string" do
  expect(xform.apply(:string => "a string")).to eq("a string")
end

it "unescapes special characters in captured strings" do
  expect(xform.apply(:string => "a\\nb")).to eq("a\nb")
end
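The unescaping relies on the hash form of String#gsub, where each match is looked up as a key and replaced by the corresponding value. A standalone sketch, outside of Parslet (unescape_toml is a name made up for this example):

```ruby
# Hypothetical helper mirroring the :string rule above.
def unescape_toml(raw)
  escapes = {
    "\\0" => "\0",
    "\\t" => "\t",
    "\\n" => "\n",
    "\\r" => "\r"
  }
  # Only the sequences named in the character class are matched,
  # so anything else (for example \") is left as it was captured.
  raw.gsub(/\\[0tnr]/, escapes)
end

unescape_toml('a\nb')        # => "a\nb" (three characters, with a real newline)
unescape_toml('say \"hi\"')  # unchanged
```

Keeping the regex and the hash keys in sync matters here: if the regex matched a sequence that is missing from the hash, gsub would replace it with an empty string.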
When we apply these rules to key/value pairs:
{:key=>"c", :value=>{:integer=>"3"}}
becomes
{:key=>"c", :value=>3}
And an array
:array => [{:integer => "3"}, {:integer => "4"}]
becomes
:array => [3, 4]
Now that we have {:array => [values]}, let’s transform it. Instead of simple(:a), we could use the sequence matcher. However, this will only match arrays of simple values. Specifically, it will match [1, 2, 3] but not [{:a => 1}, {:b => 2}] or [[1,2], [3,4]]. Because arrays can be nested, we have to use the most generic matcher, subtree. Using subtree, we’ll simply pull the matched array out.
rule(:array => subtree(:a)) { a }

it "transforms an array of integers" do
  input = { :array => [ {:integer => "1"}, {:integer => "2"} ] }
  expect( xform.apply(input) ).to eq([1,2])
end

it "transforms nested arrays" do
  input = {
    :array => [
      { :array => [ {:integer => "1"}, {:integer => "2"} ] },
      { :array => [ {:float => "0.1"}, {:float => "0.2"} ] }
    ]
  }
  expect( xform.apply(input) ).to eq([[1,2], [0.1,0.2]])
end
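As a rough mental model of the three matchers, the predicates below approximate which values each one accepts. They are plain Ruby stand-ins written for this post (simple_like?, sequence_like? and subtree_like? are not part of Parslet), based on the behaviour described above.

```ruby
# Approximation of simple: anything that is not a Hash or an Array.
def simple_like?(value)
  !value.is_a?(Hash) && !value.is_a?(Array)
end

# Approximation of sequence: an Array whose elements are all simple.
def sequence_like?(value)
  value.is_a?(Array) && value.all? { |el| simple_like?(el) }
end

# Approximation of subtree: accepts anything.
def subtree_like?(_value)
  true
end

simple_like?("3")                           # => true
sequence_like?([1, 2, 3])                   # => true
sequence_like?([{:a => 1}, {:b => 2}])      # => false
sequence_like?([[1, 2], [3, 4]])            # => false
subtree_like?([[1, 2], [3, 4]])             # => true
```

Since a nested TOML array transforms into an array of arrays, only the subtree-style match covers every case.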
For single assignments, we can convert a {:key => key, :value => value} pair into a bare hash. Again, we’ll use subtree because we need to match both simple values and arrays.
rule(:key => simple(:key), :value => subtree(:value)) do
  {key.to_s => value}
end

it "converts a key/value pair into a bare hash" do
  input = {:key => "a key", :value => "a value"}
  expect( xform.apply(input) ).to eq("a key" => "a value")
end

it "converts a key/value pair with an array value" do
  input = {:key => "a key", :value => [[1,2],[3,4]]}
  expect( xform.apply(input) ).to eq("a key" => [[1,2],[3,4]])
end
Now we can transform :assignments => [...]. We’ll do this by rolling up each subsequent key/value hash and merging it into a context. For bare assignments (that is, those not within a key group), the context is simply {}.
First, though, :assignments can match a simple string. This happens when only comments are parsed. We can handle this situation using the simple matcher.
rule(:assignments => simple(:values)) do
  {}
end
For an array of assignments, we’ll merge them together, and for single assignments we can just use the assignment hash directly.
You might notice a different block structure with the next rules, this time using a dict block argument rather than implicit local variables. With an explicit block argument, Parslet calls the block in the scope where it was defined (here, the class body) and passes the bindings as a hash, so the block can use class-level helper methods, in this case combine_assignments.
rule(:assignments => subtree(:values)) do |dict|
  if dict[:values].kind_of? Array
    combine_assignments dict[:values]
  else
    dict[:values]
  end
end

def self.combine_assignments(assignments)
  {}.tap do |context|
    assignments.each do |assignment|
      key, value = assignment.first
      context[key.to_s] = value
    end
  end
end
This works as expected:
it "converts a list of global assignments into a hash" do
  input = {:assignments =>
            [{:key => "c", :value => {:integer => "3"}},
             {:key => "d", :value => {:integer => "4"}}]}
  expect(xform.apply(input)).to eq("c" => 3, "d" => 4)
end

it "converts an empty (comments-only) assignments list" do
  input = {:assignments => "\n#comment"}
  expect(xform.apply(input)).to eq({})
end

it "converts an array assignment" do
  input = {:assignments => {:key => "a", :value => [1, 2]}}
  expect( xform.apply(input) ).to eq( "a" => [1,2] )
end
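The helper can also be tried on its own, away from Parslet. The sketch below copies combine_assignments as a top-level method (an adaptation for this example; in the transform it is a class method) and feeds it the single-pair hashes that the key/value rule produces.

```ruby
# Same body as the class-level helper, defined at the top level for the demo.
def combine_assignments(assignments)
  {}.tap do |context|
    assignments.each do |assignment|
      # Each assignment is a one-pair hash, so first yields [key, value].
      key, value = assignment.first
      context[key.to_s] = value
    end
  end
end

combine_assignments([{"c" => 3}, {"d" => 4}])  # => {"c"=>3, "d"=>4}
combine_assignments([{"c" => 3}, {"c" => 9}])  # => {"c"=>9}
```

Note that a repeated key silently takes the last value. For one-pair hashes like these, assignments.reduce({}, :merge) should give the same result.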
Key groups are hashes with a group name and a list of assignments. The group name defines a potentially nested hash as the context into which the assignments are merged.
Because assignments can be a simple string (just comments), a single assignment, or an array of assignments, we’ll use the simple and subtree matchers again. This time, though, we also have to match against the group name. We’ll define a helper method for building nested hashes, and reuse the combine_assignments helper as well.
# simple case, the values matched were comments/whitespace only
rule(:group_name => simple(:key),
     :assignments => simple(:values)) do |dict|
  nested_hash_from_key dict[:key], {}
end

# the values are a single assignment or a list of assignments
rule(:group_name => simple(:key),
     :assignments => subtree(:values)) do |dict|
  values = if dict[:values].kind_of? Array
    combine_assignments dict[:values]
  else
    dict[:values]
  end
  nested_hash_from_key dict[:key], values
end

def self.nested_hash_from_key(key, values)
  {}.tap do |outer|
    current = outer
    key.to_s.split(".").each do |key_part|
      current[key_part] = {}
      current = current[key_part]
    end
    current.merge! values
  end
end
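To see what the nesting helper produces, here it is as a standalone top-level method (again an adaptation for this example) applied to a dotted group name.

```ruby
# Same body as the class-level helper, defined at the top level for the demo.
def nested_hash_from_key(key, values)
  {}.tap do |outer|
    current = outer
    # Walk the dotted name, creating one empty hash per segment.
    key.to_s.split(".").each do |key_part|
      current[key_part] = {}
      current = current[key_part]
    end
    # The assignments land in the innermost hash.
    current.merge! values
  end
end

nested_hash_from_key("servers.alpha", {"ip" => "10.0.0.1"})
# => {"servers"=>{"alpha"=>{"ip"=>"10.0.0.1"}}}
nested_hash_from_key("owner", {})
# => {"owner"=>{}}
```

Each key group therefore becomes its own small nested hash, and the overlap between groups such as servers.alpha and servers.beta still has to be reconciled when the document is assembled.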
An outermost :key_group will only have a value, so extract it:
rule(:key_group => subtree(:values)) { values }
Finally, a document will simply be an array of transformed values. Merge them together:
rule(:document => subtree(:values)) do |dict|
  dict[:values].inject(&method(:merge_nested))
end

def self.merge_nested(existing, updates)
  updates.each do |key, value|
    if existing.has_key?(key)
      existing[key] = merge_nested(existing[key], value)
    else
      existing[key] = value
    end
  end
  existing
end
The recursive merge allows nested groups to be merged, e.g. the “servers” section in the TOML example document.
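A small standalone run shows why a plain Hash#merge would not be enough. The sketch below copies merge_nested as a top-level method (an adaptation for this example) and uses made-up sample data shaped like the output of the key group rules.

```ruby
# Same body as the class-level helper, defined at the top level for the demo.
def merge_nested(existing, updates)
  updates.each do |key, value|
    if existing.has_key?(key)
      existing[key] = merge_nested(existing[key], value)
    else
      existing[key] = value
    end
  end
  existing
end

# Sample data: one hash per top-level item in the document.
parts = [
  {"title" => "TOML Example"},
  {"servers" => {"alpha" => {"ip" => "10.0.0.1"}}},
  {"servers" => {"beta"  => {"ip" => "10.0.0.2"}}}
]

merged = parts.inject { |acc, part| merge_nested(acc, part) }
# => {"title"=>"TOML Example",
#     "servers"=>{"alpha"=>{"ip"=>"10.0.0.1"}, "beta"=>{"ip"=>"10.0.0.2"}}}

# A shallow merge replaces the whole "servers" value instead:
shallow = {"servers" => {"alpha" => {}}}.merge({"servers" => {"beta" => {}}})
# => {"servers"=>{"beta"=>{}}}
```

Two things to keep in mind with this approach: merge_nested mutates the first hash in the list rather than building a new one, and it assumes that a key appearing twice holds a hash both times.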
At last, a parsed TOML document is transformed into a useful hash:
it "converts a full TOML doc into a hash" do
  input = TOML::Parser.new.parse(fixture("example.toml"))
  expect(xform.apply(input)).to eq(
    "title" => "TOML Example",
    "owner" => {
      "name" => "Tom Preston-Werner",
      "organization" => "GitHub",
      "bio" => "GitHub Cofounder & CEO\nLikes tater tots and beer.",
      "dob" => Time.parse("1979-05-27 07:32:00 UTC")
    },
    "database" => {
      "server" => "192.168.1.1",
      "ports" => [8001, 8001, 8002],
      "connection_max" => 5000,
      "enabled" => true
    },
    "servers" => {
      "alpha" => {
        "ip" => "10.0.0.1",
        "dc" => "eqdc10"
      },
      "beta" => {
        "ip" => "10.0.0.2",
        "dc" => "eqdc10"
      }
    },
    "clients" => {
      "data" => [ ["gamma", "delta"], [1, 2] ],
      "hosts" => ["alpha", "omega"]
    }
  )
end
View the source: toml-parslet on GitHub.
Update, March 3
Fixed a bug in this example code where the full TOML doc didn’t correctly merge subsequent key groups.