Annotating a TOML Parse Tree
This is part 2 of 4.
- Part 1 - Parsing TOML in Ruby with Parslet
- Part 3 - Transforming a TOML Parse Tree
- Part 4 - Displaying Errors in a TOML Document
- toml-parslet on GitHub.
In the previous article, we built a TOML parser. Now that we can parse a document, we need to convert the parse tree into something we can use.
Named Nodes
As defined, the parser simply returns the string that it parsed. This isn’t helpful, so we need to annotate matches during parsing to build a structured representation of a TOML document. Once we’ve created this tree representation, we’ll be able to transform it into the final data structure that we’re looking for.
We’ll do this using Parslet’s .as method to name matched parts of rules. When Parslet matches something, it converts it to a hash of {:name => "matched value"}. These can be nested along with the rules, resulting in a tree of hashes and arrays.
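To make the shape of such a tree concrete, here is a minimal sketch in plain Ruby that does not need Parslet. The tree literal is written by hand in the shape shown later in this article for an assignment like c = [ 3, 4 ], and node_names is a hypothetical helper (not part of Parslet) that lists the names it meets, depth-first.

```ruby
# Hypothetical helper: collect the names in a tree of hashes and arrays.
def node_names(node)
  case node
  when Hash
    # Each hash entry is a named node; record the name, then descend.
    node.flat_map { |name, child| [name] + node_names(child) }
  when Array
    # Arrays hold repeated matches; descend into each one.
    node.flat_map { |child| node_names(child) }
  else
    # Leaves are the matched text and carry no further names.
    []
  end
end

# Hand-written tree, assumed to mirror the parser's output for: c = [ 3, 4 ]
tree = {:key => "c", :value => {:array => [{:integer => "3"}, {:integer => "4"}]}}

p node_names(tree)  # prints [:key, :value, :array, :integer, :integer]
```

The names follow the nesting of the rules: the assignment contributes :key and :value, the array rule contributes :array, and each element contributes its own type name.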
Values
For values, we capture the type of the value along with its contents.
rule(:integer) do
  (str("-").maybe >> match["1-9"] >> digit.repeat).as(:integer)
end

rule(:float) do
  (str("-").maybe >> digit.repeat(1) >>
   str(".") >> digit.repeat(1)).as(:float)
end

rule(:boolean) do
  (str("true") | str("false")).as(:boolean)
end

rule(:datetime) do
  (digit.repeat(4) >> str("-") >>
   digit.repeat(2) >> str("-") >>
   digit.repeat(2) >> str("T") >>
   digit.repeat(2) >> str(":") >>
   digit.repeat(2) >> str(":") >>
   digit.repeat(2) >> str("Z")).as(:datetime)
end

rule(:string) do
  str('"') >>
  ((escaped_special | string_special.absent? >> any).repeat).as(:string) >>
  str('"')
end
And the tests:
it "parses integers into {:integer => 'digits'}" do
  expect(value_parser.parse("1234")).to eq :integer => "1234"
end

it "parses floats into {:float => 'digits'}" do
  expect(value_parser.parse("-0.123")).to eq :float => "-0.123"
end

it "parses booleans into {:boolean => 'value'}" do
  expect(value_parser.parse("true")).to eq :boolean => "true"
end

it "parses datetimes into hashes of date/time data" do
  expect(value_parser.parse("1979-05-27T07:32:00Z")).to eq(
    :datetime => "1979-05-27T07:32:00Z"
  )
end

it "parses strings into {:string => 'string contents'}" do
  expect(value_parser.parse('"hello world"')).to eq(
    :string => "hello world")
end
Arrays
Parslet handles repeated elements “magically”. If there is a sequence of matched values, it will automatically combine them into an array of elements. From the docs, capturing basic repeats can work either way we need it to:
str('a').repeat.as(:b) # "aaa" => {:b=>"aaa"@0}
str('a').as(:b).repeat # "aaa" => [{:b=>"a"@0}, {:b=>"a"@1}, {:b=>"a"@2}]
For arrays, we’ll want to capture the outer array as :array => ... and the contents as an array of values, [ {:integer => "1"}, {:integer => "2"}, ...].
If we weren’t parsing nested arrays, we could leave off the .as(:array) and Parslet would automatically give us bare arrays of values. However, it’s a little too smart about merging the results of parsed sub-trees and it flattens nested arrays, so we’ll be explicit.
rule :array do
  str("[") >> array_space >>
  array_contents.repeat(1).as(:array) >>
  array_space >> str("]")
end

it "captures arrays as :array => [ value, value, ... ]" do
  expect(array_parser.parse("[1,2]")).to eq(
    :array => [ {:integer => "1"}, {:integer => "2"}])
end

it "captures nested arrays" do
  expect(array_parser.parse("[ [1,2] ]")).to eq(
    :array => [
      {:array => [ {:integer => "1"}, {:integer => "2"}]}
    ])
end
Assignments
We’d like individual assignments to look like {:key => "key", :value => value}.
The initial version of the parser was a little loose about when and where it would match whitespace, so we’ll refactor the parser rules a bit too.
First, a couple of the helper rules, changing whitespace to space? and comment to comment?:
rule(:space?) { space.repeat }

rule(:comment?) do
  (str("#") >> (newline.absent? >> any).repeat).maybe
end
Next, we’ll look at a series of assignments. Originally the assignment rule handled whitespace within itself, but we’ll move that elsewhere. We capture the key and the value:
rule :assignment do
  key.as(:key) >>
  space? >> str("=") >> space? >>
  value.as(:value)
end

it "captures the key and the value" do
  expect(ap.parse("thing = 1")).to eq(
    :key => "thing", :value => {:integer => "1"})
end
A sequence of assignments can have several forms: nothing (no assignments), a single assignment, a series of comments and whitespace, a series of “bare” assignments, or a series of assignments with comments and whitespace interspersed. To handle this well, we’ll start with a single line:
rule :assignment_line do
  space? >> assignment.maybe >> space? >> comment?
end
Now we can easily combine these and capture the overall results as :assignments => ...:
rule :assignments do
  (assignment_line >> (newline >> assignment_line).repeat).as(:assignments)
end
And test these with a variety of inputs:
let(:ap) { parser.assignments }

it "captures a list of assignments" do
  expect(ap.parse("a=1\nb=2")).to eq(
    :assignments => [
      {:key => "a", :value => {:integer => "1"}},
      {:key => "b", :value => {:integer => "2"}},
    ]
  )
end

it "captures an empty string" do
  expect(ap.parse("")).to eq(:assignments => "")
end

it "captures just comments as a string" do
  expect(ap.parse("#comment\n")).to eq(
    :assignments => "#comment\n"
  )
end
A list of assignments containing just a comment is matched as a string. This is because we’ve defined a capture, and even if it doesn’t match any structured {:key => ..., :value => ...} pairs, it still matches and captures the string itself.
This is OK; we’ll just have to handle this case when we transform the tree later on.
But now it looks like we have a problem. If we try to parse the following string using assignments:
#comment
a = 1
It’s parsed as :assignments => [{:key => "#comment\na", :value => {:integer => "1"}}]. The key has somehow managed to capture the preceding comment and newline.
If we look at the rule for an assignment again, it starts with key.as(:key). The way this is invoked from the assignment_line rule is: space? >> assignment.maybe. When presented with "#comment\nkey=value", the parser sees the '#' and interprets it as the start of a key. Because a key is just “not whitespace”, the remainder of the comment and the newline are accepted.
To fix this, we need to restrict the definition of a key to make sure that it doesn’t begin with either a # or a newline:
rule :key do
  str("#").absent? >> newline.absent? >>
  (match["\\[\\]="].absent? >> space.absent? >> any).repeat(1)
end
And that solves it:
it "captures an assignment after a comment and newlines" do
  expect(ap.parse("#comment\na=1")).to eq(
    :assignments => [{:key => "a", :value => {:integer => "1"}}]
  )
  expect(ap.parse("#comment\n\t\n\na=1")).to eq(
    :assignments => [{:key => "a", :value => {:integer => "1"}}]
  )
end
Key Groups
Finally, key groups. We’ll capture key group names as :group_name => "name":
rule :group_name do
  space? >> str("[") >>
  (str("]").absent? >> any).repeat(1).as(:group_name) >>
  str("]") >> space? >> comment?
end
A key group must have a group name, but after that it can be empty or have a series of assignments. We’ve already written that rule, so:
rule :key_group do
  (group_name >>
   (newline >> assignments).maybe).as(:key_group)
end

let(:kgp) { parser.key_group }

it "captures the group name and assignments" do
  expect(kgp.parse("[kg]\na=1\nb=2")).to eq(
    :key_group =>
      {:group_name => "kg",
       :assignments => [
         {:key => "a", :value => {:integer => "1"}},
         {:key => "b", :value => {:integer => "2"}}]}
  )
end

it "captures empty assignments as a string" do
  expect(kgp.parse("[kg]\n#comment\n\t\n")).to eq(
    :key_group =>
      {:group_name => "kg",
       :assignments => "#comment\n\t\n"}
  )
end

it "captures a single assignment in a key group" do
  expect(kgp.parse("[kg]\na=1")).to eq(
    :key_group => {
      :group_name => "kg",
      :assignments => {:key => "a", :value => {:integer => "1"}}}
  )
end
Note the final test, where a single assignment is captured. It’s captured as a single hash rather than an array with one item, because only one assignment matched during parsing. Parslet only creates an array of matched items when more than one item matches.
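The tests in this article show :assignments taking three shapes: an array of hashes, a single hash, or a bare string when only comments and whitespace matched. One way a consumer might cope with that is to normalise all three into an array before doing anything else. The assignment_list helper below is a hypothetical sketch in plain Ruby, not necessarily the approach taken in Part 3.

```ruby
# Hypothetical helper: always return an array of assignment hashes.
def assignment_list(node)
  case node
  when Array then node    # several assignments matched
  when Hash  then [node]  # exactly one assignment matched
  else []                 # only comments/whitespace (a string-like capture)
  end
end

single = {:key => "a", :value => {:integer => "1"}}

p assignment_list(single).length  # prints 1
p assignment_list("#comment\n")   # prints []
```

The else branch deliberately catches anything that is neither a hash nor an array, so it does not depend on the exact class Parslet uses for captured text.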
Document
A document, as before, is an optional series of assignments, possibly followed by one or more key groups. We’ll capture the whole thing as :document.
rule :document do
  ((key_group | assignments) >>
   key_group.repeat >>
   newline.maybe).as(:document)
end
And now, this TOML document:
title = "global title"
[group1]
a = 1
b = 2
[group2]
c = [ 3, 4 ]
is captured as:
{:document=>
  [{:assignments=>{:key=>"title", :value=>{:string=>"global title"}}},
   {:key_group=>
     {:group_name=>"group1",
      :assignments=>
       [{:key=>"a", :value=>{:integer=>"1"}},
        {:key=>"b", :value=>{:integer=>"2"}}]}},
   {:key_group=>
     {:group_name=>"group2",
      :assignments=>{:key=>"c", :value=>{:array=>[{:integer=>"3"}, {:integer=>"4"}]}}}}]}
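Since the result is ordinary Ruby hashes and arrays, it can already be inspected without Parslet. As a small sketch, the hypothetical group_names helper below pulls the key group names out of a tree of this shape. The document literal is typed in by hand from the output above, and the helper tolerates :document being a single hash or a string as well as an array.

```ruby
# Hand-written copy of the captured tree for the example document.
document = {
  :document => [
    {:assignments => {:key => "title", :value => {:string => "global title"}}},
    {:key_group => {
      :group_name => "group1",
      :assignments => [
        {:key => "a", :value => {:integer => "1"}},
        {:key => "b", :value => {:integer => "2"}}]}},
    {:key_group => {
      :group_name => "group2",
      :assignments => {
        :key => "c",
        :value => {:array => [{:integer => "3"}, {:integer => "4"}]}}}}
  ]
}

# Hypothetical helper: list the names of all key groups in a document tree.
def group_names(tree)
  # Wrap and flatten so a lone hash is handled like a one-item array,
  # then ignore anything that is not a hash (for example captured text).
  nodes = [tree[:document]].flatten.grep(Hash)
  nodes.select { |node| node.key?(:key_group) }
       .map { |node| node[:key_group][:group_name].to_s }
end

p group_names(document)  # prints ["group1", "group2"]
```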
In part 3 of this series, we’ll transform this tree of captured values into a usable hash using Parslet’s transformation engine.