Displaying Errors in a TOML Document
This is part 4 of 4.
- Part 1 - Parsing TOML in Ruby with Parslet
- Part 2 - Annotating a TOML Parse Tree
- Part 3 - Transforming a TOML Parse Tree
- toml-parslet on GitHub.
The final piece in parsing and transforming a TOML document is handling errors. When something goes wrong, we would like to display the line and column where the error occurred.
There are two kinds of errors we can encounter in a TOML document: parse errors and transformation errors. Parse errors occur when the input document is syntactically incorrect. Transformation errors are semantic errors when the data contained within the TOML document is invalid in some way, such as a key reassignment.
Parse Errors
Parse errors are the easiest to deal with, as Parslet will already raise an exception if it encounters a parse error.
We’ll start by defining a base error class with two subclasses, plus a top-level load method on the TOML module, which will be the entry point to the entire library.
module TOML
  Error = Class.new StandardError
  ParseError = Class.new Error
  TransformError = Class.new Error

  def self.load(str)
    Transform.new.apply(Parser.new.parse(str))
  end
end
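Because both specific errors inherit from TOML::Error, a caller can rescue the base class to catch either kind, or rescue a subclass to treat it specially. A minimal sketch of that idea follows; it redefines only the error classes so it runs without the parser, and the classify_failure helper is hypothetical, purely for illustration:

```ruby
# Stand-alone sketch: only the error hierarchy, no parser or transform needed.
module TOML
  Error          = Class.new(StandardError)
  ParseError     = Class.new(Error)
  TransformError = Class.new(Error)
end

# Hypothetical helper: run a block and report which kind of TOML error,
# if any, it raised. More specific rescue clauses must come first.
def classify_failure
  yield
  "ok"
rescue TOML::ParseError => e
  "parse error: #{e.message}"
rescue TOML::Error => e
  "other TOML error: #{e.message}"
end

parse_result     = classify_failure { raise TOML::ParseError, "unexpected input" }
transform_result = classify_failure { raise TOML::TransformError, "key reassigned" }
clean_result     = classify_failure { :nothing_raised }
```

Here the transform error falls through to the base-class clause, which is what makes a single rescue TOML::Error sufficient for callers who do not care about the distinction.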
Parslet::ParseFailed errors contain a cause, which is a tree representing the parse state at the point at which things failed. The only difficulty is that the top-level parse error, because it applies to the whole document, reports “line 1, column 1” as the location of the unexpected input. However, we can write a helper method that navigates this tree to the deepest point possible to find the actual position the parser was at when things went wrong:
# Internal: helper for finding the deepest cause for a parse error
def self.deepest_cause(cause)
  if cause.children.any?
    deepest_cause(cause.children.first)
  else
    cause
  end
end
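To see how the helper walks the tree, here is a small sketch that runs without Parslet. FakeCause is a hypothetical stand-in for Parslet's cause objects (only the children accessor is assumed), and the messages are made up for illustration:

```ruby
# Hypothetical stand-in for a node in Parslet's cause tree: a message plus
# a list of child causes. Real cause objects come from Parslet itself.
FakeCause = Struct.new(:message, :children)

# Same logic as the helper above, written as a plain method for the sketch.
def deepest_cause(cause)
  if cause.children.any?
    deepest_cause(cause.children.first)
  else
    cause
  end
end

leaf = FakeCause.new("expected a value", [])
tree = FakeCause.new("failed to match document", [
  FakeCause.new("failed to match assignment", [leaf]),
  FakeCause.new("failed to match comment", [])
])

found = deepest_cause(tree)
```

Note that the helper always follows the first child, so when a node has several alternatives it reports the end of the first branch rather than the branch that consumed the most input.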
We apply this by rescuing from Parslet::ParseFailed, extracting the deepest cause, and re-raising our own parse error with the information we want:
module TOML
  def self.load(str)
    Transform.new.apply(Parser.new.parse(str))
  rescue Parslet::ParseFailed => e
    deepest = deepest_cause e.cause
    line, column = deepest.source.line_and_column(deepest.pos)
    raise ParseError, "unexpected input at line #{line} column #{column}"
  end
end
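The conversion from a position in the input to a line and column is what makes the message useful. Parslet does this for us, but the idea can be sketched in plain Ruby. The helper below is hypothetical and is not Parslet's implementation; it assumes a 0-based character offset and produces 1-based line and column numbers:

```ruby
# Hypothetical helper, not Parslet's implementation: convert a 0-based
# character offset into a 1-based [line, column] pair.
def line_and_column(str, offset)
  before = str[0, offset]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  # Column counts characters since the last newline (or since the start).
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

doc = "title = \"TOML\"\nowner = \n"
# Pretend the parser stopped right after "owner = ", where a value belongs.
offset = doc.index("owner") + "owner = ".length
line, column = line_and_column(doc, offset)
message = "unexpected input at line #{line} column #{column}"
```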
Transform Errors
Transform errors are a little more complex. We know we have a syntactically valid document, but errors such as key reassignment can still occur at this point in the process.
Currently, in the transform, we’re taking each assignment and applying it to a Hash. We need to know where we are in a document when attempting to reassign a key, but right now keys lack any context.
The transform we’re starting with for assignments is:
rule(:key => simple(:key), :value => subtree(:value)) do
  {key.to_s => value}
end
Initially, matched values from a Parslet parse are not simple strings, but instances of Parslet::Slice. This is a simple wrapper around strings that contains the line and column where the string was found in the document. The rule as it stands immediately converts this to a “bare” string, losing the context.
Instead, let’s hang onto that Parslet::Slice instance a little longer, and convert it to a string at the last possible moment: when it’s finally added to the global hash as we transform the parse tree. That happens in merge_nested, so we’ll change merge_nested over to use the string when doing assignments and comparisons, but otherwise preserve the Parslet::Slice instance.
rule(:key => simple(:key), :value => subtree(:value)) do
  {key => value} # key is still a Parslet::Slice
end
There are two places where we are either using or creating new keys within hashes: merge_nested and nested_hash_from_key.
First, when generating a nested hash from a key group, we’ll take that key, but because we’re splitting it on '.', we need to create new Parslet::Slices from each piece so we continue to preserve location information the whole way down the nested hash this produces:
def self.nested_hash_from_key(key, values)
  key_part, remainder = key.to_s.split(".", 2)
  # preserve position information for each part of the key for error
  # reporting later on during the transform:
  sub_key = Parslet::Slice.new(key_part, key.offset, key.line_cache)
  if remainder
    # the remainder begins just past this part and its "." separator
    # (assumes slice offsets are byte offsets into the source)
    rest_offset = key.offset + key_part.bytesize + 1
    rest = Parslet::Slice.new(remainder, rest_offset, key.line_cache)
    {sub_key => nested_hash_from_key(rest, values)}
  else
    {sub_key => values}
  end
end
Whoops, yes, I did switch the algorithm around a bit on that one so it’s recursive and not iterative. Perhaps I’ve been reading a bit too much about Lisp lately. Anyway, the important thing is that each key part is re-initialized as a Parslet::Slice so we can keep its location in the document handy for later on.
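The recursion is easier to follow with a concrete key. The sketch below runs without Parslet: FakeSlice is a hypothetical stand-in that carries only a string and an offset (the real Parslet::Slice also carries a line cache). The sketch advances the offset past each part and its dot so that every sub-key points at its own position; that arithmetic is an assumption of the sketch:

```ruby
# Hypothetical stand-in for Parslet::Slice: a string plus the offset at
# which it was found in the document.
FakeSlice = Struct.new(:str, :offset) do
  def to_s
    str
  end
end

def nested_hash_from_key(key, values)
  key_part, remainder = key.to_s.split(".", 2)
  sub_key = FakeSlice.new(key_part, key.offset)
  if remainder
    # Assumption: the remainder starts just past this part and its ".".
    rest = FakeSlice.new(remainder, key.offset + key_part.length + 1)
    { sub_key => nested_hash_from_key(rest, values) }
  else
    { sub_key => values }
  end
end

# Pretend "servers.alpha.ip" was found at offset 10 in the document.
key = FakeSlice.new("servers.alpha.ip", 10)
nested = nested_hash_from_key(key, "10.0.0.1")

# Walk back down the result to inspect each key and where it came from.
outer_key  = nested.keys.first
middle_key = nested[outer_key].keys.first
inner_key  = nested[outer_key][middle_key].keys.first
```

Each level of the resulting hash is keyed by an object that still knows where it came from, which is exactly what the error reporting in the next step relies on.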
We’re already detecting key reassignment errors when merging nested keys. However, the keys are now slices and not just strings. This is the last possible moment in which we’ll convert them back into strings. Also, when an error is detected, we now have the location in the document handy for the error:
def self.merge_nested(existing, updates)
  updates.each do |key, value|
    key_s = key.to_s
    if existing.has_key? key_s
      if existing[key_s].kind_of?(Hash) && value.kind_of?(Hash)
        existing[key_s] = merge_nested(existing[key_s], value)
      else
        line, column = key.line_and_column
        raise TransformError,
          "Cannot reassign existing key #{key_s} at line #{line} column #{column}"
      end
    else
      if value.kind_of? Hash
        existing[key_s] = merge_nested({}, value)
      else
        existing[key_s] = value
      end
    end
  end
  existing
end
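To watch the reassignment check fire, here is a sketch that runs without Parslet. FakeKey is a hypothetical stand-in for Parslet::Slice that only responds to to_s and line_and_column, and the positions are made up; the merge logic mirrors the method above with the inner branch condensed:

```ruby
TransformError = Class.new(StandardError)

# Hypothetical stand-in for Parslet::Slice: a string with a known position.
FakeKey = Struct.new(:str, :line, :column) do
  def to_s
    str
  end

  def line_and_column
    [line, column]
  end
end

def merge_nested(existing, updates)
  updates.each do |key, value|
    key_s = key.to_s # the hash itself is keyed by plain strings
    if existing.key?(key_s)
      if existing[key_s].is_a?(Hash) && value.is_a?(Hash)
        existing[key_s] = merge_nested(existing[key_s], value)
      else
        line, column = key.line_and_column
        raise TransformError,
              "Cannot reassign existing key #{key_s} at line #{line} column #{column}"
      end
    else
      existing[key_s] = value.is_a?(Hash) ? merge_nested({}, value) : value
    end
  end
  existing
end

doc = {}
merge_nested(doc, { FakeKey.new("name", 1, 1) => "first" })

# A second assignment to the same key, pretending it appeared on line 3.
message =
  begin
    merge_nested(doc, { FakeKey.new("name", 3, 1) => "second" })
    nil
  rescue TransformError => e
    e.message
  end
```

The position in the message is that of the second, offending assignment, since that is the key being merged when the conflict is noticed.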
And now we’re able to not only detect but display helpful error messages when unexpected input or a semantic error is encountered.
This concludes the series on the TOML parser. You can see all the code and read the commits to see how the library evolved on GitHub.