Displaying Errors in a TOML Document

This is part 4 of 4.

The final piece in parsing and transforming a TOML document is handling errors. When something goes wrong, we would like to display where the error occurred, by line and column.

There are two kinds of errors we can encounter in a TOML document: parse errors and transformation errors. Parse errors occur when the input document is syntactically incorrect. Transformation errors are semantic: the document parses, but the data it contains is invalid in some way, such as a key reassignment.

Parse Errors

Parse errors are the easiest to deal with, since Parslet already raises an exception (Parslet::ParseFailed) when it encounters one.

We’ll start by defining two error classes and a top-level load method on the TOML module which will be the entry point to the entire library.

module TOML
  Error          = Class.new StandardError
  ParseError     = Class.new Error
  TransformError = Class.new Error

  def self.load(str)
    Transform.new.apply(Parser.new.parse(str))
  end
end
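
One payoff of the shared base class is that callers can rescue either kind of failure with a single clause. Here's a small sketch of that, using a stand-in module named TOMLSketch and a helper named describe_failure (both hypothetical, made up for this illustration so they don't clash with the real TOML module):

```ruby
# Stand-in for the TOML module above; TOMLSketch is a hypothetical name.
module TOMLSketch
  Error          = Class.new StandardError
  ParseError     = Class.new Error
  TransformError = Class.new Error
end

# Hypothetical helper: runs the block and reports which error class, if any,
# was rescued through the shared base class.
def describe_failure
  yield
  "ok"
rescue TOMLSketch::Error => e
  "#{e.class.name}: #{e.message}"
end

puts describe_failure { raise TOMLSketch::ParseError, "unexpected input" }
puts describe_failure { raise TOMLSketch::TransformError, "key reassigned" }
```

Code that cares about the difference can still rescue ParseError or TransformError individually.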

Parslet::ParseFailed errors contain a cause, which is a tree representing the parse state at the point at which things failed. The only difficulty is that the top-level parse error applies to the whole document, so it reports “line 1, column 1” as the location of the unexpected input. However, we can write a helper method that walks this tree to the deepest point possible to find the position the parser had actually reached when things went wrong:

# Internal: helper for finding the deepest cause for a parse error
def self.deepest_cause(cause)
  if cause.children.any?
    deepest_cause(cause.children.first)
  else
    cause
  end
end
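
To see the traversal without involving Parslet, here's a sketch that runs the same helper over a hand-built tree. FakeCause is a hypothetical stand-in that models only what the helper touches (a message, a position, and children), and the messages and offsets are made up:

```ruby
# FakeCause is a hypothetical stand-in for Parslet's cause objects; it only
# models the pieces the helper touches (a message, a position, children).
FakeCause = Struct.new(:message, :pos, :children)

# Same traversal as the helper above: follow the first child until a leaf.
def deepest_cause(cause)
  if cause.children.any?
    deepest_cause(cause.children.first)
  else
    cause
  end
end

leaf = FakeCause.new("expected a value", 17, [])
mid  = FakeCause.new("failed to match assignment", 12, [leaf])
root = FakeCause.new("failed to match document", 0, [mid])

found = deepest_cause(root)
puts "#{found.message} (offset #{found.pos})"
```

The root reports offset 0, which is where the "line 1, column 1" message comes from; the leaf carries the offset we actually want.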

We apply this by rescuing from Parslet::ParseFailed, extracting the deepest cause, and re-raising our own parse error with the information we want:

module TOML
  def self.load(str)
    Transform.new.apply(Parser.new.parse(str))
  rescue Parslet::ParseFailed => e
    deepest = deepest_cause e.cause
    line, column = deepest.source.line_and_column(deepest.pos)
    raise ParseError, "unexpected input at line #{line} column #{column}"
  end
end
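
The line_and_column call is what turns an offset into something a person can use. Parslet handles this internally; the sketch below is not Parslet's implementation, just a standalone illustration of the idea. The helper name line_and_column_for is hypothetical, and it assumes character offsets and 1-based line and column numbers:

```ruby
# Hypothetical helper, not part of Parslet: convert a character offset into a
# 1-based [line, column] pair by counting the newlines before the offset.
def line_and_column_for(source, offset)
  before = source[0, offset]
  line = before.count("\n") + 1
  last_newline = before.rindex("\n")
  column = last_newline ? offset - last_newline : offset + 1
  [line, column]
end

doc = "title = \"TOML\"\nowner = \n"

# Pretend the parser gave up just after "owner = ", where a value is missing.
offset = doc.index("owner") + "owner = ".length
line, column = line_and_column_for(doc, offset)
puts "unexpected input at line #{line} column #{column}"
```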

Transform Errors

Transform errors are a little more complex. We know we have a syntactically valid document, but errors such as key reassignment can still occur at this point in the process.

Currently, in the transform, we’re taking each assignment and applying it to a Hash. We need to know where we are in a document when attempting to reassign a key, but right now keys lack any context.

The transform rule we’re starting with for assignments is:

rule(:key => simple(:key), :value => subtree(:value)) do
  {key.to_s => value}
end

Initially, matched values from a Parslet parse are not simple strings, but instances of Parslet::Slice. This is a simple wrapper around strings that contains the line and column where the string was found in the document. The rule as it stands immediately converts this to a “bare” string, losing the context.

Instead, let’s hang onto that Parslet::Slice instance a little longer, and convert it to a string at the last possible moment: when it’s finally added to the global hash as we transform the parse tree. That happens in merge_nested, so we’ll change merge_nested over to use the string when doing assignments and comparisons, but otherwise preserve the Parslet::Slice instance.

rule(:key => simple(:key), :value => subtree(:value)) do
  {key => value} # key is still a Parslet::Slice
end

There are two places where we are either using or creating new keys within hashes: merge_nested and nested_hash_from_key.

First, when generating a nested hash from a key group, we’ll take that key, but because we’re splitting it on '.', we need to create new Parslet::Slices from each piece so we continue to preserve location information the whole way down the nested hash this produces:

def self.nested_hash_from_key(key, values)
  key_part, remainder = key.to_s.split(".", 2)

  # preserve position information for each part of the key for error
  # reporting later on during the transform:
  sub_key = Parslet::Slice.new(key_part, key.offset, key.line_cache)

  if remainder
    # the remainder starts after this part and the "." separator
    # (assumes offsets count characters, which holds for ASCII keys)
    rest_offset = key.offset + key_part.length + 1
    rest = Parslet::Slice.new(remainder, rest_offset, key.line_cache)
    {sub_key => nested_hash_from_key(rest, values)}
  else
    {sub_key => values}
  end
end
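
To see the shape this produces without Parslet in the picture, here's a sketch with a stand-in for Parslet::Slice. Piece and nested_pieces are hypothetical names, and the sketch assumes each part's offset should be advanced past the parts and "." separators before it, so every piece points at its own spot in the document:

```ruby
# Piece is a hypothetical stand-in for Parslet::Slice: a string plus the
# offset at which it was found.
Piece = Struct.new(:str, :offset) do
  def to_s
    str
  end
end

# Same recursive shape as nested_hash_from_key, using Piece instead of
# Parslet::Slice. Each part's offset is advanced past the preceding part
# and the "." separator (an assumption of this sketch).
def nested_pieces(key, values)
  key_part, remainder = key.to_s.split(".", 2)
  sub_key = Piece.new(key_part, key.offset)

  if remainder
    rest = Piece.new(remainder, key.offset + key_part.length + 1)
    {sub_key => nested_pieces(rest, values)}
  else
    {sub_key => values}
  end
end

# Pretend "servers.alpha" was found at offset 10 of some document.
result = nested_pieces(Piece.new("servers.alpha", 10), {"ip" => "10.0.0.1"})

result.each do |outer, inner|
  inner.each_key { |k| puts "#{outer} @ #{outer.offset}, #{k} @ #{k.offset}" }
end
```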

Whoops, yes, I did switch the algorithm around a bit on that one so it’s recursive and not iterative. Perhaps I’ve been reading a bit too much about Lisp lately. Anyway, the important thing is that each key part is re-initialized as a Parslet::Slice so we can keep its location in the document handy for later on.

We’re already detecting key reassignment errors when merging nested keys. However, the keys are now slices and not just strings. This is the last possible moment in which we’ll convert them back into strings. Also, when an error is detected, we now have the location in the document handy for the error:

def self.merge_nested(existing, updates)
  updates.each do |key, value|
    key_s = key.to_s

    if existing.has_key? key_s
      if existing[key_s].kind_of?(Hash) && value.kind_of?(Hash)
        existing[key_s] = merge_nested(existing[key_s], value)
      else
        line, column = key.line_and_column
        raise TransformError,
          "Cannot reassign existing key #{key_s} at line #{line} column #{column}"
      end
    else
      if value.kind_of? Hash
        existing[key_s] = merge_nested({}, value)
      else
        existing[key_s] = value
      end
    end
  end

  existing
end
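
Here's a quick way to watch the reassignment check fire without a real parse tree. It's a sketch under a few assumptions: FakeSlice is a hypothetical stand-in for Parslet::Slice with a hard-coded position, TransformError is redefined at the top level so the snippet runs on its own, and merge_nested is a condensed top-level copy of the method above:

```ruby
TransformError = Class.new StandardError

# FakeSlice is a hypothetical stand-in for Parslet::Slice with a fixed position.
FakeSlice = Struct.new(:str, :line, :column) do
  def to_s
    str
  end

  def line_and_column
    [line, column]
  end
end

# Condensed copy of merge_nested from above, as a top-level method.
def merge_nested(existing, updates)
  updates.each do |key, value|
    key_s = key.to_s
    if !existing.has_key?(key_s)
      existing[key_s] = value.kind_of?(Hash) ? merge_nested({}, value) : value
    elsif existing[key_s].kind_of?(Hash) && value.kind_of?(Hash)
      merge_nested(existing[key_s], value)
    else
      line, column = key.line_and_column
      raise TransformError,
        "Cannot reassign existing key #{key_s} at line #{line} column #{column}"
    end
  end
  existing
end

doc = {}
merge_nested(doc, {FakeSlice.new("name", 1, 1) => "first"})

begin
  # Same key again, this time "found" on line 3 of the pretend document.
  merge_nested(doc, {FakeSlice.new("name", 3, 1) => "second"})
rescue TransformError => e
  puts e.message
end
```

Note that the resulting hash only ever holds plain string keys; the slices are used for their position and then dropped.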

And now we’re able to not only detect errors but also display helpful messages when unexpected input or a semantic error is encountered.

This concludes the series on the TOML parser. You can see all the code and read the commits to see how the library evolved on GitHub.