Updated Binary Tree Package
Idly looking up a coding problem got me to investigate Dyck Languages, which led me to update my binary tree Go package.
Quite a few coding problems like this odd binary tree representation, or the balanced parentheses, braces and brackets problem deal with “balanced” open and close parentheses, or similar pairs of matching characters. These problems all involve questions about Dyck Languages.
It also turns out that computer science’s trees can be represented textually.
This example binary tree:

can be represented by linear text like this:
(now (is (the) (time)) (for () (all (good))))
These string representations look a whole lot like Dyck Languages, specifically Dyck-1.
Based on my experience parsing Dyck Languages, I decided I had to re-write some code in my binary tree Go package. Specifically, I re-wrote the code that parses string representations of binary trees. This is my third rewrite of parsing binary tree string representations. Apparently I’ve been unhappy with that code, and behavior of the parsing.
The now immediately previous version had some inobvious problems
that caused it to fail to parse some apparently correctly-formatted
string representations.
This was partly due to my desire to allow non-parenthesized immediate children,
something like (a b c) as well as parenthesized immediate children,
as in (a (b) (c)).
It was also partly due to not really understanding the problem.
This time around, I decided that (a b c) is an invalid expression.
Representations of child nodes now must have enclosing parentheses.
This makes sense in that the b and c parts of the expression
are data values of child nodes, not part of the node whose data value is a.
Parentheses surround a node’s string representation,
making (a (b) (c)) valid,
and (a b c) not valid.
Understanding Dyck Languages
by investigating parsing methods for them
gave me a feeling
that I could (once again) rewrite
func GeneralCreateFromString (and helpers) to get more reliable
parsing, and increase chances that I would recognize what the code does
at some later date.
This is an aspect of the engineering method. A scientific or mathematical discovery can replace heuristics, sometimes to great benefit. In this case, I replaced a hacky, token based approach that had problems with extra white space due to the method of tokenization. I replaced iterative child node recognition with a recursive version that only loops over the characters between an opening parenthesis ‘(’ and its matching closing ‘)’. This method feels a lot like the “marking automata” approach to parsing Dyck Languages.
Line count of the code doing the work of parsing string representations of trees stayed almost exactly the same, but got easier to understand. I think it became more robust.
I did increase the overall line count of my
binary tree repo.
I decided that the parsing functions should return errors,
not spill a message on stderr.
Over the years, I’ve come to believe that
only the top-most layer of software should do any error output.
I also added some Go-style unit testing to ensure that func Printf
output can be parsed by the input handling functions.
That’s the subtext of my binary tree coding problem repo.
Multiple “one-offs” to solve coding problems prompted consolidation
so the next coding problem was easier.
Solving further problems added features.
Initial implementations of features demanded clean-up,
consolidation and generalizations.
Generalized features suggested alternative solutions of past problems.
The evolution of parsing string representations of trees runs
in parallel with the evolution
of type Node interface.