Thurston claims that no previous grammar system meets all three of his requirements: generalized parsing, grammar-dependent scanning, and context-dependent parsing.
I would argue that Prolog Definite Clause Grammars, which date back to the early 1970s, have all three of these properties. Furthermore, since the context is maintained functionally, by threading additional values through the productions, no "undo actions" are required; Prolog's built-in backtracking is all that's needed.
Of course, the problem with DCGs is performance: they're exponential in the worst case. But I think they deserve mention in a dissertation like this anyway. Also, any backtracking parser risks exponential worst-case performance; it will be interesting to see how Colm avoids this fate (I've only read the first few pages so far).
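For concreteness, here is a minimal sketch of the context-threading idea. It is in Haskell rather than real DCG notation, and the names (Prod, declare, declaredName) are made up for illustration: each production maps the remaining input plus a context value to a list of alternative outcomes, so keeping all alternatives gives generalized parsing, consulting the context gives context-dependent parsing, and abandoning an alternative simply drops its version of the context, with no undo actions.

    import qualified Data.Set as Set
    import Data.Set (Set)

    -- The threaded context: here, just the set of names declared so far.
    type Ctx = Set String

    -- A "production": remaining input plus context in, a list of
    -- alternative (remaining input, new context, result) outcomes out.
    type Prod a = (String, Ctx) -> [((String, Ctx), a)]

    -- Sequence two productions, threading the context through both.
    andThen :: Prod a -> (a -> Prod b) -> Prod b
    andThen p f st = [ res | (st', x) <- p st, res <- f x st' ]

    -- Keep both alternatives; this is where backtracking lives.
    orElse :: Prod a -> Prod a -> Prod a
    orElse p q st = p st ++ q st

    -- Consume one expected character.
    char :: Char -> Prod Char
    char c (x:xs, ctx) | x == c = [((xs, ctx), c)]
    char _ _                    = []

    -- Consume a one-letter name and record it in the context passed on.
    declare :: Char -> Prod ()
    declare c st = [ ((rest, Set.insert [c] ctx), ())
                   | ((rest, ctx), _) <- char c st ]

    -- Accept a one-letter name only if the context says it was declared.
    declaredName :: Char -> Prod Char
    declaredName c st@(_, ctx)
      | [c] `Set.member` ctx = char c st
      | otherwise            = []

    -- One parse of "xx": declare the name 'x', then accept it because
    -- the threaded context now contains it.
    demoParse :: [((String, Ctx), Char)]
    demoParse = andThen (declare 'x') (\_ -> declaredName 'x') ("xx", Set.empty)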
I remember, back in 1985, trying to write a parser in Prolog on a VAX 11/750. I didn't know what I was doing and my code fell into the exponential-case trap. Even minimal examples appeared to hang because they took unreasonably long, so I wasn't getting any clues about the problem, and I just gave up.
Consequently I want to rephrase the first sentence of your third paragraph: the problem with DCGs is performance; they're exponential in the naive case. That might not sound too bad today, but back in 1985 the hype was that Prolog let you program declaratively, i.e. just declare what a parse looks like. Reasonable performance from that naive, declarative code was the popular selling point for Prolog.
Note that threading the context through the parse tree while maintaining fully generalized parsing requires keeping all versions of the parsing context in memory. Consider building a C++ parser that way, i.e. every time you modify the structures you are building in memory, you first make a copy of them.
If you know that every subfield of your state object is immutable, and you're using references instead of "local" copies (inconvenient unless you have garbage collection), then your copying cost is limited to the size of the overall state object plus whichever field changed.
Obviously this makes no sense for C++, but Clojure and OCaml get away with it, and in Haskell it's the standard way of implementing almost every stateful computation.
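A tiny illustration of that point, with a made-up ParseState record: updating one field of an immutable record copies only the record's spine, while every unchanged field is shared by reference with the old version, which stays valid.

    -- Hypothetical parser state; the field names are invented for the example.
    data ParseState = ParseState
      { symbolTable :: [(String, String)]
      , scopeDepth  :: Int
      , diagnostics :: [String]
      } deriving Show

    -- "Modifying" the state allocates one new ParseState cell; the
    -- symbolTable and diagnostics lists are shared, only scopeDepth differs.
    enterScope :: ParseState -> ParseState
    enterScope st = st { scopeDepth = scopeDepth st + 1 }

    -- The old state is untouched, so backtracking is just "keep using it".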
Indeed I did not follow what you meant. I missed the "persistent" and saw ... "don't incur the cost of copying." I took that to mean that you were referring to not maintaining any history and using a single, mutable global state. It is with that understanding that I made the previous comment.
To address what you meant ... it's more a matter of what's practical. For example, to parse C++ one needs to do quite a bit of semantic analysis during parsing; a full implementation of the type system is necessary. Those data structures then change with every statement or clause parsed. Achieving this with persistent data structures would be far too costly.
To have both generalized parsing and context-dependent parsing, you need to somehow revert changes to the global state when you backtrack. You can do this by restoring old versions, which requires copying the state and keeping the history. The Colm approach is to keep only one global state, but store instructions for programmatically reverting it as you backtrack.
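If I've followed that description correctly, a rough sketch of the shape of the idea (not Colm's actual implementation) is a single mutable state plus a stack of reverse instructions that get replayed on backtrack:

    import Data.IORef

    -- A stand-in for the single global state.
    data Global = Global { names :: [String] }

    -- The stored "instructions for programmatically reverting the state".
    type UndoStack = IORef [Global -> Global]

    -- Mutate the one global state, and push the instruction that undoes it.
    declareName :: IORef Global -> UndoStack -> String -> IO ()
    declareName st undo n = do
      modifyIORef' st   (\g -> g { names = n : names g })
      modifyIORef' undo ((\g -> g { names = drop 1 (names g) }) :)

    -- Backtrack by replaying the most recent k reverse instructions.
    revert :: IORef Global -> UndoStack -> Int -> IO ()
    revert st undo k = do
      us <- readIORef undo
      let (toRun, rest) = splitAt k us
      mapM_ (modifyIORef' st) toRun
      writeIORef undo rest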
The point of functional data structures is that when you modify one, you don't copy the whole structure; you only copy the path to the parts that changed.
A simple example would be parsing a sequence of characters and storing them in a linked list. When you get an additional character you don't destructively modify the list; instead you create a new list node that points to the existing list.
The process function takes a character and a state, and returns a new state. The old state is still usable, as nothing was mutated. On the other hand, no large data structure had to be copied.
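Spelled out, with the state being nothing more than the characters seen so far:

    -- The parser state from the example above: just the characters seen,
    -- most recent first.
    type State = [Char]

    -- Take a character and a state, return a new state: the new list node
    -- points at the existing list, so nothing is copied or mutated.
    process :: Char -> State -> State
    process c seen = c : seen

    -- Both states remain usable afterwards.
    demo :: (State, State)
    demo =
      let s0 = "cba"             -- some earlier state
          s1 = process 'd' s0    -- new state, sharing all of s0's nodes
      in (s0, s1)                -- ("cba", "dcba")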
Your method is probably more efficient. Because you don't need access to all historical versions simultaneously, you only have to keep a stack of undo operations, so you only pay an O(1) slowdown per operation. With functional data structures you have, in the worst case, an O(log n) slowdown per operation.
For example, if you implement a bit vector, then update and lookup can both be O(log n) (e.g. a red-black tree). If you want update to be O(1), then lookup becomes O(n) (e.g. a linked list); if you want lookup to be O(1), then update becomes O(n) (e.g. copying the entire vector on update).
(please correct me if I'm wrong and if you can do better than O(n) for those cases)
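To make the O(log n) case concrete, here is one way to get a persistent "bit vector" with logarithmic reads and writes, using Haskell's Data.Sequence (a balanced finger tree) as the backing structure; each update shares nearly all of its structure with the previous version:

    import qualified Data.Sequence as Seq
    import Data.Sequence (Seq)

    type Bits = Seq Bool

    -- An all-zero vector of n bits.
    emptyBits :: Int -> Bits
    emptyBits n = Seq.replicate n False

    -- Read a bit: O(log n).
    getBit :: Int -> Bits -> Bool
    getBit i bits = Seq.index bits i

    -- Functionally set a bit: O(log n); the old version remains valid.
    setBit :: Int -> Bool -> Bits -> Bits
    setBit i b bits = Seq.update i b bits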
Yes, I understand these techniques; in fact they are heavily used in Colm, just not for maintaining the global state that is used for the parsing feedback loop.
I use the term "copy" in the general sense to make the discussion easier. Conceptually, persistent data structures are copies, even if they are heavily optimized to incur very little cost.