We use weak parsing algorithms – often hand-written left-leaning recursive descent parsers. Sometimes PEGs. Usually with a lexing layer that treats keywords specially, annotating them as a particular part of speech without that being a function of the grammar, but the words themselves.
This makes writing a parser easy, particularly for those hand-written parsers. Keywords are also a major reason we can’t evolve languages: adding new words breaks old programs that were already using them.
var var =. It’s not ambiguous, since a keyword can’t appear as a variable name, positionally. The first
var can’t be known whether it’s a keyword or variable name without some lookahead, though:
var = would be a variable name and
var foo would be a keyword.
This usually means using better parsers. Hand written parsers could maintain a couple tokens buffered state, allowing an unshift or two to put tokens back when a phrase doesn’t match; generated parsers can do better and use GLR, and a fully dynamic parser working off of the grammar as a data structure can use Earley’s algorithm.
These are problematic for PEGs though. They won’t backtrack and figure out which interpretation is correct. Once a PEG has chosen a part of speech for a word, it sticks. That’s the rationale behind its ordered choice operator: one must have clear precedence. It’s in essence an implicit way to mark which part of speech something is in a grammar.
It’s always tempting to get a ‘clean break’ on a language; misfeatures build up as we evolve it. This is the biggest disservice we can do our users: a clean break breaks every program they have ever written. It’s a new language, and you’re starting fresh.
"use strict" to the function in ES5 was smart, in that it allows us to use the lexical scope as a place where the language changes too.
The complexity with
"use strict" is that it changes things more than lexically: Functions declared in strict mode behave differently, and if you’re clever, you can observe this from the outside, as a caller, and that’s a problem for backward compatibility.
Support multiple sub-languages. In a parser that can support combining grammars (Earley’s algorithm and combinator parsers for pure LL languages in particular are good at this, though PEGs are not). If someone elects a different language within a region of the program, this is possible. Language features can be left as orthogonal layers. How one would express that intent is unexplored, though. Too few people use the tools that would allow this.
We should separate the parser from the semantics of a language: Let there be one, two, even ten versions of the syntax available, and push down to a more easily versioned (or not at all) semantic layer. This is where Python fell down without needing to. The old cruft could have been maintained and reformed in terms of the new concepts from Python3.