Redesign
So I've been working on a rewrite/redesign of sweet.js. Turns out that what we currently have is kind of a rolling collection of hacks that are in desperate need of some rethinking. Things are slow and just getting slower in the branch with ES6 module support, so something has to give.
What I'm pushing right now is definitely a work in progress but I think it shows promise. In particular I think it's much more comprehensible for people coming to the codebase for the first time. Read on if you're interested in helping out or just curious as to what might be changing.
Comments/opinions requested.
The current todo status looks something like this:
- add support for all forms (right now I have some of the obvious ones but `switch`, `yield`, and lots more aren't supported)
- add hygiene (actually should be straightforward, hooks are already in the right places)
- add declarative macros (`rule` and `case`, currently just primitive macros "work")
- add module support (not straightforward but doable)
- add infix macros
- add custom operators
- spec out multi-token equivalent
- add line number and sourcemap support
- test each syntactic form
- port old tests
- add perf benchmarks
- add descriptive error messages
No more destructuring
One of the big areas of slowdown came from the fact that we weren't doing real parsing. Expansion worked by building up a partial AST (`TermTree`) and then throwing all that work away by destructuring the partial AST back into an array of tokens and feeding that to esprima to actually produce a real AST.
Now instead of doing all the parsing work twice, we just build the complete AST. This is handled by two data structures: a `Term` (roughly equivalent to the current `TermTree`) that acts as a partial AST (some terms hold syntax objects and some hold other terms), and a `Node` that is just an ESTree node representing the complete AST.
A `Term` has two methods, `parse` and `expand`. The `parse` method returns a new corresponding `Node`, while `expand` is roughly equivalent to `expandToTermTree` in the current expander (i.e. it handles some hygiene details and walks down partially expanded `Term`s).
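To make that concrete, here's a minimal sketch of what one kind of `Term` might look like. The `Term`/`Node` split and the `parse`/`expand` methods come from the design above; the `BinaryExpressionTerm` name, its fields, and the `env` argument are made up for illustration.

```js
// Hypothetical Term for a binary expression (not the actual implementation).
class BinaryExpressionTerm {
  constructor(left, operator, right) {
    this.left = left;         // a Term or syntax object for the left operand
    this.operator = operator; // a syntax object for the operator token
    this.right = right;       // a Term or syntax object for the right operand
  }

  // expand: walk down the partially expanded term, finishing expansion of
  // sub-terms (this is where hygiene details would be handled)
  expand(env) {
    return new BinaryExpressionTerm(this.left.expand(env),
                                    this.operator,
                                    this.right.expand(env));
  }

  // parse: return the corresponding complete ESTree node
  parse() {
    return {
      type: 'BinaryExpression',
      operator: this.operator.token.value, // assuming syntax objects wrap tokens
      left: this.left.parse(),
      right: this.right.parse()
    };
  }
}
```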
The final `Node` is shipped directly to babel (via `transform.fromAST`) because ES6. All sweet.js code you write is now ES6 (or at least as much as babel can support).
Recursive descent enforest
The old `enforest` was weird and complicated and basically a giant `if` block. The new `enforest` is still weird and complicated but at least it's a bit more modular.
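Roughly, "more modular" means a recursive descent style dispatcher that hands each syntactic form off to its own small function. The sketch below is just the shape; every helper name here is made up, not what's actually in the branch.

```js
// Illustrative shape only; the helper predicates are stand-ins for however the
// real code inspects syntax objects, and the enforest* helpers live elsewhere.
const isKeyword = (stx, name) =>
  stx.token.type === 'Keyword' && stx.token.value === name;
const isNumericLiteral = stx => stx.token.type === 'NumericLiteral';

function enforest(stxl, env) {
  let head = stxl.first();
  if (isKeyword(head, 'function')) {
    return enforestFunctionDeclaration(stxl, env); // one small helper per form...
  }
  if (isKeyword(head, 'if')) {
    return enforestIfStatement(stxl, env);
  }
  if (isNumericLiteral(head)) {
    return enforestExpression(stxl, env);
  }
  // ...instead of one giant if block
  throw new Error('unexpected syntax: ' + JSON.stringify(head.token));
}
```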
Immutable.js
Currently the expansion algorithm is written as if we were using lists when in fact it's arrays all the way down: lots of calls to `concat` that should not be happening if we care about performance. Now we're using immutable.js lists of syntax objects, which should be better.
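For a rough sense of why this matters: `Array.prototype.concat` copies its inputs on every call, while an immutable.js `List` shares structure between the old and new lists. The syntax objects below are just placeholder stand-ins.

```js
import { List } from 'immutable';

// Stand-ins for syntax objects, purely for illustration.
let stx1 = { token: 'foo' }, stx2 = { token: '=' }, stx3 = { token: '42' };

let stxl = List([stx1, stx2, stx3]);
let rest = stxl.rest();           // everything but the first element, no full copy
let combined = rest.concat(stxl); // structural sharing instead of copying both arrays
```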
New syntax transformer type
There's a new syntax declaration form (analogous to `var`/`let`/`const` for compiletime values) that looks like `syntax <id> = <expr>`. Previously sweet supported a couple different primitive `macro` forms but you could only really put macros into the compiletime environment. This new form allows you to put whatever you want into the env.
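So, hypothetically, something as simple as the following should work; the name and value here are made up, the point is just that the right-hand side doesn't need to be a macro.

```js
// Hypothetical: binding an arbitrary compiletime value, not a macro.
syntax debugLevel = 3;
```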
Normally (e.g. Racket, old sweet) syntax transformers (aka macros) are just functions, but I'm changing things up. Macros are actually objects with two methods, `match` and `transform`.
```js
syntax m = {
  // List[Syntax] -> {subst: Substitution,
  //                  rest: List[Syntax] }
  match: function(stxl) {
    return {
      subst: [],
      rest: stxl.rest()
    };
  },

  // Substitution -> List[Syntax]
  transform: function(subst) {
    return syntaxQuote { 42 };
  }
};
```
The reason for breaking matching and transforming out into two functions is hygiene. Currently sweet has to pass some hygiene information to primitive macros so that they can mark the syntax they match and their result syntax. This is gross and dangerous; badly behaved primitive macros can mess up hygiene in various ways. By splitting macros into two functions we can pull the hygiene manipulation code back into the expander.
More details to work out here but I think this is the right factoring.
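As a rough sketch of what that factoring could buy (every name below is hypothetical, `applyMark` most of all): the expander sits between `match` and `transform`, so it can do the hygiene marking itself instead of trusting the macro to do it.

```js
// Hypothetical expander-side driver; applyMark and the context argument are
// made-up stand-ins for whatever hygiene bookkeeping the expander actually does.
function applySyntaxTransformer(transformer, stxl, context) {
  // 1. the macro matches, with no hygiene responsibilities
  let { subst, rest } = transformer.match(stxl);

  // 2. the expander marks the matched syntax
  let markedSubst = applyMark(subst, context);

  // 3. the macro transforms, again without touching marks
  let resultStxl = transformer.transform(markedSubst);

  // 4. the expander marks the result syntax before continuing expansion
  return { result: applyMark(resultStxl, context), rest: rest };
}
```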
This of course is just the primitive form; declarative `rule` and `case` macro forms will be built on top of this.
No multi-tokens
Right now you can do things like `macro (number?) { ... }` to create multi-token macros. This massively complicates the expander for not much gain. Hacking the lexical structure of your language can be done with readtables (thanks @jlongster!) so let's do that instead.
Limit infix macros
Infix macros are cool but maybe too cool. The enforestation of operators is massively complicated because we want to allow infix macros to be very flexible. Some heroic work was done by @natefaubion here but I think we are both of the opinion it's not actually worth it.
I think we can still have them but just in a restricted state. My proposal is that they can only match on previously seen `Term`s, and operators create implicit delimiters that infix macros can't "see" out of (so in `2 + inf 42` the `inf` macro sees an empty prefix list). Just my initial intuition, more details to work through.
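To illustrate what that restriction might look like (the `prefix` argument to `match` is purely hypothetical, just one possible way the previously seen `Term`s could be exposed):

```js
// Hypothetical restricted infix macro; `prefix` is a made-up parameter holding
// the previously enforested Terms the macro is allowed to see.
syntax inf = {
  match: function(stxl, prefix) {
    // In `2 + inf 42` the `+` operator acts as an implicit delimiter, so
    // `prefix` is empty here; `inf` cannot "see" the `2 +` to its left.
    return {
      subst: { rhs: stxl.first() },
      rest: stxl.rest()
    };
  },
  transform: function(subst) {
    return syntaxQuote { 42 };
  }
};
```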