Redesign
So I've been working on a rewrite/redesign of sweet.js. Turns out that what we currently have is kind of a rolling collection of hacks that are in desperate need of some rethinking. Things are slow and just getting slower in the branch with ES6 module support, so something has to give.
What I'm pushing right now is definitely a work in progress but I think it shows promise. In particular I think it's much more comprehensible for people coming to the codebase for the first time. Read on if you're interested in helping out or just curious as to what might be changing.
Comments/opinions requested.
The current todo status looks something like this:
- add support for all forms (right now I have some of the obvious ones but `switch`, `yield`, and lots more aren't supported)
- add hygiene (actually should be straightforward, hooks are already in the right places)
- add declarative macros (`rule` and `case`, currently just primitive macros "work")
- add module support (not straightforward but doable)
- add infix macros
- add custom operators
- spec out multi-token equivalent
- add line number and sourcemap support
- test each syntactic form
- port old tests
- add perf benchmarks
- add descriptive error messages
No more destructuring
One of the big areas of slowdown came from the fact that we weren't doing real parsing. Expansion worked by building up a partial AST (`TermTree`) and then throwing all that work away by destructuring the partial AST back into an array of tokens and feeding that to esprima to actually produce a real AST.
Now instead of doing all the parsing work twice, we just build the complete AST. This is handled by two data structures: a `Term` (roughly equivalent to the current `TermTree`) that acts as a partial AST (some terms hold syntax objects and some hold other terms), and a `Node` that is just an ESTree node representing the complete AST.
A `Term` has two methods, `parse` and `expand`. The `parse` method returns a new corresponding `Node`, while `expand` is roughly equivalent to `expandToTermTree` in the current expander (i.e. it handles some hygiene details and walks down partially expanded `Term`s).
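To make that concrete, here's a minimal sketch of what one kind of `Term` might look like. The `Term`/`Node` split and the `parse`/`expand` methods come from the design above; the `BinaryExpressionTerm` name, its fields, and the `env` argument are made up for illustration.

```js
// Hypothetical Term for a binary expression (not the actual implementation).
class BinaryExpressionTerm {
  constructor(left, operator, right) {
    this.left = left;         // a Term or syntax object for the left operand
    this.operator = operator; // a syntax object for the operator token
    this.right = right;       // a Term or syntax object for the right operand
  }

  // expand: walk down the partially expanded term, finishing expansion of
  // sub-terms (this is where hygiene details would be handled)
  expand(env) {
    return new BinaryExpressionTerm(this.left.expand(env),
                                    this.operator,
                                    this.right.expand(env));
  }

  // parse: return the corresponding complete ESTree node
  parse() {
    return {
      type: 'BinaryExpression',
      operator: this.operator.token.value, // assuming syntax objects wrap tokens
      left: this.left.parse(),
      right: this.right.parse()
    };
  }
}
```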
The final `Node` is shipped directly to babel (via `transform.fromAST`) because ES6. All sweet.js code you write is now ES6 (or at least as much as babel can support).
Recursive descent enforest
The old `enforest` was weird and complicated and basically a giant `if` block. The new `enforest` is still weird and complicated but at least it's a bit more modular.
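Roughly, "more modular" means a recursive descent style dispatcher that hands each syntactic form off to its own small function. The sketch below is just the shape; every helper name here is made up, not what's actually in the branch.

```js
// Illustrative shape only; the helper predicates are stand-ins for however the
// real code inspects syntax objects, and the enforest* helpers live elsewhere.
const isKeyword = (stx, name) =>
  stx.token.type === 'Keyword' && stx.token.value === name;
const isNumericLiteral = stx => stx.token.type === 'NumericLiteral';

function enforest(stxl, env) {
  let head = stxl.first();
  if (isKeyword(head, 'function')) {
    return enforestFunctionDeclaration(stxl, env); // one small helper per form...
  }
  if (isKeyword(head, 'if')) {
    return enforestIfStatement(stxl, env);
  }
  if (isNumericLiteral(head)) {
    return enforestExpression(stxl, env);
  }
  // ...instead of one giant if block
  throw new Error('unexpected syntax: ' + JSON.stringify(head.token));
}
```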
Immutable.js
Currently the expansion algorithm is written as if we were using lists when in fact it's arrays all the way down: lots of calls to `concat` that should not be happening if we care about performance. Now we're using immutable.js lists of syntax objects, which should be better.
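For a rough sense of why this matters: `Array.prototype.concat` copies its inputs on every call, while an immutable.js `List` shares structure between the old and new lists. The syntax objects below are just placeholder stand-ins.

```js
import { List } from 'immutable';

// Stand-ins for syntax objects, purely for illustration.
let stx1 = { token: 'foo' }, stx2 = { token: '=' }, stx3 = { token: '42' };

let stxl = List([stx1, stx2, stx3]);
let rest = stxl.rest();           // everything but the first element, no full copy
let combined = rest.concat(stxl); // structural sharing instead of copying both arrays
```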
New syntax transformer type
There's a new syntax declaration form (analogous to `var`/`let`/`const` for compiletime values) that looks like `syntax <id> = <expr>`. Previously sweet supported a couple different primitive `macro` forms but you could only really put macros into the compiletime environment. This new form allows you to put whatever you want into the env.
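So, hypothetically, something as simple as the following should work; the name and value here are made up, the point is just that the right-hand side doesn't need to be a macro.

```js
// Hypothetical: binding an arbitrary compiletime value, not a macro.
syntax debugLevel = 3;
```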
Normally (e.g. Racket, old sweet) syntax transformers (aka macros) are just functions, but I'm changing things up. Macros are actually objects with two methods, `match` and `transform`.
```js
syntax m = {
  // List[Syntax] -> {subst: Substitution,
  //                  rest: List[Syntax] }
  match: function(stxl) {
    return {
      subst: [],
      rest: stxl.rest()
    };
  },

  // Substitution -> List[Syntax]
  transform: function(subst) {
    return syntaxQuote { 42 };
  }
};
```
The reason for breaking matching and transforming out into two functions is hygiene. Currently sweet has to pass some hygiene information to primitive macros so that they can mark the syntax they match and their result syntax. This is gross and dangerous; badly behaved primitive macros can mess up hygiene in various ways. By splitting macros into two functions we can pull the hygiene manipulation code back into the expander.
More details to work out here but I think this is the right factoring.
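As a rough sketch of what that factoring could buy (every name below is hypothetical, `applyMark` most of all): the expander sits between `match` and `transform`, so it can do the hygiene marking itself instead of trusting the macro to do it.

```js
// Hypothetical expander-side driver; applyMark and the context argument are
// made-up stand-ins for whatever hygiene bookkeeping the expander actually does.
function applySyntaxTransformer(transformer, stxl, context) {
  // 1. the macro matches, with no hygiene responsibilities
  let { subst, rest } = transformer.match(stxl);

  // 2. the expander marks the matched syntax
  let markedSubst = applyMark(subst, context);

  // 3. the macro transforms, again without touching marks
  let resultStxl = transformer.transform(markedSubst);

  // 4. the expander marks the result syntax before continuing expansion
  return { result: applyMark(resultStxl, context), rest: rest };
}
```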
This of course is just the primitive form; declarative `rule` and `case` macro forms will be built on top of this.
No multi-tokens
Right now you can do things like `macro (number?) { ... }` to create multi-token macros. This massively complicates the expander for not much gain. Hacking the lexical structure of your language can be done with readtables (thanks @jlongster!) so let's do that instead.
Limit infix macros
Infix macros are cool but maybe too cool. The enforestation of operators is massively complicated because we want to allow infix macros to be very flexible. Some heroic work was done by @natefaubion here but I think we are both of the opinion it's not actually worth it.
I think we can still have them but just in a restricted state. My proposal is that they can only match on previously seen `Term`s, and operators create implicit delimiters that infix macros can't "see" out of (so in `2 + inf 42` the `inf` macro sees an empty prefix list). Just my initial intuition, more details to work through.
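To illustrate what that restriction might look like (the `prefix` argument to `match` is purely hypothetical, just one possible way the previously seen `Term`s could be exposed):

```js
// Hypothetical restricted infix macro; `prefix` is a made-up parameter holding
// the previously enforested Terms the macro is allowed to see.
syntax inf = {
  match: function(stxl, prefix) {
    // In `2 + inf 42` the `+` operator acts as an implicit delimiter, so
    // `prefix` is empty here; `inf` cannot "see" the `2 +` to its left.
    return {
      subst: { rhs: stxl.first() },
      rest: stxl.rest()
    };
  },
  transform: function(subst) {
    return syntaxQuote { 42 };
  }
};
```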