Fix interaction of custom operators and infix macros (!302) · Merge requests · sweet-js / sweet-core

Merged Administrator requested to merge github/fork/natefaubion/op-infix into master Apr 26, 2014

Created by: natefaubion

Fixes #298 (closed) plus a few more bugs related to infix matching in various contexts. It's a bit of a doozy, so lets walk through it.

Previously, the special sauce for infix macros and expressions was isolated to the get_expression helper. It did some magic to continually enforest an expression until it reached a fixed point. This is what would load something like foo => foo + 1 with a => lambda macro as a single expression. foo would be recognized as a single expression, but then it would lookahead to see if there was a macro. It would then expand it and see if it extended the original expression. The problem was get_expression was not called in all cases, so you'd get into situations where they wouldn't be expanded (like in binary ops and assignment expressions). The solution was to just switch over to get_expression in these cases, but custom operator put a wrinkle in that.

With fixed binary operators we would just build up BinOp terms as we went, but with custom operators, the term building is deferred so we can work out the custom precedence and associativity. Sub calls to get_expression for the right-hand-side of a binary operator are impossible because you lose the operator context (prec, assoc, stack, etc). We have to persist that context, which means the routine for infix expansion had to be moved into enforest/step instead of get_expression. In general this is good, because that means infix macros are expanded everywhere, which fixed a number of related bugs. It however opened up another set of issues. Take the example on the homepage:

macro unless {
  rule infix { return $value:expr | $guard:expr } => {
    if (!($guard)) {
      return $value;
    }
  }
}

function foo(x) {
  return true unless x > 42;
  return false;
}

This used to work because macro expansion only happened within get_expression. return and the expression would be enforested as separate terms, and the unless infix macro would be run on the next run through enforest. But now that we've shifted everything around, the unless macro is expanded in the same run of enforest as the lhs expression. This means the lhs expression term will not have been finalized and built yet, so it can't match it as an expression.

So now this means we need a way to finalized the previous expression, but only if we try to match on the expression with an infix macro. To do this, I created a new set of Partial term trees (PartialExpression and PartialOperation). While we are building an expression/operator chain, we wrap out previous syntax and terms in these special terms. What makes PartialExpression special is that it has a combine method that knows how to finalize the expression/operator chain its a part of to retrieve the full expression. Now when you try to match on an infix expression, the expander recognizes that its a PartialExpression and calls its combine method. Only operator tokens are wrapped in PartialOperation, so that if you try to match on it directly, it won't be recognized as an expression. The Partial terms also contain references to the Partial syntactically to the left so that we can follow the entire chain and work out where it ends.

There's another issue though that operators and infix macros bring up. Say we are using our unless macro above, but the left hand side expression is a complicated chain of operators 1 * 2 + 3 / 4 that would have several pending combinations on the operator stack. If all we did was splice the syntax, and left the opCtx as is, all those pending combinations would still exist (even though we've consumed them with the infix macro). This means we have to know how much we've matched on, and rewind the operator context to match the syntax we consumed. To do that we add the current PartialOperation term to the opCtx and use it as a marker. When we match on infix syntax in an expression/operator chain, we look at the resulting prevTerms and find the next PartialOperation term. We then search the stack for the matching term and slice it there.

But wait! There's more! It turns out, prevStx and prevTerms weren't actually being built correctly after combining an operation. When the stepper decides that it needs to combine the term, it persists the previous syntax and terms from the closure it was built. But try and follow along to see why this is wrong. Let's take a simple chain of 1 + 2 + 3. Here, the 1 + is on the prevStx when the combine closure is built for the first operator. Once we get to the second +, it combines the first which results in a <BinOp 1 + 2> term, but with the 1 + still on the pevious syntax. When the new term is then added to the previous syntax, you get 1 + 1 + 2. With long chains of operators, you get the syntax duplication gradually building up the longer the chain is. To get the correct prevStx we actually need to know the prevStx from before the 1 + 2, which is not available when we build our combine closure! To get around this, we have to add the prevStx and prevTerms to the opCtx stack so that we can restore it after combining.

At this point, we really need everything in the opCtx on the stack to restore state, so instead of just storing a [prec, combine] tuple, we now just store the entire opCtx object. This is good because we don't need to do as much copying of the object. We can just push and pop off the stack.

In general I've found this to work really well in my testing. There are only a few edge-cases related to splitting terms. For example, so I want to do this:

macro m {
  rule infix { $l:expr + 2 + | } => { $l }
}
1 + 2 + m

It won't work. I'd intuitively expect it to recognize 1 as an expression, but a BinOp term has already been built with 1 + 2. We've already matched the + 2, so if we gave that whole expression it would be wrong. It isn't smart enough to descend into sub-expressions, so it is currently just disallowed and the match is failed.

Fortunately, I can't think of how a use case like that would be useful, so I don't expect it to really come up (famous last words). Otherwise, I think its pretty solid with all tests (plus a few additional) passing. Do try to break it, though.