Add drwrap call site wrapping feature
By default, drwrap handles all the complexities of tail calls, longjmp, etc. by wrapping both the callee and the return address. This incurs extra overhead, mostly from flushing when a return address is discovered after it has already been executed (or while it is being executed in another thread). In some use cases, missing corner cases of calling conventions is not important and better performance is preferable, because we know we're targeting functions that never do anything unusual. This is a feature request to add a drwrap mode for this case: wrapping based solely on the call site, ignoring the callee and what actually happens when it returns.