Add drwrap mode with indirect handling but no expensive retaddr checking or flushing

Today's drwrap instruments the callee entry point and examines the return address to find the post-call instrumentation point. It records that address and, if necessary, flushes so that the post-call point can be instrumented. However, flushing is very expensive, and we have found that even the return address check is too expensive because it uses shared data structures and locks.
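To make the two costs concrete, here is a toy model in plain C. It is not drwrap's actual code: `postcall_table_t` and `on_wrapped_entry` are hypothetical names, the table is a flat array rather than whatever structure drwrap really uses, and every newly seen return address is conservatively counted as a flush, although a real flush is only needed in some of those cases.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SITES 128

/* Toy stand-in for a shared table of known post-call sites.
 * A real multi-threaded tool would have to guard it with a lock. */
typedef struct {
    uintptr_t sites[MAX_SITES];
    size_t count;
    unsigned lookups; /* one per wrapped call: the recurring cost */
    unsigned flushes; /* one per newly discovered return address */
} postcall_table_t;

static int table_contains(const postcall_table_t *t, uintptr_t retaddr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->sites[i] == retaddr)
            return 1;
    }
    return 0;
}

/* Modeled callee-entry hook. Returns 1 if this call discovered a new
 * post-call site (counted here as a flush), 0 if it was already known. */
static int on_wrapped_entry(postcall_table_t *t, uintptr_t retaddr)
{
    /* Acquiring the table lock would happen here. */
    t->lookups++;
    if (table_contains(t, retaddr))
        return 0;
    assert(t->count < MAX_SITES);
    t->sites[t->count++] = retaddr;
    /* Simplification: treat every new site as requiring a flush. */
    t->flushes++;
    return 1;
}
```

In this model the lookup is paid on every wrapped call, even once all return addresses are known, which is the cost this issue aims to avoid.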

One potential solution is call-site-only instrumentation, #4070 (closed), but that misses too many post-calls.

Our CGO 2011 paper listed 4 approaches:

1. Add an extra frame on the app call stack (what Valgrind does; what Pin used to do): not transparent but might be fine for our apps.
2. Find all the RET instructions in the callee (what Pin does nowadays): suffers from the complexity of static analysis, such as walking functions split into hot and cold parts.
3. Call site interception (#4070 (closed)): suffers from the many cases of statically indeterminable call targets and tailcalls.
4. Dynamically find return addresses from inside the callee (what DR does today).

Adding another one:

5. Change the return address to a sentinel/takeover trampoline. This also has transparency implications and can (and does) break applications that occasionally inspect or modify their own return addresses. Such techniques are normally a last resort for when you're not in control of execution, but this one might work fine here. It could be done purely in a client: use a legitimate address as the sentinel; have a high-priority bb event look for this address and use dr_redirect_native_target() for an app-instr-less transfer (dr_redirect_execution() is not supported in such a situation; the native redirect is better). It would end up as a DR-trace barrier.
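The bookkeeping behind the sentinel idea can be sketched as a toy model in plain C. This is not a DR client and does not use the DR API: `SENTINEL`, `wrap_state_t`, `on_callee_entry`, and `on_return` are all hypothetical names, the return address slot is an ordinary variable rather than the app stack, and `on_return` stands in for the bb event that would recognize the sentinel address and redirect to the real target.

```c
#include <assert.h>
#include <stdint.h>

#define SENTINEL ((uintptr_t)0xdead0000u) /* hypothetical takeover address */
#define MAX_DEPTH 64

/* Would be per-thread in a real tool, so no shared table or lock is needed. */
typedef struct {
    uintptr_t saved_retaddr[MAX_DEPTH]; /* shadow stack of real return addresses */
    int depth;
    int pre_calls;
    int post_calls;
} wrap_state_t;

/* Callee entry: save the real return address and plant the sentinel. */
static void on_callee_entry(wrap_state_t *st, uintptr_t *retaddr_slot)
{
    assert(st->depth < MAX_DEPTH);
    st->saved_retaddr[st->depth++] = *retaddr_slot;
    *retaddr_slot = SENTINEL;
    st->pre_calls++; /* the pre-call callback would run here */
}

/* Return transfer: if the target is the sentinel, run the post-call step
 * and hand back the real return address; otherwise pass it through. */
static uintptr_t on_return(wrap_state_t *st, uintptr_t target)
{
    if (target != SENTINEL)
        return target;
    assert(st->depth > 0);
    st->post_calls++; /* the post-call callback would run here */
    return st->saved_retaddr[--st->depth];
}

/* Two nested wrapped calls plus one unwrapped return; returns 1 on success. */
static int run_demo(void)
{
    wrap_state_t st = {0};
    uintptr_t slot_outer = 0x1000, slot_inner = 0x2000;
    on_callee_entry(&st, &slot_outer);
    on_callee_entry(&st, &slot_inner);
    uintptr_t r1 = on_return(&st, slot_inner); /* inner returns first */
    uintptr_t r2 = on_return(&st, slot_outer);
    uintptr_t r3 = on_return(&st, 0x3000);     /* unwrapped return */
    return r1 == 0x2000 && r2 == 0x1000 && r3 == 0x3000 &&
           st.pre_calls == 2 && st.post_calls == 2 && st.depth == 0;
}
```

The model also shows where the transparency problem comes from: between entry and return, anything that reads the return address slot sees the sentinel rather than the real address, and a callee that does not return normally (for example via longjmp) would leave the shadow stack out of sync.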

For target applications that are relatively well-behaved, the non-transparent solutions 1 (an extra stack frame) and 5 (a replaced return address) may well be the most effective in terms of performance and completeness. This issue covers implementing one of those as an optional feature in drwrap.