Translation problems in clean call mangling causing post-detach crashes
While testing my new drwrap post-call method for #4197, I wrote a drwrap detach test and hit a number of failures when running it in a loop locally.
The visible symptom is that right after detach we hit the "Cannot correctly handle a received signal.>" fatal error: a SIGSEGV arrives in a now-native thread, and because we are nearly all the way shut down, we no longer have the setup to deliver a signal.
The translation code for clean call mangling only considers synchronous faults, which would only happen during argument setup. At that point, there is indeed a clean app state stored. However, asynchronous translation, such as for detach, could happen anywhere. In addition, some clean call optimizations do not store all of the state, and there is also clean call inlining. The translation code does not appear to properly consider any of that.
It will take some effort to go through and support all that complexity. As a first step, I'm planning to simply refuse to translate at all when inside clean call mangling: a change that does this fixes the failures in the drwrap detach test. However, since the args could have a synchronous fault, we still have to translate there, so it will take a little more effort.
I also want to add stats on how many asynchronous translations are refused, to help inform whether we could speed up detach by supporting more translation points.
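
To make the proposed first step concrete, here is a minimal sketch of the policy in C. It is not the actual patch: every name in it (`region_t`, `xl8_kind_t`, `can_translate`, `num_async_xl8_refused`) is hypothetical and does not correspond to an existing DynamoRIO identifier. It also assumes a synchronous fault outside argument setup is not expected and is therefore refused.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout: these are illustration only,
 * not DynamoRIO identifiers. */
typedef enum {
    REGION_APP_INSTR,        /* code generated for an ordinary app instruction */
    REGION_CLEAN_CALL_ARGS,  /* clean call argument setup; may fault synchronously */
    REGION_CLEAN_CALL_OTHER, /* remaining clean call mangling, incl. optimized/inlined */
} region_t;

typedef enum {
    XL8_SYNCHRONOUS,  /* fault raised by the instruction at the target pc */
    XL8_ASYNCHRONOUS, /* thread stopped at an arbitrary pc, e.g., for detach */
} xl8_kind_t;

/* Stat: how many asynchronous translations were refused. */
static size_t num_async_xl8_refused;

/* Returns whether translation should be attempted at this point. */
static bool
can_translate(region_t region, xl8_kind_t kind)
{
    if (region == REGION_APP_INSTR)
        return true; /* outside clean call mangling: unchanged behavior */
    if (kind == XL8_SYNCHRONOUS) {
        /* Assumption: only argument setup can fault synchronously,
         * and a clean app state is stored at that point. */
        return region == REGION_CLEAN_CALL_ARGS;
    }
    /* Asynchronous request inside clean call mangling: refuse and count it. */
    num_async_xl8_refused++;
    return false;
}
```

In this sketch a refused asynchronous request would mean the caller has to let the thread run further and retry at a later point, so the counter gives a rough measure of how much detach time is spent on such retries.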