race on detach with new threads on Windows
#2600 (closed) fixed this race on UNIX. On Windows, however, we cannot use the same counter-based approach because we cannot see all thread creations: threads can be injected externally. Windows does have init_apc_go_native, which helps, but it still leaves a race window, one that shows up in the new api.detach_spawn test.
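For readers unfamiliar with the counter idea, here is a minimal sketch of one plausible shape of it, written with pthreads and C11 atomics. It is not the code from #2600: every name (`uninit_threads`, `tracked_create`, `wait_for_uninit_threads`, `run_counter_demo`) is hypothetical. The point it illustrates is that the scheme only works when every thread creation passes through the tracked wrapper, which is exactly what externally injected threads on Windows break.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical names throughout; this is an illustration, not DynamoRIO code. */
static atomic_int uninit_threads;      /* created but not yet initialized */
static atomic_int initialized_threads; /* finished per-thread initialization */

static void *child_main(void *arg)
{
    (void)arg;
    /* ... per-thread initialization would happen here ... */
    atomic_fetch_add(&initialized_threads, 1);
    /* Initialization is done, so this thread is now visible to detach. */
    atomic_fetch_sub(&uninit_threads, 1);
    return NULL;
}

/* Creation wrapper: the counter is only accurate if every creation goes
 * through here. An externally injected thread would bypass it. */
static int tracked_create(pthread_t *t)
{
    atomic_fetch_add(&uninit_threads, 1);
    int res = pthread_create(t, NULL, child_main, NULL);
    if (res != 0)
        atomic_fetch_sub(&uninit_threads, 1); /* creation failed: undo */
    return res;
}

/* Detach side: wait until no thread sits between creation and init. */
static void wait_for_uninit_threads(void)
{
    while (atomic_load(&uninit_threads) > 0)
        sched_yield();
}

/* Spawns n threads, waits on the counter, and returns how many threads
 * had completed initialization at the moment the wait finished. */
static int run_counter_demo(int n)
{
    pthread_t threads[16];
    assert(n >= 0 && n <= 16);
    atomic_store(&uninit_threads, 0);
    atomic_store(&initialized_threads, 0);
    for (int i = 0; i < n; i++) {
        int res = tracked_create(&threads[i]);
        assert(res == 0);
        (void)res;
    }
    wait_for_uninit_threads();
    int seen = atomic_load(&initialized_threads);
    for (int i = 0; i < n; i++)
        pthread_join(threads[i], NULL);
    return seen;
}
```

Because the counter is incremented before each creation and decremented only after initialization, the wait cannot finish while a tracked thread is still uninitialized. A thread that never went through `tracked_create` is invisible to the wait, which is the gap described above.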
I tried catching threads in thread init post-detach and sending them native (via a return through the APC hook). That won't work for a shared library detach, but I was hoping it would work for a static lib. However, the api.detach_spawn test showed a problem: if we've attached to a thread but it hasn't been scheduled yet, it's still pointing at thread_attach_takeover_callee. We can't just return there because it's not a hook: it's just a gencode routine. We also can't use the existing AFTER_INTERCEPT_LET_GO_ALT_DYN because it requires a static target and we need a dynamic target for each thread's attach point.
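To make the static versus dynamic target distinction concrete, here is a small conceptual sketch. It does not model AFTER_INTERCEPT_LET_GO_ALT_DYN or any other DynamoRIO internals: the types and functions (`thread_record_t`, `let_go_static`, `let_go_dynamic`, and so on) are all hypothetical. It only shows why a single resume target fixed at hook-install time cannot serve threads that were each taken over at a different point.

```c
#include <assert.h>
#include <stddef.h>

/* All names are hypothetical; none are DynamoRIO identifiers. */
typedef int (*native_target_t)(void);

/* Two stand-ins for the distinct places where threads were taken over. */
static int attach_point_a(void) { return 1; }
static int attach_point_b(void) { return 2; }

/* Static flavor: one target chosen when the hook is installed and
 * shared by every thread that is let go through it. */
static native_target_t static_alt_target = attach_point_a;

typedef struct {
    int id;
    native_target_t attach_point; /* where this thread was at takeover */
} thread_record_t;

/* Ignores the thread: every caller resumes at the same fixed target. */
static native_target_t let_go_static(const thread_record_t *thr)
{
    (void)thr;
    return static_alt_target;
}

/* Looks the target up per thread at let-go time. */
static native_target_t let_go_dynamic(const thread_record_t *thr)
{
    return thr->attach_point;
}

/* Returns 1 if the static scheme misroutes the second thread while the
 * dynamic scheme sends both threads back to their own attach points. */
static int run_target_demo(void)
{
    thread_record_t t1 = { 1, attach_point_a };
    thread_record_t t2 = { 2, attach_point_b };
    int static_wrong = (let_go_static(&t2)() != t2.attach_point());
    int dynamic_ok = (let_go_dynamic(&t1)() == t1.attach_point()) &&
                     (let_go_dynamic(&t2)() == t2.attach_point());
    return static_wrong && dynamic_ok;
}
```

In this toy model the static scheme is correct only for threads whose attach point happens to match the one fixed target, which mirrors the limitation described above for a not-yet-scheduled thread.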