HANG: re-takeover of an in-transition thread
I hit this situation while testing my #2601 (closed) fix: the thread_initexit_lock is unowned but contended, with all threads waiting on it.
The detacher is pre-synchall, trying to acquire the lock. Many new threads are waiting on the lock to init, and other threads are waiting on it to exit. It is not clear how we got into this state: mutex_unlock() should call mutex_notify_released_lock(), which sets lock_requests back to -1.
(gdb) p uninit_thread_count
$1 = 43
(gdb) p thread_initexit_lock
$2 = {
lock_requests = 1,
contended_event = -1,
name = 0x7132dec0 "thread_initexit_lock(mutex)@core/dynamo.c:262",
rank = 4,
owner = 0,
owning_dcontext = 0x0,
prev_owned_lock = 0x716118c0 <outermost_lock>,
count_times_acquired = 26,
count_times_contended = 110,
count_times_spin_pause = 0,
max_contended_requests = 1,
count_times_spin_only = 49,
prev_process_lock = 0x71611440 <shared_cache_flush_lock>,
next_process_lock = 0x71611240 <all_threads_lock>,
callstack = {0x0, 0x0, 0x0, 0x0},
app_lock = false,
deleted = false
}
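For reference, my mental model of the lock_requests counter (a hedged sketch, not the real core/utils.c code; only the "released means -1" part comes from the note above, the rest is an assumption):

#include <assert.h>
#include <stdbool.h>

/* Assumed meaning of lock_requests:
 *   -1 : free (what mutex_notify_released_lock() should restore)
 *    0 : held, uncontended
 *   >0 : held, with contending waiters
 * The dump above has lock_requests == 1 with owner == 0: waiters are
 * counted, but nothing owns the lock and nothing will ever release it. */
typedef struct {
    int lock_requests;
    int owner; /* 0 == no owner, as in the dump */
} mutex_model_t;

static bool
looks_hung(const mutex_model_t *m)
{
    /* contended (positive lock_requests) yet unowned */
    return m->lock_requests > 0 && m->owner == 0;
}

int
main(void)
{
    mutex_model_t dumped = { .lock_requests = 1, .owner = 0 }; /* the gdb state above */
    mutex_model_t freed  = { .lock_requests = -1, .owner = 0 };
    assert(looks_hung(&dumped));
    assert(!looks_hung(&freed));
    return 0;
}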
Fairly reproducible, even w/ diagnostics:
$ LD_LIBRARY_PATH=lib64/debug suite/tests/bin/api.detach_spawn
ninja: no work to do.
.....................................................................................................................<Starting application suite/tests/bin/api.detach_spawn (6526)>
.............................................................................................................................................................<Initial options = >
..............lock T6526
.....................................................................................unlock T6526
..............................<(1+x) Handling our fault in a TRY at 0x00000000712c5e42>
.........................................................................................................lock T6526
...........unlock T6526
lock T6527
lock T6543
lock T6532
lock T6531
lock T6538
lock T6560
lock T6528
lock T6529
lock T6550
lock T6530
unlock T6543
releasing T6543
unlock T6527
releasing T6527
unlock T6538
releasing T6538
unlock T6532
releasing T6532
unlock T6560
releasing T6560
unlock T6531
releasing T6531
unlock T6529
releasing T6529
unlock T6528
releasing T6528
unlock T6550
releasing T6550
unlock T6530
releasing T6530
<thread exited while attaching>
lock T6526
unlock T6526
lock T6526
unlock T6526
<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
os says: 0x00007f5dc27f6000-0x00007f5dc37f7000 prot=0x00000003
cache says: 0x00007f5dc2ff6000-0x00007f5dc2ff7000 prot=0x00000000
>
lock T7049
unlock T7049
lock T6526
unlock T6526
lock T6526
unlock T6526
<Attached to 12 threads in application /home/bruening/dr/git/build_x64_dbg_tests/suite/tests/bin/api.detach_spawn (6526)>
..lock T7050
lock T7051
lock T7052
unlock T7050
releasing T7050
unlock T7051
releasing T7051
.unlock T7052
releasing T7052
lock T7053
unlock T7053
lock T7054
unlock T7054
..lock T7050
unlock T7050
lock T7053
unlock T7053
lock T7054
lock T7052
unlock T7054
unlock T7052
lock T7055
unlock T7055
.lock T7055
unlock T7055
lock T7051
unlock T7051
.lock T7056
unlock T7056
.lock T7056
unlock T7056
<----------------- seems even up to this point --------------->
lock T7049
lock T7057
unlock T7057
lock T7049 <---- double lock!!
lock T7057
lock T7058
....lock T7059
<Detaching from application suite/tests/bin/api.detach_spawn (6526)>
lock T6526
........lock T7060
lock T7061
lock T7062
lock T7063
lock T7064
lock T7065
lock T7066
lock T7067
......lock T7068
lock T7069
.lock T7070
lock T7071
lock T7072
lock T7073
..........lock T7074
lock T7075
lock T7076
lock T7077
lock T7080
lock T7078
lock T7079
lock T7081
lock T7082
lock T7083
lock T7084
.lock T7085
lock T7086
....lock T7087
lock T7088
...................lock T7090
lock T7089
lock T7091
lock T7092
lock T7093
lock T7094
lock T7095
lock T7096
lock T7097
lock T7098
lock T7099
lock T7100
lock T7101
lock T7102
lock T7104
lock T7103
lock T7105
lock T7106
lock T7107
lock T7109
lock T7108
lock T7110
lock T7111
lock T7112
lock T7113
lock T7115
lock T7114
lock T7116
lock T7117
So we have a double lock by T7049. It is currently handling a signal, but the signal arrived while it was in the code cache, not while it was holding the lock. The signal is SIGSEGV: suspicious, and it looks like self-interp:
(gdb) x/8i ucxt->uc_mcontext.rip-20
0x4b7ed8fc: add %al,(%rax)
0x4b7ed8fe: add %al,(%rax)
0x4b7ed900: pushq $0x712df29f
0x4b7ed905: movabs %gs:0x88,%rax
0x4b7ed910: mov 0x68(,%rax,1),%eax
0x4b7ed917: mov %rcx,%gs:0x10
0x4b7ed920: pop %rcx
0x4b7ed921: jmpq 0x4b7e1500
(gdb) x/4i 0x712df29f-5
0x712df29a <is_thread_tls_initialized+62>: callq 0x712c5e52 <safe_read_tls_magic>
0x712df29f <is_thread_tls_initialized+67>: cmp $0x244f4952,%eax
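That call/cmp pair is the fault-safe TLS-magic check, which matches the earlier "Handling our fault in a TRY" line: if the thread's TLS is not set up, the read is expected to fault and be swallowed. A hedged model of the pattern (the 0x244f4952 constant is from the cmp above; everything else is an illustrative stand-in, not the real safe_read_tls_magic):

#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define TLS_MAGIC 0x244f4952u /* the value compared against %eax above */

static sigjmp_buf read_fault;

static void
segv_handler(int sig)
{
    (void)sig;
    siglongjmp(read_fault, 1); /* swallow the fault, like DR's TRY/EXCEPT */
}

/* Stand-in for safe_read_tls_magic(): returns false if the read faulted. */
static bool
safe_read_u32(const uint32_t *addr, uint32_t *value_out)
{
    if (sigsetjmp(read_fault, 1) != 0)
        return false; /* arrived here via the SIGSEGV handler */
    *value_out = *addr;
    return true;
}

/* Stand-in for is_thread_tls_initialized(): is the magic readable and correct? */
static bool
tls_initialized(const uint32_t *tls_magic_slot)
{
    uint32_t magic;
    return safe_read_u32(tls_magic_slot, &magic) && magic == TLS_MAGIC;
}

int
main(void)
{
    struct sigaction act;
    memset(&act, 0, sizeof(act));
    act.sa_handler = segv_handler;
    sigaction(SIGSEGV, &act, NULL);

    uint32_t good = TLS_MAGIC;
    printf("valid slot: %d\n", tls_initialized(&good)); /* 1 */
    printf("bad slot:   %d\n", tls_initialized(NULL));  /* 0: the read faults */
    return 0;
}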
(gdb) dps ucxt->uc_mcontext.rsp ucxt->uc_mcontext.rsp+300
0x000000004ba52428 0x00000000712df29f is_thread_tls_initialized + 67 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba52468 0x00000000712e18da get_tls_thread_id + 13 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba52488 0x00000000712e18b8 get_thread_id + 13 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba524a8 0x000000007111ebc5 deadlock_avoidance_lock + 474 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba52508 0x000000007111f705 mutex_trylock + 106 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba52518 0x00000000716112c0 thread_initexit_lock in section .cspdata of lib64/debug/libdynamorio.so
<...>
0x000000004ba52538 0x000000007111f5ed mutex_lock + 132 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004ba52568 0x00000000710958d3 dynamo_thread_init + 255 in section .text of lib64/debug/libdynamorio.so
next_tag = 0x713234f6 <syscall_ready+5> "[\303\333\343h\200\037",
last_exit = 0x71349040 <linkstub_starting>,
(gdb) dps dcontext->upcontext.upcontext.mcontext.rsp dcontext->upcontext.upcontext.mcontext.rsp+500
0x000000004f824500 0x00007f8236ffd700 No symbol matches (void *)$retaddr.
0x000000004f824508 0x00000000712e352f os_thread_sleep + 600 in section .text of lib64/debug/libdynamorio.so
<...>
0x000000004f824568 0x00000000710958a6 dynamo_thread_init + 210 in section .text of lib64/debug/libdynamorio.so
OK, so we attach to a thread, it gets our takeover signal, and it is sitting at the sleep I added. We then loop around in dynamorio_take_over_threads() and take it over again because it is not yet in all_threads! This could happen even without the sleep.
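A minimal sketch of the window as I read it (illustrative stand-ins only; the real loop is in dynamorio_take_over_threads() and the registration happens during dynamo_thread_init()):

#include <stdbool.h>
#include <stdio.h>

/* Models membership in all_threads; in the real race this only becomes true
 * once dynamo_thread_init() gets far enough to add the thread. */
static bool registered_in_all_threads = false;

static bool
is_known_thread(int tid)
{
    (void)tid;
    return registered_in_all_threads;
}

static void
send_takeover_signal(int tid)
{
    printf("takeover signal -> T%d\n", tid);
}

int
main(void)
{
    int tid = 7049;

    /* Takeover pass 1: T7049 is not in all_threads, so we signal it. */
    if (!is_known_thread(tid))
        send_takeover_signal(tid);

    /* T7049 handles the signal and enters dynamo_thread_init(), but it has
     * not registered itself yet (in my test it is parked in the sleep I
     * added), so registered_in_all_threads is still false. */

    /* Takeover pass 2: the same membership test runs, T7049 still looks
     * unknown, and we signal it a second time: the re-takeover of an
     * in-transition thread that ends in the double
     * mutex_lock(&thread_initexit_lock) above. */
    if (!is_known_thread(tid))
        send_takeover_signal(tid);

    return 0;
}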