DEADLOCK when nudge sent: thread_initexit_lock has no owner but lock_requests==4
The client.nudge_test test on 32-bit seems to hang quite a bit. I reproduced one such hang here. Others were from asserts (#4074 (closed)) and the test harness not handling app exits from asserts.
Running the client.nudge_test command:
10:26 PM ~/dr/git/build_x86_dbg_tests
% "C:/Program Files/CMake/bin/cmake.exe" -D toolbindir=D:/derek/dr/git/build_x86_dbg_tests/bin32 -D cmd="D:/derek/dr/git/build_x86_dbg_tests/bin32/run_in_bg@-out@D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-out@D:/derek/dr/git/build_x86_dbg_tests/bin32/drrun.exe@-pidfile@D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-pid@-quiet@-debug@-use_dll@D:/derek/dr/git/build_x86_dbg_tests/lib32/debug/dynamorio.dll@-exit0@-stderr_mask@0xC@-msgbox_mask@12@-dumpcore_mask@0x7d@-staged@-code_api@-c@D:/derek/dr/git/build_x86_dbg_tests/suite/tests/bin/client.nudge_test.dll.dll@--@D:/derek/dr/git/build_x86_dbg_tests/suite/tests/bin/win32.infloop.exe" -D out=D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-out -D "nudge=-client;0;10" -D clear= -D pidfile=D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-pid -P D:/derek/dr/git/src/suite/tests/runall.cmake
Running command: |D:/derek/dr/git/build_x86_dbg_tests/bin32/run_in_bg;-out;D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-out;D:/derek/dr/git/build_x86_dbg_tests/bin32/drrun.exe;-pidfile;D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-pid;-quiet;-debug;-use_dll;D:/derek/dr/git/build_x86_dbg_tests/lib32/debug/dynamorio.dll;-exit0;-stderr_mask;0xC;-msgbox_mask;12;-dumpcore_mask;0x7d;-staged;-code_api;-c;D:/derek/dr/git/build_x86_dbg_tests/suite/tests/bin/client.nudge_test.dll.dll;--;D:/derek/dr/git/build_x86_dbg_tests/suite/tests/bin/win32.infloop.exe|
Command returned 0
Found D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-pid: pid is 15524
Found D:/derek/dr/git/build_x86_dbg_tests/suite/tests/client.nudge_test-out
Sending nudge -nudge_pid;15524;0;10
I once saw infloop hanging and not accepting the nudge:
0:000> ~*kn
. 0 Id: 3ca4.46dc Suspend: 1 Teb: 00dd0000 Unfrozen
# ChildEBP RetAddr
00 1eafe588 5af63824 ntdll!NtWaitForSingleObject+0xc
01 1eafe5a0 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 1eafe84c 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 1eafe878 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 1eafe890 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 1eafe8a0 5ae12c41 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 1eafe900 5ae00b85 dynamorio!move_lazy_list_to_pending_delete+0xb1 [d:\derek\dr\git\src\core\vmareas.c @ 9183]
07 1eafe95c 5acb7c9e dynamorio!add_to_lazy_deletion_list+0x535 [d:\derek\dr\git\src\core\vmareas.c @ 9313]
08 1eafe9c0 5adee11d dynamorio!fragment_remove_shared_no_flush+0x69e [d:\derek\dr\git\src\core\fragment.c @ 3242]
09 1eafec10 5ade60dc dynamorio!end_and_emit_trace+0x287d [d:\derek\dr\git\src\core\monitor.c @ 1546]
0a 1eafef18 5ad3dab3 dynamorio!monitor_cache_enter+0x168c [d:\derek\dr\git\src\core\monitor.c @ 2069]
0b 1eafeff4 1eb2220e dynamorio!d_r_dispatch+0x383 [d:\derek\dr\git\src\core\dispatch.c @ 197]
WARNING: Frame IP not in any known module. Following frames may be wrong.
0c 00eff01c 00000000 0x1eb2220e
1 Id: 3ca4.3fb0 Suspend: 1 Teb: 00dd3000 Unfrozen
# ChildEBP RetAddr
00 0132fa88 77066359 dynamorio!interception_code_array+0x1266
01 0132fa98 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0132faf4 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0132fb04 00000000 ntdll!_RtlUserThreadStart+0x1b
2 Id: 3ca4.3af4 Suspend: 1 Teb: 00dd6000 Unfrozen
# ChildEBP RetAddr
00 0142f99c 77066359 dynamorio!interception_code_array+0x1266
01 0142f9ac 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0142fa08 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0142fa18 00000000 ntdll!_RtlUserThreadStart+0x1b
3 Id: 3ca4.37d8 Suspend: 1 Teb: 00dd9000 Unfrozen
# ChildEBP RetAddr
00 0152fbe8 77066359 dynamorio!interception_code_array+0x1266
01 0152fbf8 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0152fc54 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0152fc64 00000000 ntdll!_RtlUserThreadStart+0x1b
4 Id: 3ca4.2e68 Suspend: 1 Teb: 00ddc000 Unfrozen
WARNING: Stack pointer is outside the normal stack bounds. Stack unwinding can be inaccurate.
# ChildEBP RetAddr
00 1ea7e720 5af63824 ntdll!NtWaitForSingleObject+0xc
01 1ea7e738 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 1ea7e9e4 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 1ea7ea10 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 1ea7ea28 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 1ea7ea38 5ac9df04 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 1ea7eaa8 5af29e35 dynamorio!dynamo_thread_init+0x124 [d:\derek\dr\git\src\core\dynamo.c @ 2247]
07 1ea7ed50 5af5dae2 dynamorio!thread_attach_setup+0xe5 [d:\derek\dr\git\src\core\win32\os.c @ 2787]
08 1ea7ed64 5b0d5361 dynamorio!thread_attach_takeover_callee+0x12 [d:\derek\dr\git\src\core\win32\callback.c @ 8725]
09 1ea7ed6c 00000000 dynamorio!interception_code_array+0x1361
5 Id: 3ca4.3bbc Suspend: 1 Teb: 00ddf000 Unfrozen
# ChildEBP RetAddr
00 0325ec64 5af63824 ntdll!NtWaitForSingleObject+0xc
01 0325ec7c 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 0325ef28 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 0325ef54 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 0325ef6c 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 0325ef7c 5ac9df04 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 0325efec 5af53ad6 dynamorio!dynamo_thread_init+0x124 [d:\derek\dr\git\src\core\dynamo.c @ 2247]
07 0325f370 5af542f2 dynamorio!intercept_new_thread+0x126 [d:\derek\dr\git\src\core\win32\callback.c @ 3139]
08 0325f38c 5b0d4cac dynamorio!intercept_ldr_init+0x82 [d:\derek\dr\git\src\core\win32\callback.c @ 3390]
09 00000000 00000000 dynamorio!interception_code_array+0xcac
0:000> !runaway
User Mode Time
Thread Time
0:46dc 0 days 0:00:00.562
5:3bbc 0 days 0:00:00.000
4:2e68 0 days 0:00:00.000
3:37d8 0 days 0:00:00.000
2:3af4 0 days 0:00:00.000
1:3fb0 0 days 0:00:00.000
0:000> g
Break-in sent, waiting 4 seconds...
WARNING: Break-in timed out, suspending.
This is usually caused by another thread holding the loader lock
(3ca4.46dc): Wake debugger - code 80000007 (first chance)
eax=00000000 ebx=1ead5040 ecx=00000001 edx=00000004 esi=00000000 edi=00000000
eip=77e81d9c esp=1eafe58c ebp=1eafe5a0 iopl=0 nv up ei pl nz na po nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000202
ntdll!NtWaitForSingleObject+0xc:
77e81d9c c20c00 ret 0Ch
0:000> !runaway
User Mode Time
Thread Time
0:46dc 0 days 0:00:00.562
6:4510 0 days 0:00:00.000
5:3bbc 0 days 0:00:00.000
4:2e68 0 days 0:00:00.000
3:37d8 0 days 0:00:00.000
2:3af4 0 days 0:00:00.000
1:3fb0 0 days 0:00:00.000
0:000> ~*kn
. 0 Id: 3ca4.46dc Suspend: 1 Teb: 00dd0000 Unfrozen
# ChildEBP RetAddr
00 1eafe588 5af63824 ntdll!NtWaitForSingleObject+0xc
01 1eafe5a0 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 1eafe84c 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 1eafe878 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 1eafe890 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 1eafe8a0 5ae12c41 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 1eafe900 5ae00b85 dynamorio!move_lazy_list_to_pending_delete+0xb1 [d:\derek\dr\git\src\core\vmareas.c @ 9183]
07 1eafe95c 5acb7c9e dynamorio!add_to_lazy_deletion_list+0x535 [d:\derek\dr\git\src\core\vmareas.c @ 9313]
08 1eafe9c0 5adee11d dynamorio!fragment_remove_shared_no_flush+0x69e [d:\derek\dr\git\src\core\fragment.c @ 3242]
09 1eafec10 5ade60dc dynamorio!end_and_emit_trace+0x287d [d:\derek\dr\git\src\core\monitor.c @ 1546]
0a 1eafef18 5ad3dab3 dynamorio!monitor_cache_enter+0x168c [d:\derek\dr\git\src\core\monitor.c @ 2069]
0b 1eafeff4 1eb2220e dynamorio!d_r_dispatch+0x383 [d:\derek\dr\git\src\core\dispatch.c @ 197]
WARNING: Frame IP not in any known module. Following frames may be wrong.
0c 00eff01c 00000000 0x1eb2220e
1 Id: 3ca4.3fb0 Suspend: 1 Teb: 00dd3000 Unfrozen
# ChildEBP RetAddr
00 0132fa88 77066359 dynamorio!interception_code_array+0x1266
01 0132fa98 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0132faf4 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0132fb04 00000000 ntdll!_RtlUserThreadStart+0x1b
2 Id: 3ca4.3af4 Suspend: 1 Teb: 00dd6000 Unfrozen
# ChildEBP RetAddr
00 0142f99c 77066359 dynamorio!interception_code_array+0x1266
01 0142f9ac 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0142fa08 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0142fa18 00000000 ntdll!_RtlUserThreadStart+0x1b
3 Id: 3ca4.37d8 Suspend: 1 Teb: 00dd9000 Unfrozen
# ChildEBP RetAddr
00 0152fbe8 77066359 dynamorio!interception_code_array+0x1266
01 0152fbf8 77e77b74 KERNEL32!BaseThreadInitThunk+0x19
02 0152fc54 77e77b44 ntdll!__RtlUserThreadStart+0x2f
03 0152fc64 00000000 ntdll!_RtlUserThreadStart+0x1b
4 Id: 3ca4.2e68 Suspend: 1 Teb: 00ddc000 Unfrozen
WARNING: Stack pointer is outside the normal stack bounds. Stack unwinding can be inaccurate.
# ChildEBP RetAddr
00 1ea7e720 5af63824 ntdll!NtWaitForSingleObject+0xc
01 1ea7e738 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 1ea7e9e4 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 1ea7ea10 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 1ea7ea28 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 1ea7ea38 5ac9df04 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 1ea7eaa8 5af29e35 dynamorio!dynamo_thread_init+0x124 [d:\derek\dr\git\src\core\dynamo.c @ 2247]
07 1ea7ed50 5af5dae2 dynamorio!thread_attach_setup+0xe5 [d:\derek\dr\git\src\core\win32\os.c @ 2787]
08 1ea7ed64 5b0d5361 dynamorio!thread_attach_takeover_callee+0x12 [d:\derek\dr\git\src\core\win32\callback.c @ 8725]
09 1ea7ed6c 00000000 dynamorio!interception_code_array+0x1361
5 Id: 3ca4.3bbc Suspend: 1 Teb: 00ddf000 Unfrozen
# ChildEBP RetAddr
00 0325ec64 5af63824 ntdll!NtWaitForSingleObject+0xc
01 0325ec7c 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 0325ef28 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 0325ef54 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 0325ef6c 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 0325ef7c 5ac9df04 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 0325efec 5af53ad6 dynamorio!dynamo_thread_init+0x124 [d:\derek\dr\git\src\core\dynamo.c @ 2247]
07 0325f370 5af542f2 dynamorio!intercept_new_thread+0x126 [d:\derek\dr\git\src\core\win32\callback.c @ 3139]
08 0325f38c 5b0d4cac dynamorio!intercept_ldr_init+0x82 [d:\derek\dr\git\src\core\win32\callback.c @ 3390]
09 00000000 00000000 dynamorio!interception_code_array+0xcac
6 Id: 3ca4.4510 Suspend: 1 Teb: 00de2000 Unfrozen
# ChildEBP RetAddr
00 0335ed24 5af63824 ntdll!NtWaitForSingleObject+0xc
01 0335ed3c 5af32dac dynamorio!nt_wait_event_with_timeout+0x14 [d:\derek\dr\git\src\core\win32\ntdll.c @ 3744]
02 0335efe8 5af25d5d dynamorio!os_wait_event+0x25c [d:\derek\dr\git\src\core\win32\os.c @ 8462]
03 0335f014 5ad5d37e dynamorio!mutex_wait_contended_lock+0xad [d:\derek\dr\git\src\core\win32\os.c @ 8570]
04 0335f02c 5ad5d1de dynamorio!d_r_mutex_lock_app+0x9e [d:\derek\dr\git\src\core\utils.c @ 891]
05 0335f03c 5ac9df04 dynamorio!d_r_mutex_lock+0xe [d:\derek\dr\git\src\core\utils.c @ 904]
06 0335f0ac 5af53ad6 dynamorio!dynamo_thread_init+0x124 [d:\derek\dr\git\src\core\dynamo.c @ 2247]
07 0335f430 5af542f2 dynamorio!intercept_new_thread+0x126 [d:\derek\dr\git\src\core\win32\callback.c @ 3139]
08 0335f44c 5b0d4cac dynamorio!intercept_ldr_init+0x82 [d:\derek\dr\git\src\core\win32\callback.c @ 3390]
09 00000000 00000000 dynamorio!interception_code_array+0xcac
0:000> dt thread_initexit_lock
dynamorio!thread_initexit_lock
+0x000 lock_requests : 0n4
+0x004 contended_event : 0x00000128 Void
+0x008 name : 0x5afb84b8 "thread_initexit_lock(mutex)@D:\derek\dr\git\src\core\dynamo.c:276"
+0x00c rank : 4
+0x010 owner : 0
+0x014 owning_dcontext : (null)
+0x018 prev_owned_lock : (null)
+0x01c count_times_acquired : 2
+0x020 count_times_contended : 0xa
+0x024 count_times_spin_pause : 0
+0x028 max_contended_requests : 0
+0x02c count_times_spin_only : 3
+0x030 prev_process_lock : 0x5b0e7158 _mutex_t
+0x034 next_process_lock : 0x5b0e70b8 _mutex_t
+0x038 callstack : [4] (null)
+0x048 app_lock : 0 ''
+0x049 deleted : 0 ''
So no owner but lock_requests==4.
The interception_code_array+0x1266 is the start of the sequence for thread_attach_takeover_callee. There is a loop later for initstack, but this is the start. Why aren't these threads being scheduled? They must be the 3 we see at the end ("<Failed to suspend attached-but-never-scheduled thread 12244>"): special system threads.