Multi-threading failure (HANG) on AArch64. ASSERT in utils.c: !lock->owner
We're seeing intermittent hangs with an AArch64 guest binary compiled with OpenMP. All indications are that it is a multi-threading bug in DR. The hang happens with a DR release build. With the DEBUG build, the following assert fires and exits, so the run never gets as far as hanging:
<Application /path/to/test_case.exe (44407). Internal Error: DynamoRIO debug check failure: /path/to/dynamorio/core/utils.c:576 !lock->owner
Which happens in:
static void
deadlock_avoidance_lock(mutex_t *lock, bool acquired, bool ownable)
{
    if (acquired) {
        . . .
        if (ownable) {
            ASSERT(!lock->owner);
            lock->owner = d_r_get_thread_id();
            lock->owning_dcontext = get_thread_private_dcontext();
        }
        . . .
The guest binary is built with armclang and linked against the Arm Performance Libraries on RHEL 7.5:
armclang -fopenmp -armpl=lp64,parallel test_case.c -o test_case.exe
It fails without clients:
drrun ./test_case.exe
These also appear during the -debug run:
<get_memory_info mismatch! (can happen if os combines entries in /proc/pid/maps)
os says: 0x0000fffde4000000-0x0000fffe0c000000 prot=0x00000000
cache says: 0x0000fffde4000000-0x0000fffe08000000 prot=0x00000000
<ran out of stolen fd space>
It takes between 3 and 60 runs for the assert to fire, and it only seems to fail on ThunderX2 machines.
Running with -loglevel 3 gives the following thread statistics:
(Begin) Thread statistics @6735 global, 0 thread fragments (0:05.953):
BB fragments targeted by IBL (thread): 3
Fcache exits, total (thread): 4
Fcache exits, from BBs (thread): 4
Fcache exits, total indirect branches (thread): 3
Fcache exits, non-trace indirect branches (thread): 3
Fcache exits, ind target in cache but not table (thread): 3
Fcache exits, from BB, ind target ... (thread): 3
Fcache exits, BB->BB, ind target ... (thread): 3
Fcache exits, dir target not in cache (thread): 1
Special heap units (thread): 1
Peak special heap units (thread): 1
Current special heap capacity (bytes) (thread): 65536
Peak special heap capacity (bytes) (thread): 65536
Heap headers (bytes) (thread): 56
Heap align space (bytes) (thread): 12
Peak heap align space (bytes) (thread): 12
Heap bucket pad space (bytes) (thread): 1136
Peak heap bucket pad space (bytes) (thread): 1136
Heap allocs in buckets (thread): 15
Heap allocs variable-sized (thread): 7
Total reserved memory (thread): 393216
Peak total reserved memory (thread): 393216
Guard pages, reserved virtual pages (thread): 4
Peak guard pages, reserved virtual pages (thread): 4
Current stack capacity (bytes) (thread): 65536
Peak stack capacity (bytes) (thread): 65536
Heap claimed (bytes) (thread): 17192
Peak heap claimed (bytes) (thread): 17192
Current heap capacity (bytes) (thread): 65536
Peak heap capacity (bytes) (thread): 65536
Current total memory from OS (bytes) (thread): 393216
Peak total memory from OS (bytes) (thread): 393216
Current vmm blocks for stack (thread): 3
Peak vmm blocks for stack (thread): 3
Current vmm blocks for special heap (thread): 3
Peak vmm blocks for special heap (thread): 3
Our virtual memory blocks in use (thread): 6
Peak our virtual memory blocks in use (thread): 6
Allocations using multiple vmm blocks (thread): 2
Blocks used for multi-block allocs (thread): 6
Current vmm virtual memory in use (bytes) (thread): 393216
Peak vmm virtual memory in use (bytes) (thread): 393216
Number of safe reads (thread): 17
(End) Thread statistics
Does anything look unusual?
There's lots of other thread-related tracing in the logs, but I don't know what to look for.
Clearly ownable and lock->owner are contradicting each other: the lock is marked ownable, yet it already has an owner when it is acquired.
Where and when could that be happening?
Thanks