CRASH instrument_post_syscall() race w/ another thread
From derek.br...@gmail.com on March 11, 2010 21:59:29
one thread terminates the process, but in release build we don't synch w/ the other threads. for CLIENT_INTERFACE we go and call instrument_thread_exit() for the other threads, though, and can hit crashes like this one, where dcontext->client_data is set to NULL for another thread that then de-refs it. even for non-CI there can be races that result in crashes if other threads refer to global data structs that get cleaned up: though we don't do that much cleanup in release build.
There are two problems to be solved:
A) A thread can execute beyond its own exit event, causing correctness problems in clients
B) A thread can execute beyond its data structures or global data structures being freed, causing crashes
I took a multi-pronged approach to solving both:
-
if no thread exit event exists then A is not possible. avoid B by skipping instrument_thread_exit(). if other threads alive at process exit event, client should do synch when combining data. then we don't care what happens after exit event since client persistent data is all written out so who cares what other threads do between then and actual process death.
actually we do fragment_exit(), etc. before client exit event, so other threads can crash: the change for
#2
about not reporting such crashes will help. we could do the send-out-signals even if no thread exit event: not going to bother though, clients can ask for full synch if nec. -
the default option when a thread exit event exists is best-effort for both A and B but no guarantees for either:
- for A: attempt to prevent other threads from executing beyond their own exit events: on linux, send custom terminate signal (but don't wait for it), on Windows call NtTerminateProcess(0) we just live w/ risk of thread holding a lock since very few locks are used beyond this point.
- for B, the solution to A helps. I also made two further changes:
- don't free DR data in instrument_thread_exit() to reduce incidence of crashes
- do not report SIGSEGV/exception when dynamo_exited=true and multiple threads exist. this could potentially mask other bugs: we'll live with that. A change as part of implementing that last one on Windows:
- with -private_loader, ntdll hooks are not removed on release build exit so we will squelch exit-time crashes
-
-synch_at_exit option to guarantee a solution to both A and B. APi routine dr_request_synchronized_exit() also sets this. I documented this in the API docs. if client needs guarantee that thread exit event will never be followed by execution by that thread then it should set this option.
or should default be -synch_at_exit for simpler client usage model, and -opt_speed should turn it off?
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=271