Adds load-acquire semantics to a number of bare loads (i.e., not inside a mutex) in DR's various lock routines.
This includes a key missing load-acquire in the spinlock loop in d_r_mutex_lock_app(), which might help explain some performance issues such as #4279.
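
For illustration only, the general pattern of spinning on a load-acquire can be sketched in portable C11 as below. This is not DR's code: the sketch_* names and the 0/1 lock encoding are hypothetical, and DR uses its own atomic macros rather than <stdatomic.h>. The inner loop is the relevant part: a plain, non-atomic read there is one the compiler may be free to hoist out of the loop, and it carries no ordering guarantee on weakly ordered hardware.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration; 0 means free, 1 means held. */
typedef struct {
    atomic_int state;
} sketch_spinlock_t;

/* Try once to take the lock; acquire ordering on success. */
static bool
sketch_trylock(sketch_spinlock_t *lock)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &lock->state, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

static void
sketch_lock(sketch_spinlock_t *lock)
{
    while (!sketch_trylock(lock)) {
        /* Spin on a load-acquire instead of a bare load, so that each
         * iteration performs a fresh, ordered read of the lock word.
         */
        while (atomic_load_explicit(&lock->state, memory_order_acquire) != 0) {
            /* Busy-wait; real code would typically pause or yield here. */
        }
    }
}

/* Release the lock; pairs with the acquire operations above. */
static void
sketch_unlock(sketch_spinlock_t *lock)
{
    atomic_store_explicit(&lock->state, 0, memory_order_release);
}
```

Spinning on the load and retrying the compare-exchange only once the lock looks free is a common design choice, since it avoids repeated failed read-modify-write attempts on a contended cache line.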
Some of these loads are in the utils.h header, and there are ordering dependencies between utils.h and arch_exports.h, so this required splitting the atomic defines out of arch/arch_exports.h into a new file, arch/atomic_exports.h. Because those atomic defines use ASSERT, which references dynamo_options, this in turn required splitting the option struct out of options.h into options_struct.h.
Issue: #2502, #4928 (closed), #4279