Uses pthread_key_create() to allocate enough contiguous and aligned TLS slots to fit our os_local_state_t struct. This makes it easier to share the Linux code with Mac64.
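For illustration only, here is a minimal sketch of how a run of contiguous, aligned slots might be reserved through pthread_key_create(). It is not the code in this change: reserve_contiguous_keys() and MAX_PROBE are hypothetical names, and the sketch assumes that pthread_key_t is an integer type and that the platform hands out keys as consecutive slot indices, which POSIX does not guarantee.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical upper bound on how many keys we are willing to probe. */
#define MAX_PROBE 128

/* Hypothetical helper: reserves `count` consecutive keys whose first index
 * is a multiple of `align_slots`.  Assumes pthread_key_t is an integer and
 * that keys are handed out as consecutive slot indices (an implementation
 * detail, not a POSIX guarantee).
 * Returns 0 and stores the first key in *first on success, -1 on failure.
 */
static int
reserve_contiguous_keys(size_t count, size_t align_slots, pthread_key_t *first)
{
    pthread_key_t keys[MAX_PROBE];
    size_t n = 0, start = 0, i;
    int found = 0;

    assert(first != NULL);
    if (count == 0 || align_slots == 0 || count > MAX_PROBE)
        return -1;
    while (n < MAX_PROBE) {
        if (pthread_key_create(&keys[n], NULL) != 0)
            break;
        n++;
        /* Restart the run if the new key does not follow the previous one. */
        if (n - start > 1 && keys[n - 1] != keys[n - 2] + 1)
            start = n - 1;
        /* Advance until the run begins on an aligned slot. */
        while (start < n && (uintptr_t)keys[start] % align_slots != 0)
            start++;
        if (n - start == count) {
            found = 1;
            break;
        }
    }
    /* Release every probed key that is not part of the chosen run. */
    for (i = 0; i < n; i++) {
        if (!found || i < start || i >= start + count)
            pthread_key_delete(keys[i]);
    }
    if (!found)
        return -1;
    *first = keys[start];
    return 0;
}
```

A caller would size `count` from the struct, e.g. sizeof(os_local_state_t) rounded up to a whole number of pointer-sized slots, and treat failure as a signal to fall back to another TLS scheme.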
Keeps the scheme from ce8e8035 of storing a pointer to the base of os_local_state_t in TLS slot 6. With the entire os_local_state_t struct now in TLS, this indirection is not strictly needed, but it is not clear that we can take that many TLS slots in large applications, so I'm leaving this mixture in place until we're sure which direction to go in.
Disables the options -mangle_app_seg and -safe_read_tls_init for Mac64.