api.rseq flakiness on glibc 2.35 A64 due to excessively long run times

The api.rseq test has widely varying run times on A64. On the Jenkins machine (which currently has glibc 2.31), ten consecutive runs took anywhere from 0.30 to 23.19 seconds:

    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    3.06 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    2.56 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    6.20 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed   23.19 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed   20.61 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed   19.14 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    0.30 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    8.32 sec
    Start 307: code_api|api.rseq
    Test #307: code_api|api.rseq ................   Passed    4.98 sec
    Start 307: code_api|api.rseq
1/1 Test #307: code_api|api.rseq ................   Passed    9.42 sec

This test traces rseq execution using drmemtrace. The trace outputs from the same ten runs are also very large and vary widely in size, from 282M to 22G:

    3.1G    suite/tests/drmemtrace.api.rseq.3835685.2325.dir
    2.6G    suite/tests/drmemtrace.api.rseq.3835688.0577.dir
    5.7G    suite/tests/drmemtrace.api.rseq.3835691.6161.dir
    22G     suite/tests/drmemtrace.api.rseq.3835694.4192.dir
    20G     suite/tests/drmemtrace.api.rseq.3835697.1669.dir
    20G     suite/tests/drmemtrace.api.rseq.3835700.9167.dir
    282M    suite/tests/drmemtrace.api.rseq.3835703.5354.dir
    8.0G    suite/tests/drmemtrace.api.rseq.3835706.8330.dir
    5.1G    suite/tests/drmemtrace.api.rseq.3835709.4015.dir
    9.7G    suite/tests/drmemtrace.api.rseq.3835712.2611.dir

This test launches a parallel thread which runs rseq_thread_loop (looping over a region repeatedly until the main thread signals it to exit), while the main thread continues with the other unit tests: https://github.com/DynamoRIO/dynamorio/blob/056eca719cd4710c044890cc4a4695b6b1326e5d/suite/tests/linux/rseq.c#L988

When this thread runs under DR, it seems to take too long to come out of the loop, possibly because of scheduling, so a lot of trace data is written before it actually exits.
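For illustration, the loop-until-signaled pattern described above can be sketched in C roughly as follows. This is only a minimal sketch and not the code in rseq.c: the names `worker_loop`, `run_worker_until_signaled`, `exit_requested`, and `iterations` are hypothetical, a counter increment stands in for the repeatedly executed region, and a sleep stands in for the main thread's other tests. The returned iteration count plays the role of trace volume here, since every extra pass through the loop before the worker exits would be more traced execution.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Hypothetical names throughout; this is not the code from rseq.c. */
static atomic_bool exit_requested;
static atomic_uint_fast64_t iterations;

/* Worker: repeat a small unit of work until the main thread asks it to stop. */
static void *
worker_loop(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&exit_requested, memory_order_acquire)) {
        /* Stand-in for the region that is executed over and over. */
        atomic_fetch_add_explicit(&iterations, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Runs the worker while the main thread is busy for about main_work_ms
 * milliseconds, then signals it to exit and waits for it.
 * Returns the number of loop iterations the worker completed, or 0 if the
 * thread could not be created.
 */
static uint64_t
run_worker_until_signaled(unsigned main_work_ms)
{
    pthread_t thread;
    struct timespec delay;

    delay.tv_sec = main_work_ms / 1000;
    delay.tv_nsec = (long)(main_work_ms % 1000) * 1000000L;

    atomic_store(&exit_requested, false);
    atomic_store(&iterations, 0);
    if (pthread_create(&thread, NULL, worker_loop, NULL) != 0)
        return 0;
    /* Stand-in for the main thread continuing with other unit tests. */
    nanosleep(&delay, NULL);
    atomic_store_explicit(&exit_requested, true, memory_order_release);
    pthread_join(thread, NULL);
    return (uint64_t)atomic_load(&iterations);
}
```

Calling `run_worker_until_signaled(50)` a few times gives a count that depends on how the two threads happen to be scheduled. In this sketch nothing bounds the amount of work the worker does, which is the property that would let the trace size swing so widely if scheduling under DR is indeed the cause.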

I see the same variability with the #5711 changes.

It still happens on an A64 machine with a newer glibc (I disabled rseq support for this experiment). It looks a little worse there (possibly because I'm using a VM), sometimes even hitting the 90s timeout.

This doesn't happen on x86, where the test always takes <0.4s and writes <200KiB to disk.