api.rseq flakiness on glibc 2.35 A64 due to too long run times
The api.rseq test has widely varying run times on A64. On the Jenkins machine (which currently has libc 2.31), repeated runs gave:
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 3.06 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 2.56 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 6.20 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 23.19 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 20.61 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 19.14 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 0.30 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 8.32 sec
Start 307: code_api|api.rseq
Test #307: code_api|api.rseq ................ Passed 4.98 sec
Start 307: code_api|api.rseq
1/1 Test #307: code_api|api.rseq ................ Passed 9.42 sec
This test traces rseq execution using drmemtrace. The output traces are also very large and vary widely in size:
3.1G suite/tests/drmemtrace.api.rseq.3835685.2325.dir
2.6G suite/tests/drmemtrace.api.rseq.3835688.0577.dir
5.7G suite/tests/drmemtrace.api.rseq.3835691.6161.dir
22G suite/tests/drmemtrace.api.rseq.3835694.4192.dir
20G suite/tests/drmemtrace.api.rseq.3835697.1669.dir
20G suite/tests/drmemtrace.api.rseq.3835700.9167.dir
282M suite/tests/drmemtrace.api.rseq.3835703.5354.dir
8.0G suite/tests/drmemtrace.api.rseq.3835706.8330.dir
5.1G suite/tests/drmemtrace.api.rseq.3835709.4015.dir
9.7G suite/tests/drmemtrace.api.rseq.3835712.2611.dir
This test launches a parallel thread which performs rseq_thread_loop (looping over a region over and over until the main thread signals it to exit), while the main thread continues other unit tests: https://github.com/DynamoRIO/dynamorio/blob/056eca719cd4710c044890cc4a4695b6b1326e5d/suite/tests/linux/rseq.c#L988
When this thread runs under DR, it seems to take too long to come out of the loop, possibly because of scheduling, so a lot of trace data gets written before it actually exits.
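To make the suspected mechanism concrete, here is a minimal sketch of the pattern described above. It is not the code in rseq.c: every name in it (`exit_requested`, `passes`, `loop_thread`, `run_once`) is hypothetical, the start-up handshake is my own addition to keep the sketch deterministic, and a plain counter stands in for the rseq region. The point is only that the number of loop passes, and therefore the amount of trace data when traced, is bounded by how long the looping thread keeps running before it observes the exit flag, not by the loop itself.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-ins; none of these names come from rseq.c. */
static atomic_int exit_requested;
static atomic_uint_fast64_t passes;

/* Looping thread: repeats a small region until the main thread asks it to exit. */
static void *
loop_thread(void *arg)
{
    (void)arg;
    while (!atomic_load_explicit(&exit_requested, memory_order_acquire)) {
        /* Under a tracer, every pass through here would emit trace records. */
        atomic_fetch_add_explicit(&passes, 1, memory_order_relaxed);
    }
    return NULL;
}

/* Starts the looping thread, does main_work units of unrelated work on the
 * calling thread, then signals exit and joins.
 * Returns the number of loop passes completed (0 if thread creation failed). */
static uint64_t
run_once(uint64_t main_work)
{
    pthread_t thread;
    volatile uint64_t sink = 0;

    atomic_store(&exit_requested, 0);
    atomic_store(&passes, 0);
    if (pthread_create(&thread, NULL, loop_thread, NULL) != 0)
        return 0;

    /* Assumed handshake: wait until the loop has completed at least one pass. */
    while (atomic_load(&passes) == 0)
        sched_yield();

    /* Stand-in for the other unit tests the main thread runs meanwhile. */
    for (uint64_t i = 0; i < main_work; i++)
        sink++;

    atomic_store_explicit(&exit_requested, 1, memory_order_release);
    pthread_join(thread, NULL);
    return (uint64_t)atomic_load(&passes);
}
```

Calling `run_once()` several times with the same `main_work` and comparing the returned counts gives a rough feel for how much the loop count depends on scheduling alone. Whether that reproduces the A64 spread outside DR is something I have not checked.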
I see the same variability with the #5711 changes.
It still happens on an A64 machine with a newer glibc (I disabled rseq support for this experiment). It looks a little worse there (possibly because I'm using a VM), sometimes even hitting the 90s timeout.
This doesn't happen on x86, where the test always takes <0.4s and uses <200KiB of disk.