APP CRASH (c6474fda mpirun) drcachesim failed to write trace in offline mode
Created by: varun729
Describe the bug
drcachesim fails to write the trace in offline mode. When drcachesim is run without "-offline", the application runs fine and dumps the cache simulation results.
I have a patch (see Additional Context) that solves this problem. I don't know if it is the correct solution, so I have included it.
To Reproduce
Steps to reproduce the behavior:
- Minimal source code: See the test_mpi.c application on https://usc-rc.github.io/tutorials/open-mpi
- Compiled MPICH v3.0.4 from source (./configure; make; make install)
- Compile and run
```
mpicc test_mpi.c -o test_mpi
dynamorio/exports/bin64/drrun -t drcachesim -- mpirun -np 1 ./test_mpi   # runs fine without -offline
dynamorio/exports/bin64/drrun -t drcachesim -offline -- mpirun -np 1 ./test_mpi   # failed to write trace in offline mode
```
Please also answer these questions:
- What happens when you run without any client?
- runs fine
- What happens when you run with a debug build?
- throws the error "Failed to create a subdir in ."
Expected behavior
The application completes and generates traces in the application directory, under drmemtrace.test_mpi.id.num.dir (id is the pid or tid, and num is a number < 10000).
Screenshots or Pasted Text
Output with the failure (full path obfuscated):
```
WARNING: cannot find .../dynamorio/exports/bin64/../lib64/debug/libdynamorio.so: is this an incomplete installation?
WARNING: .../dynamorio/exports/bin64/.. does not appear to be a valid DynamoRIO root
Failed to create a subdir in .

===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= EXIT CODE: 255
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
```
Versions
- What version of DynamoRIO are you using?
- latest c6474fda (need support for AVX512 so cannot use a released version)
- Does the latest build from https://github.com/DynamoRIO/dynamorio/releases solve the problem?
- no
- What operating system version are you running on? ("Windows 10" is not sufficient: give the release number.)
- Linux 5.4.0-72-generic (Ubuntu 18.04.5 LTS)
- Is your application 32-bit or 64-bit?
- 64-bit
Additional context
I did some debugging on my own. The following debugging patch shows that the application name contains the full path instead of only the basename of the executable.
Debugging patch:
```
diff --git a/clients/drcachesim/tracer/tracer.cpp b/clients/drcachesim/tracer/tracer.cpp
index c7d02bee..39f1b4fd 100644
--- a/clients/drcachesim/tracer/tracer.cpp
+++ b/clients/drcachesim/tracer/tracer.cpp
@@ -1873,8 +1873,10 @@ init_offline_dir(void)
         if (file_ops_func.create_dir(buf))
             break;
     }
-    if (i == NUM_OF_TRIES)
+    if (i == NUM_OF_TRIES) {
+        FATAL("Failed here 1 : %s : %d : %s : %s : %s\n", op_outdir.get_value().c_str(), dr_get_process_id(), subdir_prefix, dr_get_application_name(), buf);
         return false;
+    }
     /* We group the raw thread files in a further subdir to isolate from the
      * processed trace file.
      */
```
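For illustration only, here is a minimal standalone C sketch of why a full path in the application name would presumably break the directory creation seen with the debugging patch above. It is not DynamoRIO code: `short_name` and `format_trace_dir` are hypothetical helpers, and the name format is an assumption modeled on the directory name in the expected behavior. With a full path, the generated name contains extra `/` separators, so a single-level directory create would be asked to make a directory inside parents that do not exist.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not a DynamoRIO function): return the part of
 * `path` after the last '/', or `path` itself if it has no '/'. */
static const char *
short_name(const char *path)
{
    assert(path != NULL);
    const char *slash = strrchr(path, '/');
    return slash == NULL ? path : slash + 1;
}

/* Hypothetical helper: build "<outdir>/<prefix>.<app>.<pid>.<num>.dir".
 * The exact format is an assumption made for this sketch.
 * Returns 0 on success, -1 if the result did not fit in `buf`. */
static int
format_trace_dir(char *buf, size_t len, const char *outdir, const char *prefix,
                 const char *app, int pid, int num)
{
    int n = snprintf(buf, len, "%s/%s.%s.%d.%04d.dir", outdir, prefix, app, pid, num);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

Passing the result of `short_name("/some/path/mpirun")` as `app` yields a single path component under `outdir`, whereas passing the full path embeds the whole `/some/path/` hierarchy in the middle of the name.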
Patch that resolves the problem:
```
diff --git a/core/unix/os.c b/core/unix/os.c
index ec89c366..db9babb5 100644
--- a/core/unix/os.c
+++ b/core/unix/os.c
@@ -1094,7 +1094,7 @@ get_application_name(void)
 DYNAMORIO_EXPORT const char *
 get_application_short_name(void)
 {
-    return get_application_name_helper(false, false /* short name */);
+    return get_application_name_helper(true, false /* short name */);
 }
 
 /* Sets pointers to the application's command-line arguments. These pointers are then used
```
Output with patch:
```
WARNING: cannot find .../dynamorio/exports/bin64/../lib64/debug/libdynamorio.so: is this an incomplete installation?
WARNING: .../dynamorio/exports/bin64/.. does not appear to be a valid DynamoRIO root
Hello world from processor machinename, rank 0 out of 1 processors
```