CRASH (TOT suite/tests/common/decode) linux 32-bit test_modrm16
From derek.br...@gmail.com on April 02, 2009 11:26:52
this was PR 305335: linux CRASH suite/tests/common/decode 32-bit test_modrm16
we never had time to track it down before but now we think we know what's going on
pasting emails here that explain the bug and repercussions/solutions:
From: Derek Bruening
OK, let me know if I have this right: we end up executing from some data region used by libc's sscanf, which we thus make read-only for code cache consistency. But since we ourselves use libc's sscanf we end up tripping our own write watchpoint and can't make forward progress.
Wow, that's a cool bug. Nice job tracking it down.
Some thoughts:
- This is another reason to add to the list for not using user libraries: an interesting twist on the existing transparency dogma.
- All code DR uses should really be grouped and protected in the same manner. E.g., if we had our own copy of sscanf inside our own library, we would disallow the app from executing from it (we'd pretend it hit a fault) and would not have this bug. The problem is that we're running code from libc but we're not treating it as part of DR.
- sscanf in particular is problematic for portability: xref issue #36 and "__isoc99_sscanf@@GLIBC_2.7". If we can easily stop using sscanf it would solve multiple problems.
- xref issue #46 (closed) on eventually not using any of libc. Most of the heavyweight routines we use are only during init or exit. I was assuming that the string routines, which are used in fragile locations, are very clean and don't go writing to global data. That seems to not be the case for sscanf.
- xref PR 207635/3157 (not filed on Google yet): linux SIGSEGV should consider libc.so part of DR when assigning blame: related issue
- %gs is used for pthreads TLS. It might be interesting to fully understand what this _IO_vfscanf code (if it's really in that routine at +1700) is doing and what that memory region is, but it might not be worth the time.
- I never figured out how to get gdb to display the base address of segments (I mention this in Debugging.wiki). This seems like a major missing feature so I hope there is a way and I've somehow overlooked it.
- It does seem like we should perhaps have a separate test that does this pathological transfer to a data section that we use, and make the decode test a little better behaved to focus on decoding: which is what you were suggesting earlier.
Note that one way to avoid using sscanf at runtime is to rely on the all_memory_areas list, which was supposed to replace the /proc/self/maps reading but has had some bugs/issues in the past where it gets out of sync with the maps file. Xref issue #91 (closed) (need to watch SYS_brk), PR 213256 (kernel merging regions => mismatch), and PR 246897 (where we switched back to using maps file on queries).
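Separately from all_memory_areas, reading a maps line without libc could look like the sketch below: a hand-rolled hex parser that touches no global or TLS state. This is only an illustration, not DR code. maps_entry_t, parse_hex, and parse_maps_line are hypothetical names, and only the address range and permission fields are handled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record for one /proc/self/maps line; not a DR type. */
typedef struct {
    uintptr_t start;
    uintptr_t end; /* exclusive */
    bool read, write, exec;
} maps_entry_t;

/* Parses a hex number at *s and advances *s past it.
 * Returns false if no hex digit was found. Uses only locals. */
static bool parse_hex(const char **s, uintptr_t *out)
{
    const char *p = *s;
    uintptr_t val = 0;
    bool any = false;
    for (;; p++) {
        unsigned digit;
        if (*p >= '0' && *p <= '9')
            digit = (unsigned)(*p - '0');
        else if (*p >= 'a' && *p <= 'f')
            digit = (unsigned)(*p - 'a') + 10;
        else if (*p >= 'A' && *p <= 'F')
            digit = (unsigned)(*p - 'A') + 10;
        else
            break;
        val = (val << 4) | digit;
        any = true;
    }
    *s = p;
    *out = val;
    return any;
}

/* Parses the leading "start-end perms" fields of a maps line. */
static bool parse_maps_line(const char *line, maps_entry_t *e)
{
    const char *p = line;
    if (!parse_hex(&p, &e->start) || *p++ != '-')
        return false;
    if (!parse_hex(&p, &e->end) || *p++ != ' ')
        return false;
    /* The permission field needs at least the r, w, and x positions. */
    if (p[0] == '\0' || p[1] == '\0' || p[2] == '\0')
        return false;
    e->read = (p[0] == 'r');
    e->write = (p[1] == 'w');
    e->exec = (p[2] == 'x');
    return true;
}
```

Fed the line "b7c35000-b7c36000 rw-p b7c35000 00:00 0" from the report below, this yields start 0xb7c35000 and end 0xb7c36000, a range that contains the target pc 0xb7c356b0.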
- Derek
On Thu, Apr 02, 2009 at 12:38:20AM -0400, Qin Zhao wrote:
In the test case code, there is this instruction sequence:
    addr16 mov %gs:(%di),%esp
    ret
%gs: 0x33 (51) %edi: 0x8040008
After the mov instruction, the value of esp is updated to 0xb7c356b0, and the target pc stored at (esp) is also 0xb7c356b0. The area b7c35000-b7c36000 is always the region right before libc:
b7c35000-b7c36000 rw-p b7c35000 00:00 0
b7c36000-b7d8e000 r-xp 00000000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so
b7d8e000-b7d90000 r--p 00158000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so
b7d90000-b7d91000 rw-p 0015a000 08:01 128430 /lib/tls/i686/cmov/libc-2.8.90.so
Then, when the ret executes (actually it is a pop ecx followed by a lookup of the target code), a code fragment starting at 0xb7c356b0 is constructed. In check_thread_vm_area, the region b7c35000-b7c36000 is added to the executable areas and its write permission is removed:
b7c35000-b7c36000 r-xp b7c35000 00:00 0
Later, in get_memory_info -> get_memory_info_from_os -> maps_iterator_next -> sscanf -> ..., there is this code:
0xb7c85844 <_IO_vfscanf+1700>: mov %gs:(%esi),%ecx
0xb7c85847 <_IO_vfscanf+1703>: movl $0x0,%gs:(%esi)
%gs: 0x33 (51) %esi: 0xffffffdc (-36)
The write instruction causes the SIGSEGV again. Later, in the signal handler, every attempt to use maps_iterator_next -> sscanf causes another SIGSEGV, so DR hangs.
I suspect %gs:(%esi) points into the region b7c35000-b7c36000, so the read is OK but the write causes a SIGSEGV. The evidence: before the permission change, maps_iterator_next never causes a SIGSEGV, and it causes one right after the permissions change.
Original issue: http://code.google.com/p/dynamorio/issues/detail?id=105