On Fri, 24 May 2013, Bill Williams wrote:
<snipping>
Yes, if Dyninst were to provide argument access, we would need to
support that level of semantic information; I definitely agree that
this would be a big task. That added complexity is a large part of
the reason argument access is a hypothetical future feature and not
intended to be part of this initial interface and implementation.
A couple of things. First, this functionality probably belongs in a
"value added" library that sits on top of Dyninst or its toolkits. It
doesn't need to go inside.
Second, the semantics of the argument types should be defined by the
syscall number. It's a bit of work, but mostly one-time work to
define a table of the number and types of arguments for each syscall, and
there would be a table version for each platform version of the
library. An interesting question is whether there is a DWARF-ful
version of libc so that we could build this table automatically? (Or
could we just trigger our own build of libc?)
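A minimal sketch of such a per-platform table, in Python, assuming a simple layout keyed by syscall number. `SyscallSig`, `TABLES`, and `lookup` are hypothetical names. The three Linux/x86_64 entries (syscall numbers 0, 1 and 2 for read, write and open) are typed in by hand for illustration, not generated from any source.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SyscallSig:
    """One table entry: the syscall's name and the C types of its arguments."""
    name: str
    arg_types: Tuple[str, ...]

    @property
    def nargs(self) -> int:
        return len(self.arg_types)


# Hypothetical table for one platform. Each platform version would get its
# own table keyed by syscall number; these entries are hand-written examples.
LINUX_X86_64: Dict[int, SyscallSig] = {
    0: SyscallSig("read", ("unsigned int", "char *", "size_t")),
    1: SyscallSig("write", ("unsigned int", "const char *", "size_t")),
    2: SyscallSig("open", ("const char *", "int", "umode_t")),
}

# Hypothetical registry of per-platform tables.
TABLES: Dict[str, Dict[int, SyscallSig]] = {"linux-x86_64": LINUX_X86_64}


def lookup(platform: str, number: int) -> SyscallSig:
    """Return the signature recorded for a syscall number on a platform."""
    return TABLES[platform][number]
```

Keying on the number rather than the name matches the point above: at a syscall site, the number is what is actually available.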
Or perhaps we can auto-generate it from the kernel source? That would
allow us (for Linux) to auto-pull kernels from kernel.org and generate
the configuration files.
With respect to argument access, I would prefer, if possible, to find a way
to pull information that Symtab can already parse in order to build the
table(s) of arguments. Autogenerating from kernel source is a perfectly valid
backup plan...though at that point it might be better to build a kernel with
DWARF.
My thinking at the implementation level is that *if* we can turn Symtab loose
on this problem somehow, then it becomes a (comparatively) tractable problem
of tracking the differences between the syscall ABI and the regular callsite
ABI and storing the parameter types. That would allow us to treat syscalls as
"just like regular calls, but with a possibly different ABI and no ability to
instrument inside."
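To make the "possibly different ABI" point concrete, here is a small sketch for Linux on x86-64. For integer and pointer arguments the difference comes down to one register. The `syscall` instruction takes the syscall number in rax and the fourth argument in r10, because the instruction itself overwrites rcx. A System V AMD64 function call passes the fourth argument in rcx. `CALL_ABI`, `SYSCALL_ABI`, `arg_register`, and `abi_differences` are hypothetical names for illustration, not a Dyninst or Symtab interface.

```python
from typing import List, Tuple

# Integer/pointer argument registers on Linux x86-64.
CALL_ABI = ("rdi", "rsi", "rdx", "rcx", "r8", "r9")      # System V AMD64 function call
SYSCALL_ABI = ("rdi", "rsi", "rdx", "r10", "r8", "r9")   # `syscall` insn; number goes in rax


def arg_register(index: int, is_syscall: bool) -> str:
    """Register holding the index-th (0-based) integer argument."""
    regs = SYSCALL_ABI if is_syscall else CALL_ABI
    if not 0 <= index < len(regs):
        # Syscalls take at most six arguments; later call arguments go on the stack.
        raise IndexError("only the first six integer arguments are passed in registers")
    return regs[index]


def abi_differences() -> List[Tuple[int, str, str]]:
    """(index, call register, syscall register) for every position that differs."""
    return [(i, c, s) for i, (c, s) in enumerate(zip(CALL_ABI, SYSCALL_ABI)) if c != s]
```

Here `abi_differences()` returns `[(3, "rcx", "r10")]`. Everything else about reading a syscall's arguments could then reuse the regular-call machinery.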
I don't think it would be too hard to extract this info from kernel
source. There's already a central header file that declares every system
call along with its arguments:
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git/tree/include/linux/syscalls.h
One could run some text processing to turn the syscalls.h function
declarations into table entries. Then turn the argument types into a
Dyninst-compatible form, which would vary in difficulty depending on how
detailed you want the type info. It wouldn't be hard to turn them into
ints, longs, pointers and strings; it would be more difficult if you wanted to
represent the contents of structs (which might be easier to extract from
DWARF, as Bill suggested).
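Here is a rough sketch of that text-processing step. It assumes declarations in syscalls.h have the shape `asmlinkage long sys_NAME(...);`, possibly wrapped across lines, and maps each argument onto the coarse int/long/pointer/string categories. `DECL_RE`, `classify`, and `parse_syscalls_h` are hypothetical names. The classification rules are illustrative heuristics, not something the header encodes: for example, a `const char *` is treated as a string, and the set of "long" type names is incomplete. Struct contents are not handled.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "asmlinkage long sys_read(unsigned int fd, char __user *buf, size_t count);"
# including declarations whose parameter list wraps across several lines.
DECL_RE = re.compile(r"asmlinkage\s+[\w\s\*]+?\b(sys_\w+)\s*\(([^)]*)\)\s*;")

# Heuristic, incomplete set of type names treated as "long".
LONG_TYPES = {"long", "size_t", "ssize_t", "off_t", "loff_t", "u64"}


def classify(param: str) -> str:
    """Heuristically map one C parameter declaration to int/long/pointer/string."""
    tokens = re.findall(r"\w+|\*", param)
    if "*" in tokens:
        # Heuristic: a single-level const char pointer is usually a path or name.
        if "char" in tokens and "const" in tokens and tokens.count("*") == 1:
            return "string"
        return "pointer"
    if LONG_TYPES & set(tokens):
        return "long"
    return "int"


def parse_syscalls_h(text: str) -> Dict[str, Tuple[str, ...]]:
    """Build name -> argument categories from syscalls.h-style declarations."""
    table: Dict[str, Tuple[str, ...]] = {}
    for match in DECL_RE.finditer(text):
        name, params = match.group(1), match.group(2)
        params = " ".join(params.split())  # collapse wrapped lines and tabs
        if params in ("", "void"):
            args: Tuple[str, ...] = ()
        else:
            args = tuple(classify(p) for p in params.split(","))
        table[name[len("sys_"):]] = args
    return table
```

For example, the declaration of `sys_read` maps to `("int", "pointer", "long")`, and `sys_getpid(void)` maps to an empty tuple.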
Another source of this data is the strace tool. I just checked and strace
is released with a BSD license, which means Dyninst could incorporate its
source. Strace's system call table for linux/x86_64 can be found here:
http://strace.git.sourceforge.net/git/gitweb.cgi?p=strace/strace;a=blob;f=linux/x86_64/syscallent.h
It looks like this table contains the number of arguments to each syscall,
a general classification (network call, io call, ...), the system call
number, and a string name. Unfortunately, it doesn't include syscall
argument types.
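The strace table has the number, argument count, and name but no types. One option would be to join it by name with types extracted from syscalls.h and use the argument count as a sanity check. In this sketch `merge_tables` is a hypothetical helper. The strace entries are assumed to have already been extracted into `(number, nargs, name)` tuples. Parsing syscallent.h itself is not shown, since its exact layout may differ between strace versions.

```python
from typing import Dict, Iterable, List, Tuple


def merge_tables(
    strace_entries: Iterable[Tuple[int, int, str]],
    kernel_types: Dict[str, Tuple[str, ...]],
) -> Tuple[Dict[int, Tuple[str, Tuple[str, ...]]], List[str]]:
    """Join strace's (number, nargs, name) entries with name -> argument types.

    Returns number -> (name, arg_types) for entries whose argument counts agree,
    and a list of names that are missing from kernel_types or disagree on count.
    """
    merged: Dict[int, Tuple[str, Tuple[str, ...]]] = {}
    problems: List[str] = []
    for number, nargs, name in strace_entries:
        types = kernel_types.get(name)
        if types is None or len(types) != nargs:
            # Flag for manual review rather than guessing which source is right.
            problems.append(name)
            continue
        merged[number] = (name, types)
    return merged, problems
```

Entries that land in the problem list would need a human to decide which source is right for that platform.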
-Matt