On 06/16/2014 08:38 PM, Jaime Frey wrote:
On Jun 13, 2014, at 7:12 AM, Mikko Vainio <mikko.vainio@xxxxxx> wrote:
Please find attached a patch with the changes I had to make to
libDrmaa.c in the drmaa-1.6.1 C source code to get it to play nicely
with drmaa-python 0.7.6 on 64-bit Windows 7.
A short summary of the changes:
- An offset of 200 (STAT_NOR_BASE) is added to the status code of
  drmaa_wait() on normal job termination (see also file WISDOM), but
  that offset was not accounted for in functions drmaa_wtermsig and
  drmaa_wcoredump. These functions returned
  DRMAA_ERRNO_INVALID_ARGUMENT for a stat value of 200 (= normal
  termination, 0 + 200).
- The minimum accepted signal buffer size was 100 while drmaa-python
  has buffer size 32 (I assumed DRMAA_SIGNAL_BUFFER as defined in
  drmaa.h:52 is the correct value).
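To make the first point concrete, here is a toy Python model of the offset scheme. It is only a sketch: the value STAT_NOR_BASE = 200 and the DRMAA_ERRNO_INVALID_ARGUMENT result come from the summary above, while the function names, the tuple return shape, and the treatment of stat values below the offset are assumptions made for illustration and are not taken from libDrmaa.c.

```python
# Toy model only; not the actual libDrmaa.c logic.
STAT_NOR_BASE = 200  # offset named in the summary above

SUCCESS = "DRMAA_ERRNO_SUCCESS"
INVALID_ARGUMENT = "DRMAA_ERRNO_INVALID_ARGUMENT"


def encode_normal_exit(exit_code):
    """Model how a normal exit is reported: exit code plus the offset."""
    return STAT_NOR_BASE + exit_code


def wtermsig_unpatched(stat):
    """Hypothetical pre-patch behaviour: the offset is not accounted for,
    so a normal-termination stat is rejected as invalid."""
    if stat >= STAT_NOR_BASE:
        return INVALID_ARGUMENT, None
    return SUCCESS, stat  # assumption: lower values carry a signal number


def wtermsig_patched(stat):
    """Hypothetical post-patch behaviour: a normal-termination stat is
    accepted and simply reports that there was no terminating signal."""
    if stat >= STAT_NOR_BASE:
        return SUCCESS, None
    return SUCCESS, stat


stat = encode_normal_exit(0)  # job exited normally with code 0
print(stat, wtermsig_unpatched(stat), wtermsig_patched(stat))
```

The only difference between the two variants is whether a stat at or above the offset counts as a valid argument.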
Could someone please confirm that these changes are correct?
The second change looks good.
But I don’t see the reason for the first change. As described
in the man pages, drmaa_wtermsig() and drmaa_wcoredump()
shouldn’t be called for a job that exited normally. They should
only be called if the job exited via a signal (i.e. if
drmaa_wifsignaled() set its first argument to non-zero).
Returning DRMAA_ERRNO_INVALID_ARGUMENT for a normal termination
status sounds like the right behavior to me.
If drmaa-python is expecting these functions to return
success when called with a normal job termination status, that
sounds like a bug in drmaa-python.
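The calling order described above can be sketched in Python as follows. This is an illustration under assumptions: the five callables are hypothetical stand-ins for the C stat interpreter functions (the real ones write to out-parameters and return an error code), and guarding the exit-status lookup behind the exited flag is an analogous assumption, not something stated in this thread.

```python
def interpret_stat(stat, wifexited, wexitstatus, wifsignaled, wtermsig, wcoredump):
    """Decode a wait status, calling the signal-only functions only when
    the job was actually terminated by a signal."""
    info = {
        "exited": bool(wifexited(stat)),
        "signaled": bool(wifsignaled(stat)),
        "exit_status": None,
        "term_signal": None,
        "core_dumped": None,
    }
    if info["exited"]:
        # Assumed guard: only ask for an exit status after a normal exit.
        info["exit_status"] = wexitstatus(stat)
    if info["signaled"]:
        # Guard described above: only valid when the job died from a signal.
        info["term_signal"] = wtermsig(stat)
        info["core_dumped"] = bool(wcoredump(stat))
    return info
```

With this ordering, a normal-termination status never reaches the two signal-only functions, so their return code for such a status no longer matters to the caller.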
drmaa-python calls all the stat interpreter functions, around here:
https://github.com/drmaa-python/drmaa-python/blob/master/drmaa/session.py#L480
Apparently they only tested against SGE's implementation of the DRMAA
bindings, where the C interface description document
(http://redmine.ogf.org/attachments/100/drmaav1-c-binding.pdf)
was interpreted differently. For drmaa_wcoredump(), the argument
description in that document says: "stat – The status code of a
finished job." A stat value of 200 does belong to a finished job. The
return code description says: "DRMAA_ERRNO_INVALID_ARGUMENT – an
argument value is invalid." In my opinion the argument value is valid
here.
The man page text seems to refer to the value that is filled into the
core_dumped argument.
A workaround could be to use drmaa.Session().synchronize(...) instead
of drmaa.Session().wait(...) in Python.
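A minimal sketch of that workaround, assuming a session object that exposes runJob() and synchronize() the way drmaa-python's Session does. The helper name run_and_synchronize is made up for this example, and the default timeout of -1 is assumed to match drmaa.Session.TIMEOUT_WAIT_FOREVER.

```python
def run_and_synchronize(session, job_template, timeout=-1, dispose=True):
    """Submit one job and block until it finishes, using synchronize()
    rather than wait().

    session      -- object with runJob() and synchronize() methods
                    (assumed to behave like drmaa.Session)
    job_template -- a job template accepted by session.runJob()
    timeout      -- seconds to wait; -1 is assumed to mean "forever"
    dispose      -- whether the finished job's information is discarded
    """
    job_id = session.runJob(job_template)
    # synchronize() takes a list of job ids, so wrap the single id.
    session.synchronize([job_id], timeout, dispose)
    return job_id
```

With a real installation, the session would typically come from drmaa.Session() after initialize(), and the template from createJobTemplate(). One trade-off to keep in mind: synchronize() does not hand back the job's exit information, so this only helps when knowing that the job has finished is enough.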