Re: [HTCondor-devel] Patch for drmaa-1.6.1


Date: Mon, 16 Jun 2014 21:44:56 +0300
From: Mikko Vainio <mikko.vainio@xxxxxx>
Subject: Re: [HTCondor-devel] Patch for drmaa-1.6.1
On 06/16/2014 08:38 PM, Jaime Frey wrote:
On Jun 13, 2014, at 7:12 AM, Mikko Vainio <mikko.vainio@xxxxxx> wrote:

Please find attached a patch with the changes I had to make to libDrmaa.c in the drmaa-1.6.1 C source code to get it to play nicely with drmaa-python 0.7.6 on 64-bit Windows 7.

A short summary of changes:
- An offset of 200 (STAT_NOR_BASE) is added to the status code of drmaa_wait() on normal job termination (see also file WISDOM), but that offset was not accounted for in functions drmaa_wtermsig and drmaa_wcoredump. These functions returned DRMAA_ERRNO_INVALID_ARGUMENT for a stat value of 200 (= normal termination, 0 + 200).
- The minimum accepted signal buffer size was 100, while drmaa-python uses a buffer size of 32 (I assumed DRMAA_SIGNAL_BUFFER as defined in drmaa.h:52 is the correct value).
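
For readers without the attachment, the two changes summarized above might look roughly like the following. This is a minimal sketch, not the actual patch: only STAT_NOR_BASE (200) comes from the message, every name prefixed SKETCH_ or sketch_ is a hypothetical stand-in, the value 32 for DRMAA_SIGNAL_BUFFER is an assumption, and the simplified two-argument signatures are not those of libDrmaa.c.

```c
#include <stddef.h>

/* STAT_NOR_BASE is the offset named in the message. Everything prefixed
 * SKETCH_ or sketch_ is a hypothetical stand-in, not code from libDrmaa.c. */
#define STAT_NOR_BASE 200
#define SKETCH_SIGNAL_BUFFER 32 /* assumed value of DRMAA_SIGNAL_BUFFER */

enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_ARGUMENT = 1 };

/* Non-zero when stat encodes a normal termination (exit code + offset). */
static int sketch_is_normal_exit(int stat)
{
    return stat >= STAT_NOR_BASE;
}

/* Recover the exit code by removing the offset again. */
int sketch_wexitstatus(int *exit_status, int stat)
{
    if (exit_status == NULL || !sketch_is_normal_exit(stat))
        return SKETCH_INVALID_ARGUMENT;
    *exit_status = stat - STAT_NOR_BASE;
    return SKETCH_SUCCESS;
}

/* Offset-aware core-dump query: a job that terminated normally is
 * reported as "no core dump" instead of being rejected. */
int sketch_wcoredump(int *core_dumped, int stat)
{
    if (core_dumped == NULL || stat < 0)
        return SKETCH_INVALID_ARGUMENT;
    if (sketch_is_normal_exit(stat)) {
        *core_dumped = 0;
        return SKETCH_SUCCESS;
    }
    /* How signal-related stat values are encoded is not described in
     * the message, so this sketch leaves that branch out. */
    return SKETCH_INVALID_ARGUMENT;
}

/* Buffer check against the assumed header constant rather than 100. */
int sketch_signal_buffer_ok(size_t signal_len)
{
    return signal_len >= SKETCH_SIGNAL_BUFFER;
}
```

With these stand-ins, a stat value of 200 yields success with core_dumped set to 0, and a 32-byte signal buffer is accepted.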


Could someone please confirm that these changes are correct?

The second change looks good.
But I don’t see the reason for the first change. As described in the man pages, drmaa_wtermsig() and drmaa_wcoredump() shouldn’t be called for a job that exited normally. They should only be called if the job exited via a signal (i.e. if drmaa_wifsignaled() set its first argument to non-zero). Returning DRMAA_ERRNO_INVALID_ARGUMENT for a normal termination status sounds like the right behavior to me.

If drmaa-python is expecting these functions to return success when called with a normal job termination status, that sounds like a bug in drmaa-python.
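
The calling order described above could be sketched as follows: the caller first asks whether the job exited or was signaled, and only queries the signal name when the signaled flag is non-zero. This is an illustrative model only. All sketch_ functions are hypothetical stubs with simplified signatures, and the stat encoding below STAT_NOR_BASE (treating 1..199 as a signal number) is an assumption made purely so the example is self-contained.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define STAT_NOR_BASE 200       /* offset named in the thread */
#define SKETCH_SIGNAL_BUFFER 32 /* assumed value of DRMAA_SIGNAL_BUFFER */

enum { SKETCH_SUCCESS = 0, SKETCH_INVALID_ARGUMENT = 1 };

/* Hypothetical stubs standing in for the stat interpreter functions. */
static int sketch_wifexited(int *exited, int stat)
{
    *exited = (stat >= STAT_NOR_BASE);
    return SKETCH_SUCCESS;
}

static int sketch_wifsignaled(int *signaled, int stat)
{
    /* Assumed encoding: 1..199 means "killed by that signal number". */
    *signaled = (stat > 0 && stat < STAT_NOR_BASE);
    return SKETCH_SUCCESS;
}

/* Strict variant: rejects any stat that does not describe a signal. */
static int sketch_wtermsig(char *signal, size_t signal_len, int stat)
{
    int signaled = 0;
    sketch_wifsignaled(&signaled, stat);
    if (!signaled || signal == NULL || signal_len < SKETCH_SIGNAL_BUFFER)
        return SKETCH_INVALID_ARGUMENT;
    snprintf(signal, signal_len, "SIG%d", stat);
    return SKETCH_SUCCESS;
}

struct sketch_job_result {
    int exited;
    int exit_status;
    int signaled;
    char signal[SKETCH_SIGNAL_BUFFER];
};

/* Guarded interpretation: each query runs only when the preceding
 * predicate says it applies, so the strict stub is never handed a
 * normal-termination stat. */
int sketch_interpret(int stat, struct sketch_job_result *out)
{
    memset(out, 0, sizeof *out);
    sketch_wifexited(&out->exited, stat);
    if (out->exited) {
        out->exit_status = stat - STAT_NOR_BASE;
        return SKETCH_SUCCESS;
    }
    sketch_wifsignaled(&out->signaled, stat);
    if (out->signaled)
        return sketch_wtermsig(out->signal, sizeof out->signal, stat);
    return SKETCH_SUCCESS; /* neither exited nor signaled */
}
```

A caller written this way works with both a strict and a lenient implementation, because the strict error path is only reachable by calling sketch_wtermsig directly with a normal-termination stat.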

Thanks and regards,
Jaime Frey
UW-Madison HTCondor Project


drmaa-python calls all the stat interpreter functions, around here:
https://github.com/drmaa-python/drmaa-python/blob/master/drmaa/session.py#L480
Apparently they only tested against SGE's implementation of the DRMAA bindings, which interprets the C interface description document (http://redmine.ogf.org/attachments/100/drmaav1-c-binding.pdf) differently. For drmaa_wcoredump(), the argument description in that document says: "stat – The status code of a finished job." Here, a stat value of 200 is that of a finished job. The return code description says: "DRMAA_ERRNO_INVALID_ARGUMENT – an argument value is invalid." In my opinion the argument value is valid.
The man page text seems to describe only what value is written to the core_dumped argument.

A workaround could be to use drmaa.Session().synchronize(...) instead of drmaa.Session().wait(...) in Python.

Best regards,
Mikko