
[Condor-users] Spawned/Forked process--Runaway



I believe the Condor team has worked on the known issues with Condor failing to track forked processes on execute machines, but I seem to be hitting a similar problem when spawning processes with Python.

If I submit a job that runs a compiled Python script and then remove the job via condor_rm <user>, the spawned process is not removed from the execute machine. Since Python is cross-platform, this could occur on Windows or Linux (my pool is solely Windows).

Has anyone else seen this? Obviously there are security ramifications, but it also causes other problems such as stale file locks (when reading data from an NFS share) and so forth. I write my programs so that the Condor job does not complete until the spawned process completes: the main program waits for the spawned process to finish before moving on to the next section of code. When everything runs to completion, the jobs finish successfully; as long as the main program (which is what Condor is tracking) exits correctly, the spawned process is handled correctly. However, if I use condor_rm, the main program is removed but the spawned process is not.

I am using subprocess.Popen(), documented here:
http://docs.python.org/library/subprocess.html
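
Roughly, the pattern looks like this (worker.exe and its arguments are stand-ins for my actual program):

    import subprocess

    # Spawn the helper process and block until it finishes,
    # so the Condor job does not advance past this point early.
    proc = subprocess.Popen(["worker.exe", "input.dat"])
    proc.wait()
    # ... remaining processing runs only after the child has exited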

So, first: has anyone else run into this using Python? Second: is there some way I can tell Condor to perform specific actions when a job is removed? I recall reading about a script that can be run on exit, but I could not find enough information to implement it and would need to look back over my notes for details.
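
In case it matters, one workaround I have been considering (untested) is to catch the soft-kill signal in the main program and terminate the child myself. I believe Condor sends SIGTERM on removal on Linux (and that the signal is configurable via kill_sig in the submit file), but the soft-kill mechanism on Windows is different, so this sketch may only help on Linux nodes; worker.exe is again a placeholder:

    import signal
    import subprocess
    import sys

    proc = None

    def cleanup(signum, frame):
        # Forward the removal to the spawned child so it does
        # not outlive the Condor job.
        if proc is not None:
            proc.terminate()  # requires Python 2.6+
            proc.wait()
        sys.exit(1)

    signal.signal(signal.SIGTERM, cleanup)

    proc = subprocess.Popen(["worker.exe", "input.dat"])
    proc.wait()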

Thank you for your help and suggestions,
Michael