
Re: [HTCondor-users] Submitting to a remote condor queue



On 08/02/2016 16:37, Todd Tannenbaum wrote:
Consider an alternative similar to the following:

Do a volume mount so that your container and your host share some 
subdirectory on the host file system. In this subdirectory, create a 
"runme" directory where your container will atomically write out DAG 
files along with their corresponding submit files. Meanwhile on the 
host schedd, have a local universe job (submitted by whatever user you 
choose) that periodically scans the "runme" directory for .dag files, 
submits them, and then renames the submitted .dag file to 
.dag.submitted.jobX.Y.
This way you do not need to do any reconfiguration of your host 
schedd, you don't need to have any trust relationships between your 
container and your host schedd, and you don't need to pay any extra 
overhead of having HTCondor move files in and out of the container via 
file transfer.
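Roughly speaking, that watcher could be as simple as the untested 
Python sketch below; the directory layout, poll interval and rename 
scheme are just placeholders to adjust:

  #!/usr/bin/env python
  # Untested sketch of the script a local universe "watcher" job could run:
  # scan the shared "runme" directory for *.dag files, submit each with
  # condor_submit_dag, and rename the file so it is not submitted twice.
  # Paths, the poll interval and the rename suffix are only examples.

  import glob
  import os
  import re
  import subprocess
  import time

  RUNME_DIR = "/shared/runme"   # the volume-mounted subdirectory
  POLL_INTERVAL = 30            # seconds between scans

  def submit_dag(dag_path):
      # condor_submit_dag reports e.g. "1 job(s) submitted to cluster 1234."
      out = subprocess.check_output(
          ["condor_submit_dag", os.path.basename(dag_path)],
          cwd=os.path.dirname(dag_path))
      m = re.search(r"submitted to cluster (\d+)", out.decode())
      return m.group(1) if m else "unknown"

  def scan_once():
      for dag in glob.glob(os.path.join(RUNME_DIR, "*.dag")):
          try:
              cluster = submit_dag(dag)
          except subprocess.CalledProcessError:
              os.rename(dag, dag + ".failed")
              continue
          # Mark as submitted so the next scan skips it.
          os.rename(dag, "%s.submitted.job%s.0" % (dag, cluster))

  if __name__ == "__main__":
      while True:
          scan_once()
          time.sleep(POLL_INTERVAL)
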
Thank you - I was sort-of coming to that conclusion myself. Indeed, if 
the container only wants to fire-and-forget a single DAG, I can just run 
the container and wait for it to exit; if it exits with success (rc=0) 
then I pick up and submit the .dag file that it wrote. This does mean 
that the container is only responsible for the preparation of the job, 
not for managing its lifecycle. So for example, it can't do any 
post-processing actions when the DAG completes.
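Concretely, the fire-and-forget version I have in mind looks something 
like this untested sketch (the image name, work directory and DAG 
filename are all made up):

  #!/usr/bin/env python
  # Untested sketch of the fire-and-forget variant: run the container to
  # completion and only submit the DAG it wrote if the container exited 0.
  # Image name, work directory and DAG filename are placeholders.

  import os
  import subprocess
  import sys

  WORKDIR = "/data/jobs/job-0001"   # host directory bind-mounted into the container
  IMAGE = "my-dag-builder:latest"   # container image that prepares the DAG
  DAG_FILE = "workflow.dag"         # file the container is expected to write

  def main():
      rc = subprocess.call(
          ["docker", "run", "--rm",
           "-v", "%s:/work" % WORKDIR,   # container writes its DAG into /work
           IMAGE])
      if rc != 0:
          sys.exit("container failed with rc=%d; not submitting" % rc)

      if not os.path.isfile(os.path.join(WORKDIR, DAG_FILE)):
          sys.exit("container exited 0 but wrote no %s" % DAG_FILE)

      # Submit from the working directory so dagman's files land there too.
      subprocess.check_call(["condor_submit_dag", DAG_FILE], cwd=WORKDIR)

  if __name__ == "__main__":
      main()
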
Alternatively, if I submit jobs in the way you suggest (by polling for 
drop files), then the container can carry on running and itself can 
check for DAG completion, e.g. by polling the node status file. It seems 
pretty crude to use the filesystem in this way instead of proper 
condor APIs, but it should be functional.
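For example, the container could watch the node status file with 
something like this untested sketch. It assumes the DAG declares a 
NODE_STATUS_FILE line; the attribute names are the ones documented for 
that file, while the filename, poll interval and failure test are 
rough placeholders:

  #!/usr/bin/env python
  # Untested sketch of polling a DAGMan node status file from inside the
  # container, assuming the DAG contains a line such as
  #     NODE_STATUS_FILE status.ad 60
  # Filename and poll interval are placeholders; the failure test below is
  # only approximate.

  import re
  import time

  STATUS_FILE = "/work/status.ad"
  POLL_INTERVAL = 60

  def read_counts(path):
      # Crude regex parse of the "DagStatus" ClassAd in the status file; a
      # real implementation might use the classad Python bindings instead.
      try:
          text = open(path).read()
      except IOError:
          return None
      counts = {}
      for key in ("NodesTotal", "NodesDone", "NodesFailed", "NodesQueued"):
          m = re.search(r"%s\s*=\s*(\d+)" % key, text)
          if not m:
              return None
          counts[key] = int(m.group(1))
      return counts

  def wait_for_dag():
      while True:
          c = read_counts(STATUS_FILE)
          if c and c["NodesTotal"] > 0:
              if c["NodesDone"] == c["NodesTotal"]:
                  return True        # every node finished successfully
              if c["NodesFailed"] > 0 and c["NodesQueued"] == 0:
                  return False       # failures and nothing left running
          time.sleep(POLL_INTERVAL)

  if __name__ == "__main__":
      ok = wait_for_dag()
      print("DAG %s" % ("succeeded" if ok else "failed"))
      # ... any post-processing would go here ...
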
Either way, I need to allocate working directories outside the 
container, and have some external system which tracks which DAGs are 
running and deletes the working directories afterwards.
If the container itself were responsible for running the DAG then each 
container would *be* the working directory, and "docker ps" would be my 
list of running tasks. But having read a bit more about remote 
submission, I see there are a number of difficulties with this. The 
foremost seems to be that the jobs that condor_dagman submits would need to be
able to write their outputs inside the container - which in turn I think 
means condor_dagman itself would have to run inside the container, and 
that would be a very non-standard way of deploying htcondor, unless you 
also had a schedd running inside the container.
Thanks again,

Brian.