>
Submitting a job
via the python bindings is really not for the faint of heart Sure :-) The main thing I want is to submit a DAG and then accurately capture the cluster ID for future processing. So I could spawn condor_submit_dag and parse the results; this just seems a pretty ugly thing to do if there's an API sitting there waiting to be used, given that I'm writing stuff in Python anyway. That's unless condor_submit_dag has a machine-parseable output format (e.g. XML?) I don't think it does. > Oddly enough, I have a project which does precisely this. The code which generates the classad is here: >
>
> Note that I actually submit a wrapper script that does
a few devious things before exec'ing condor_dagman.
I can see two submission approaches in that code: (1) submitDirect() --> schedd.submit(dagAd, ...) (2) schedd.submitRaw( ... jdl ... ) where jdl appears to be the contents of a regular submit text file, but dagAd is a constructed classad structure. The decision about which approach to use is based on loc.getScheddObj(task['tm_taskname']), and I'm not sure what this is deciding; perhaps it's to do with whether the schedd is running on the same host or a remote host? But I don't understand why the same API can't be used in both cases. > Looks like you've run afoul of HTCondor's default
Requirements _expression_. Try this:
That works. That is an ugly, ugly, ugly hack. Somebody should be ashamed :-) I can see (from condor_q -long) that the requirements _expression_ expands to: Requirements = true || false && TARGET.OPSYS == "LINUX" && TARGET.ARCH == "X86_64" && ( TARGET.HasFileTransfer || ( TARGET.FileSystemDomain == MY.FileSystemDomain ) ) && TARGET.Disk >= RequestDisk && TARGET.Memory >= RequestMemory Basically, this relies on the fact that somebody forgot to add parentheses around the existing value when appending conditions in condor_submit. This could clearly break in future. Then after adding that, I find that -CsdVersion is a mandatory argument to condor_dagman; and then the submission works. Below is the python code as-is. I have still not worked out how to replace +OtherJobRequirements = "DAGManJobId == $(cluster)" in classad format; the presentation says to use strcat() but I couldn't get it to work. I am starting to conclude the python API is (a) not sufficiently documented, and (b) is different from the built-in tools in subtle and confusing ways (e.g. the Requirements hack is not required when you use condor_submit to submit a scheduler universe job) Regards, Brian. #!/usr/bin/env python from __future__ import print_function import htcondor, classad import os, sys DAGMAN="/usr/bin/condor_dagman" dag = sys.argv[1] os.stat(dag) # test for existence schedd = htcondor.Schedd() ad = classad.ClassAd({ "JobUniverse": 7, "Cmd": DAGMAN, "Arguments": "-f -l . -Lockfile %s.lock -AutoRescue 1 -DoRescueFrom 0 " \ "-Dag %s -Suppress_notification -CsdVersion '%s' -Force -Dagman %s" % (dag, dag, htcondor.version(), DAGMAN), "Env": "_CONDOR_MAX_DAGMAN_LOG=0;_CONDOR_DAGMAN_LOG=%s.dagman.out;" \ "_CONDOR_SCHEDD_DAEMON_AD_FILE=%s;_CONDOR_SCHEDD_ADDRESS_FILE=%s" % (dag, htcondor.param["SCHEDD_DAEMON_AD_FILE"], htcondor.param["SCHEDD_ADDRESS_FILE"]), "EnvDelim": ";", "Out": "%s.lib.out" % dag, "Err": "%s.lib.err" % dag, "ShouldTransferFiles": "IF_NEEDED", "UserLog": os.path.abspath("%s.dagman.log" % dag), "KillSig": "SIGTERM", "RemoveKillSig": "SIGUSR1", #"OtherJobRemoveRequirements": classad.ExprTree('eval(strcat("DAGManJobId == ", ClusterId))'), "OnExitRemove": classad.ExprTree('( ExitSignal =?= 11 || ( ExitCode =!= undefined && ExitCode >= 0 && ExitCode <= 2 ) )'), "FileSystemDomain": htcondor.param['FILESYSTEM_DOMAIN'], "Requirements": classad.ExprTree('true || false'), }) cluster = schedd.submit(ad) print("Submitted as cluster %d" % cluster) |