We did some digging into the code, and it seems that sub.queue() always does the most backward-compatible thing, while condor_submit checks the version of the schedd you are submitting to and uses the most up-to-date features that schedd supports. This affects file transfer, the way the job's environment is specified, and possibly a few other things. We have classified this as a bug and will fix it in the next releases of 8.6 and 8.7 so that sub.queue() does the same version check that condor_submit does.

-tj
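Until that fix ships, a possible interim workaround is to spell out in the Submit object the settings that would otherwise fall back to the old defaults. This is only a sketch based on the attribute differences reported later in this thread (the paths are the thread's placeholder paths, and forcing should_transfer_files is an assumption, not a confirmed fix):

import htcondor

sub = htcondor.Submit()
sub['universe'] = 'vanilla'
# Spell out the full runtime environment rather than relying on submit-side defaults.
sub['environment'] = "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib"
sub['executable'] = '/my/path/to/anaconda3/bin/python'
sub['arguments'] = '/my/path/to/scripts/myrun.py'
sub['log'] = '/tmp/job.log'
sub['output'] = '/tmp/test.log'
sub['error'] = '/tmp/test.err'
# Force file transfer so that any _condor_stdout/_condor_stderr remaps are applied;
# with the backward-compatible IF_NEEDED setting the remap may never happen on a
# shared filesystem, which could explain the empty output/error files.
sub['should_transfer_files'] = 'YES'

schedd = htcondor.Schedd()
with schedd.transaction() as txn:
    cluster_id = sub.queue(txn)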
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Xin Wang

Hi, John,

Thank you for the message. This is helpful. With condor_q -long, I can get the real configuration consumed by HTCondor. As expected, the two job ads are slightly different, and some of the relevant differences are highlighted below.

For the job submitted using sub.queue() (which runs, but does not produce the expected output and error files), here are some of the settings:

Env = "PYTHONHOME=/my/path/to/anaconda3 "
Out = "_condor_stdout"
Err = "_condor_stderr"
UserLog = "/tmp/test1.log"
ShouldTransferFiles = "IF_NEEDED"
TransferOutputRemaps = "_condor_stdout=/tmp/test1.out;_condor_stderr=/tmp/test1.err"

For the job submitted using schedd.submit(job_ad) (which works correctly, but requires setting LD_LIBRARY_PATH explicitly), here are some of the settings:

Environment = "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib"
Out = "/tmp/test2.out"
Err = "/tmp/test2.err"
UserLog = "/tmp/test2.log"
ShouldTransferFiles = "YES"

So sub.queue() is using default stdout and stderr files and trying to remap them back to the specified files, which obviously did not accomplish what it was meant to do.
Other observations:
Any thoughts? In particular, is there any fix so that I can get sub.queue() to work properly with the stdout and stderr file settings?

Thank you.
Xin
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller

Yes, condor_submit and sub.queue() do a great many things that schedd.submit() does not do. This is why the schedd.submit() method (and SOAP) was deprecated: it requires you to do by hand all of the things that sub.queue() does internally.

I don't have any guesses as to why your output and error files are empty. I would suggest comparing the job ad you see from condor_q -long for a job that returned the correct output with the ad for a job that did not. If the failure to return output is somehow a bug in HTCondor, it will almost certainly be triggered by some difference between those job ads.

-tj
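One way to script the comparison tj suggests, instead of eyeballing two condor_q -long dumps, is to pull a small projection of attributes for both jobs through the bindings and diff them. This is a sketch; the two cluster ids and the attribute list are illustrative, and the jobs must still be in the queue (Schedd.history() takes a similar constraint and projection for jobs that have already left it):

import htcondor

schedd = htcondor.Schedd()

attrs = ["Environment", "Env", "Out", "Err", "UserLog",
         "ShouldTransferFiles", "TransferOutputRemaps", "Iwd"]

good = schedd.query("ClusterId == 101", attrs)  # job that returned the correct output
bad = schedd.query("ClusterId == 102", attrs)   # job whose output files stayed empty

for attr in attrs:
    g = good[0].get(attr) if good else None
    b = bad[0].get(attr) if bad else None
    if g != b:
        print("%-22s %r  vs  %r" % (attr, g, b))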
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Xin Wang

Hi, John,

I tried your approach and used condor_submit -dump <dumpfile> to see the job classad for my submit file. It has ~80 lines, and most of them do not make any sense to me. I tried adding those extra settings to my script, but it did not help. The error when running schedd.submit(job_ad) in my original script is:

condor_exec.exe: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

which clearly indicates that something is wrong with the environment and HTCondor cannot find the Python 3.6 shared libraries.

The strange thing is that I did set PYTHONHOME in the environment, which is sufficient for condor_submit <submitfile> and for the job submitted using sub.queue(), but not sufficient for schedd.submit(job_ad). To confirm this, when I updated the environment to

sub['environment'] = "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib"

my script works with schedd.submit(job_ad). Now the question is: do condor_submit and the job submitted using sub.queue() do anything extra that schedd.submit() does not? For the job submitted using sub.queue(), I'm 100% sure that the job ran without issues, as I can see all the results generated by my script. The only problem is that the output and error files specified in the submit settings are not updated at all for that job.

Thank you.
Xin
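For reference, the combination that reportedly works with the (deprecated) schedd.submit(job_ad) path, i.e. the original ad plus the LD_LIBRARY_PATH entry, looks roughly like this (paths are the thread's placeholders):

import htcondor

schedd = htcondor.Schedd()
job_ad = {
    "cmd": "/my/path/to/anaconda3/bin/python",
    "arguments": "/my/path/to/scripts/myrun.py",
    # PYTHONHOME alone is not enough on this path; the dynamic linker also
    # needs to be pointed at the Anaconda libraries.
    "env": "PYTHONHOME=/my/path/to/anaconda3 LD_LIBRARY_PATH=/my/path/to/anaconda3/lib",
    "log": "/tmp/job.log",
    "out": "/tmp/test.log",
    "err": "/tmp/test.err",
}
cluster_id = schedd.submit(job_ad)

That said, since schedd.submit() with a hand-built ad is the deprecated route, sub.queue() (or plain condor_submit) remains the recommended one once the version-check bug described at the top of this thread is fixed.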
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of John M Knoeller

First of all, the job submitted using schedd.submit(job_ad) doesn't run because the job ad is incomplete. When you use that method, you must fully specify the job classad. To see what a fully specified job classad looks like, run:

condor_submit -dump <submit_file>

For the job submitted using sub.queue(): are you sure that the job ran and produced output? When the job is submitted, your output and error files will be created as 0-size files before the job ever runs.

-tj
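If you do want to drive schedd.submit() with a fully specified ad, one way to combine that advice with the bindings is to let condor_submit -dump write the complete job classad(s) to a file, then parse and submit them. This is a sketch assuming the classic (pre-v2) bindings, where schedd.submit() accepts a raw job ad; whether classad.parseAds reads the dump format directly may depend on the bindings version:

import classad
import htcondor

# On the command line first, e.g. something like:
#   condor_submit -dump job.ads job.sub
# which writes the fully specified job ClassAd(s) to job.ads instead of submitting.

with open("job.ads") as f:
    ads = list(classad.parseAds(f))

schedd = htcondor.Schedd()
for ad in ads:
    cluster_id = schedd.submit(ad)
    print("submitted cluster", cluster_id)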
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Xin Wang

I'm trying to submit jobs to HTCondor to run some Python scripts. If I generate a submit file and submit it with condor_submit, everything works fine. Here is the submit file:

universe = vanilla
environment = "PYTHONHOME=/my/path/to/anaconda3"
executable = /my/path/to/anaconda3/bin/python
arguments = /my/path/to/scripts/myrun.py
log = /tmp/job.log
output = /tmp/test.log
error = /tmp/test.err
queue

For the same job, I tried to submit through the Python bindings, using two different methods, but had no luck with either.

First, I tried htcondor.Submit() with sub.queue(), using the following code:

import htcondor
schedd = htcondor.Schedd()
sub = htcondor.Submit()
sub['universe'] = 'vanilla'
sub['environment'] = "PYTHONHOME=/my/path/to/anaconda3"
sub['executable'] = '/my/path/to/anaconda3/bin/python'
sub['arguments'] = '/my/path/to/scripts/myrun.py'
sub['log'] = '/tmp/job.log'
sub['output'] = '/tmp/test.log'
sub['error'] = '/tmp/test.err'
with schedd.transaction() as txn:
    sub.queue(txn)

The job was submitted without any issues, ran successfully, and the log file /tmp/job.log was generated. However, output and error do not work: /tmp/test.log and /tmp/test.err are generated but have size 0 (empty).

Second, I tried schedd.submit() with the following code:

import htcondor
schedd = htcondor.Schedd()
job_ad = {
    "cmd": '/my/path/to/anaconda3/bin/python',
    "arguments": '/my/path/to/scripts/myrun.py',
    "env": "PYTHONHOME=/my/path/to/anaconda3",
    "log": '/tmp/job.log',
    "out": '/tmp/test.log',
    "err": '/tmp/test.err',
}
clusterId = schedd.submit(job_ad)

The job could not run. However, /tmp/test.err did contain a proper error message:

condor_exec.exe: error while loading shared libraries: libpython3.6m.so.1.0: cannot open shared object file: No such file or directory

I suspect the error is because the environment is not properly set, but I had no luck when I tried "environment" instead of "env" either. How should I fix the settings so that I can submit HTCondor jobs through the Python bindings properly?

Thanks.
Xin