Hi Greg!

Thanks, here is the log. According to it, the executable file was not copied into the Docker container.
 
    gergely.debreczeni@xxxxxxx:~/batchsubmission$ condor_q -anal 57.1

    -- Schedd: X.X.X.X <10.1.8.8:51975?...
    ---
    057.001:  Request is held.

    Hold reason: Error from slot1@scorpio005: STARTER at 10.1.10.5 failed to send file(s) to <10.1.8.8:28343>: error reading from /var/lib/condor/execute/dir_1810221/output.out: (errno 2) No such file or directory; SHADOW failed to receive file(s) from <10.1.10.5:10057>
The reason for this is that the executable never ran, because it was not copied. The job's stderr says:

    WARNING: Your kernel does not support swap limit capabilities or the cgroup is not mounted. Memory limited without swap.
    /usr/local/bin/nvidia_entrypoint.sh: line 88: exec: batch.sh: not found

Experimenting with it a bit more, I found that with Condor 8.4.2 the executable only gets copied if it is given as ./batch.sh in the submit file and is also listed explicitly in the paramlist file, so that it ends up in transfer_input_files.
That is, with this paramlist file:

    a, batch.sh, 1 2
    a, batch.sh, 3 4
    a, batch.sh, 5 6
    a, batch.sh, 7 8

and this submit file:

    executable              = ./batch.sh
    universe                = docker
    docker_image            = nv-pytorch-wglobus_v2

    ## Logs
    log                     = out/batch.$(Process).log
    output                  = out/batch.$(Process).stdout
    error                   = out/batch.$(Process).stderr

    ## File transfer
    should_transfer_files   = Yes
    when_to_transfer_output = ON_EXIT
    line = $(Row)
    transfer_output_files   = output.out
    transfer_output_remaps  = "output.out=out/output$INT(line).out"
    transfer_input_files    = $(input_file1), $(input_file2)

    ## Resources requested
    request_cpus            = 1
    request_GPUs            = 0
    Requirements            = (ResourceType == "Dedicated") && (regexp(".*nv-pytorch-wglobus_v2.*",LocallyAvailableDockerImages))

    ## Submit command
    queue input_file1, input_file2, arguments from [0:2:1] ./paramlist
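To make the queue statement in this submit file easier to follow, here is a small Python sketch that mimics how each paramlist row could expand into per-job values. It is an illustration only, not HTCondor's own parser. It assumes that each row is split on commas and/or whitespace with the last variable taking the rest of the line, that [0:2:1] behaves like a Python slice over the rows, and that $(Row) is the row's index in the file. The names PARAMLIST, split_row and expand_queue are hypothetical.

```python
import re

# Hypothetical local copy of the four paramlist rows quoted above.
PARAMLIST = """\
a, batch.sh, 1 2
a, batch.sh, 3 4
a, batch.sh, 5 6
a, batch.sh, 7 8
"""

# Variable names from: queue input_file1, input_file2, arguments from ...
NAMES = ("input_file1", "input_file2", "arguments")


def split_row(line, nvars):
    """Split one row into nvars fields.

    Assumption: fields are separated by commas and/or whitespace and the
    last variable receives the remainder of the line.
    """
    fields = []
    rest = line.strip()
    for _ in range(nvars - 1):
        m = re.match(r"([^,\s]+)[,\s]+", rest)
        if m is None:
            break
        fields.append(m.group(1))
        rest = rest[m.end():]
    fields.append(rest)
    return fields


def expand_queue(text, start=None, stop=None, step=None):
    """Hypothetical mimic of 'queue ... from [start:stop:step] ./paramlist'."""
    rows = [ln for ln in text.splitlines() if ln.strip()]
    # Assumption: the slice behaves like a Python slice over the rows.
    selected = list(range(len(rows)))[start:stop:step]
    jobs = []
    for process, row in enumerate(selected):
        values = dict(zip(NAMES, split_row(rows[row], len(NAMES))))
        jobs.append({
            "Process": process,
            "Row": row,  # assumption: index of the row in the unsliced file
            "arguments": values.get("arguments", ""),
            "transfer_input_files": "{}, {}".format(
                values.get("input_file1", ""), values.get("input_file2", "")),
            # Mirrors transfer_output_remaps with line = $(Row) and $INT(line).
            "output_remap": "output.out=out/output{}.out".format(row),
            "output": "out/batch.{}.stdout".format(process),
        })
    return jobs


if __name__ == "__main__":
    for job in expand_queue(PARAMLIST, 0, 2, 1):
        print(job)
```

Under these assumptions the [0:2:1] slice queues only the first two rows (arguments "1 2" and "3 4"), and every job carries batch.sh in its transfer_input_files, which is how the 8.4.2 workaround gets the script into the job's execute directory.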
With Condor 8.8.0 it also works without the ./ prefix and without listing the executable explicitly in the paramlist file.

Thanks,
Gergely

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Greg Thain <gthain@xxxxxxxxxxx>
Sent: Monday, May 6, 2019 4:07 PM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] batch submitssion strange problem

On 5/4/19 3:00 PM, Gergely Debreczeni via HTCondor-users wrote:
Can you send us the output of condor_q -hold? When a job is held, condor_q -hold will show the hold reason, which is often the best way to debug what's going on.

-greg