Hi all,
I am trying to understand several jobs that entered our CondorCE
and went on hold when their submission to Condor failed, apparently
because the Out and Err files were missing.
For example, CondorCE job 406446.0 was in principle routed to
422981.0 [1]. The CE job's spool directory exists [2].
However, the resulting Condor job fails during submission when the
stderr and stdout files are opened for reading, as far as I can
see [3].
I am not sure whether the Out and Err files make much sense at this
stage of a job (and whether one could, as a quick fix, replace them
in a route with /dev/null or similar)?
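In case it helps with reproducing this, here is a minimal sketch of how
one could check which files named in a job ad are absent from its Iwd.
It is plain Python, not part of HTCondor: the helper name
missing_job_files and the dict standing in for the job ad are my own
assumptions, and the values are simply copied from [1].

```python
import os


def missing_job_files(job_ad, attrs=("Out", "Err", "UserLog")):
    """Return paths named by `attrs` that do not exist on disk.

    `job_ad` is a plain dict standing in for the job ClassAd
    (hypothetical helper, not an HTCondor API). Relative file names
    are resolved against the ad's Iwd.
    """
    iwd = job_ad["Iwd"]
    missing = []
    for attr in attrs:
        name = job_ad.get(attr)
        if not name:
            continue  # attribute not set in the ad
        path = name if os.path.isabs(name) else os.path.join(iwd, name)
        if not os.path.exists(path):
            missing.append(path)
    return missing


# Values copied from the job ad in [1].
job_ad = {
    "Iwd": "/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0",
    "UserLog": "406446.0.log",
    "Out": "406446.0.out",
    "Err": "406446.0.err",
}

if __name__ == "__main__":
    for path in missing_job_files(job_ad):
        print("missing:", path)
```

Run on the CE host, this should list the same two paths that the
transfer in [3] fails to stat, assuming the directory still looks like
the listing in [2].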
Cheers,
Thomas
[1]
ClusterId = 406446
UserLog = "406446.0.log"
GlobalJobId = "grid-htcondorce0.desy.de#406446.0#1614750664"
Environment = "HTCONDOR_JOBID=406446.0"
Iwd = "/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0"
RoutedToJobId = "422981.0"
Out = "406446.0.out"
Err = "406446.0.err"
[2]
> ls /var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0
406446.0.log DIRAC_nd5lYU_pilotwrapper.py tmpBU9zHQ
[3]
03/04/21 13:25:04 (cid:107) Transferring files for jobs 406446.0
03/04/21 13:25:04 (cid:107) spoolJobFiles(): started worker process
03/04/21 13:25:04 The submitting job ad as the FileTransferObject sees it
...
xcount = 1
03/04/21 13:25:04 ReliSock::put_file_with_permissions(): Failed to stat file '/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0/406446.0.err': No such file or directory (errno: 2, si_error: 1)
03/04/21 13:25:04 ReliSock::put_file_with_permissions(): Failed to stat file '/var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0/406446.0.out': No such file or directory (errno: 2, si_error: 1)
03/04/21 13:25:04 DoUpload: (Condor error code 13, subcode 2) SCHEDD at 131.169.223.129 failed to send file(s) to <202.13.206.84:37118>: error reading from /var/lib/condor-ce/spool/6446/0/cluster406446.proc0.subproc0/406446.0.err: (errno 2) No such file or directory; TOOL failed to receive file(s) from <131.169.223.129:9619>
03/04/21 13:25:04 (cid:107) generalJobFilesWorkerThread(): failed to transfer files for job 406446.0
03/04/21 13:25:04 condor_write(): Socket closed when trying to write 29 bytes to <202.13.206.84:37118>, fd is 21
03/04/21 13:25:04 Buf::write(): condor_write() failed
03/04/21 13:25:04 ERROR - Staging of job files failed!