Dear experts,
We are using Condor as our batch system, and we have recently run into a problem.
If we request file transfer in the Condor submit script, the jobs fail with the following message:
007 (2885999.000.000) 10/14 11:49:04 Shadow exception!
Error from slot7@XXX: Could not initiate file transfer
With condor_q, you can see the jobs alternating between the "Idle" and "Run" states.
However, if the jobs do not request file transfer, they run successfully.
I also tried shutting down the firewall, but it did not help.
Here is an example of our condor script:
Universe = vanilla
Notification = Never
GetEnv = True
Executable = /moose/AtlUser/liumh/CondorTest/Data/run_ana.sh
#Arguments = realdata_001_150501_0042004.txt /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/root/realdata_001_150501_0042004.root realdata_001_150501_0042004.lst
Arguments = realdata_001_150501_0042004.txt /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/root/realdata_001_150501_0042004.root
Output = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/out/realdata_001_150501_0042004.out
Error = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/err/realdata_001_150501_0042004.err
Log = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/log/realdata_001_150501_0042004.log
+Group = "BESIII"
should_transfer_files= yes
#transfer_input_files = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.txt,/moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.lst
transfer_input_files = /moose/AtlUser/liumh/CondorTest/Data/Root2125PiPTest/opt/realdata_001_150501_0042004.txt
requirements = (substr(Machine,0,4)!="bl-0"&&ARCH=="X86_64")&& (machine != "bl-3-15.hep.ustc.edu.cn") && (machine != "bl-3-16.hep.ustc.edu.cn")
WhenToTransferOutput = ON_EXIT_OR_EVICT
OnExitRemove = TRUE
Queue
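For comparison, a reduced submit file along the following lines could be used to check whether file transfer alone triggers the error. This is only a sketch: input.txt, test.out, test.err and test.log are placeholder names that are not part of our real setup, and input.txt is assumed to exist in the directory where condor_submit is run.

```
# Hypothetical minimal test job; all file names below are placeholders.
Universe                = vanilla
# Use a program already present on the execute node, so only input.txt is transferred.
Executable              = /bin/cat
transfer_executable     = false
Arguments               = input.txt
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT
transfer_input_files    = input.txt
Output                  = test.out
Error                   = test.err
Log                     = test.log
Queue
```

It leaves out the requirements expression and the absolute /moose paths on purpose, so that only the transfer of a single small input file is exercised.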
Could you please give me some comments? Thanks a lot!
Best regards,