Hi, I need your help.
I am using Condor version 7.0.2.
I have problems executing jobs that are submitted from a
submit or execute machine. The jobs start running, but after some time they stop. On the local machine the jobs execute correctly; the problem appears only with remote submission. However, when I submit jobs from my Manager, they do execute on the remote machines!!!
My submit description file:

Universe = vanilla
Executable = /home/condor/test
Arguments = 15 10
Log = /home/condor/test.log
Output = /home/condor/test.out
Error = /home/condor/test.error
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue

My log file is:
000 (012.000.000) 03/25 14:06:22 Job submitted from host: <41.229.35.203:49377>
...
001 (012.000.000) 03/25 14:06:23 Job executing on host: <41.229.35.204:45769>
...
022 (012.000.000) 03/25 14:06:38 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot2@Grid011 <41.229.35.204:45769>
...
023 (012.000.000) 03/25 14:06:38 Job reconnected to slot2@Grid011
    startd address: <41.229.35.204:45769>
    starter address: <41.229.35.204:45308>
...
022 (012.000.000) 03/25 14:06:38 Job disconnected, attempting to reconnect
    Socket between submit and execute hosts closed unexpectedly
    Trying to reconnect to slot2@Grid011 <41.229.35.204:45769>
...
023 (012.000.000) 03/25 14:06:38 Job reconnected to slot2@Grid011
    startd address: <41.229.35.204:45769>
    starter address: <41.229.35.204:45308>
...
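To see at a glance how often the job loses its connection, a small script can tally the events in the user log. This is only a sketch, assuming the event header layout shown in the log above (a three-digit event code, the job ID in parentheses, then date and time). The codes 000, 001, 022 and 023 come from the log itself; the labels in `EVENT_NAMES` and the function names `parse_events` and `summarize` are hypothetical, chosen for this example.

```python
import re
from collections import Counter

# Descriptive labels for the event codes that appear in the log above.
# The labels themselves are made up for this sketch; unknown codes are
# reported as "event NNN".
EVENT_NAMES = {
    0: "submitted",
    1: "executing",
    22: "disconnected",
    23: "reconnected",
}

# Event header line, e.g.
# "022 (012.000.000) 03/25 14:06:38 Job disconnected, attempting to reconnect"
HEADER = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.(\d+)\) (\d\d/\d\d) (\d\d:\d\d:\d\d) (.*)$"
)


def parse_events(log_text):
    """Return (code, job_id, timestamp, message) for each event header line."""
    events = []
    for line in log_text.splitlines():
        m = HEADER.match(line.strip())
        if m:
            code = int(m.group(1))
            job_id = f"{int(m.group(2))}.{int(m.group(3))}"  # cluster.proc
            timestamp = f"{m.group(5)} {m.group(6)}"
            events.append((code, job_id, timestamp, m.group(7)))
    return events


def summarize(log_text):
    """Count how many times each kind of event occurs in the log."""
    counts = Counter(
        EVENT_NAMES.get(code, f"event {code:03d}")
        for code, _job, _ts, _msg in parse_events(log_text)
    )
    return dict(counts)


# Header lines copied from the log above (detail lines omitted).
SAMPLE_LOG = """\
000 (012.000.000) 03/25 14:06:22 Job submitted from host: <41.229.35.203:49377>
...
001 (012.000.000) 03/25 14:06:23 Job executing on host: <41.229.35.204:45769>
...
022 (012.000.000) 03/25 14:06:38 Job disconnected, attempting to reconnect
...
023 (012.000.000) 03/25 14:06:38 Job reconnected to slot2@Grid011
...
022 (012.000.000) 03/25 14:06:38 Job disconnected, attempting to reconnect
...
023 (012.000.000) 03/25 14:06:38 Job reconnected to slot2@Grid011
...
"""

if __name__ == "__main__":
    print(summarize(SAMPLE_LOG))
```

Run against the excerpt above, it reports two disconnect/reconnect cycles for job 12.0, both logged at 14:06:38.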
Thank you.
walid SAAD