Hi,
I am a new user of Condor who is trying to get a Fortran program running. I’m running Personal Condor,
and my own computer is the only one in the pool. Below I have put the
submission file, plus some typical repeating parts from the different log
files. The program runs perfectly if I just run it directly. I have even removed
all output to the screen to see if that would help. Can anybody help me?
Thank you!
Steen Chirstesen, CAPEC, Institute of Chemical
Engineering, DTU, Denmark
Submission file:
#
# Run fortran analysis program using condor
#
universe = vanilla
executable = COM2RDF_v1_5Condor.exe
#nice_user = True
#input files
transfer_input_files = General.inf,md3.xst,B-MA_1-7_1K_3.com,B-MA_1-7_1K.typ
#output files
transfer_output_files = rdf_B-MA_1-7_1K_3.out,rdf_B-MA_1-7_1K_3V.out
error = rdftest.err
log = rdftest.log
output = rdftest.out
queue
Log file from my submission:
001 (007.000.000) 08/23 12:47:48 Job executing on host: <192.38.89.192:1052>
...
007 (007.000.000) 08/23 12:47:49 Shadow exception!
    Can no longer talk to condor_starter on execute machine (192.38.89.192)
    0 - Run Bytes Sent By Job
    575326720 - Run Bytes Received By Job
...
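For anyone who wants to pull the events out of a longer rdftest.log, the records in the excerpt above have a regular layout: a three-digit event code, the job ID as (cluster.proc.subproc), a date and time, a message, optional detail lines, and a closing line containing only "...". The following is a minimal Python sketch that relies on that layout only as far as it is visible in this excerpt. The names parse_events and EVENT_RE and the dictionary keys are made up for illustration and are not part of Condor.

```python
import re

# Event header as seen in the excerpt:
#   NNN (cluster.proc.subproc) MM/DD HH:MM:SS message
# This pattern is an assumption drawn from the two events shown, not a full spec.
EVENT_RE = re.compile(
    r"^(?P<code>\d{3}) "
    r"\((?P<cluster>\d+)\.(?P<proc>\d+)\.(?P<subproc>\d+)\) "
    r"(?P<date>\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) "
    r"(?P<message>.*)$"
)


def parse_events(text):
    """Return one dict per event; an event ends at a line that is exactly '...'."""
    events = []
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if line == "...":
            # Terminator line: close the event that is currently open, if any.
            if current is not None:
                events.append(current)
                current = None
            continue
        match = EVENT_RE.match(line)
        if match and current is None:
            # Header line: start a new event record.
            current = match.groupdict()
            current["details"] = []
        elif current is not None and line:
            # Any other non-empty line belongs to the open event.
            current["details"].append(line)
    if current is not None:
        # Keep a trailing event even if its terminator is missing.
        events.append(current)
    return events


SAMPLE = """\
001 (007.000.000) 08/23 12:47:48 Job executing on host: <192.38.89.192:1052>
...
007 (007.000.000) 08/23 12:47:49 Shadow exception!
    Can no longer talk to condor_starter on execute machine (192.38.89.192)
    0 - Run Bytes Sent By Job
    575326720 - Run Bytes Received By Job
...
"""

if __name__ == "__main__":
    for event in parse_events(SAMPLE):
        print(event["code"], event["date"], event["time"], event["message"])
        for detail in event["details"]:
            print("    " + detail)
```

Filtering the result on code "007" would list every shadow exception with its timestamp, which makes it easier to line the events up against the ShadowLog entries.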
ShadowLog:
8/23 12:31:27 ** condor_shadow (CONDOR_SHADOW) STARTING UP
8/23 12:31:27 ** C:\Condor\bin\condor_shadow.exe
8/23 12:31:27 ** $CondorVersion: 6.6.10 Jun 22 2005 $
8/23 12:31:27 ** $CondorPlatform: INTEL-WINNT50 $
8/23 12:31:27 ** PID = 3652
8/23 12:31:27 ******************************************************
8/23 12:31:27 Using config file: C:\Condor\condor_config
8/23 12:31:27 Using local config files: C:\Condor/condor_config.local
8/23 12:31:27 DaemonCore: Command Socket at <192.38.89.192:2062>
8/23 12:31:28 Initializing a VANILLA shadow
8/23 12:31:28 (7.0) (3652): Request to run on <192.38.89.192:1052> was ACCEPTED
8/23 12:32:54 (7.0) (3652): condor_read(): recv() returned -1, errno = 10054, assuming failure.
8/23 12:32:55 (7.0) (3652): DaemonCore: Can't receive command request (perhaps a timeout?)
8/23 12:32:55 (7.0) (3652): condor_read(): recv() returned -1, errno = 10054, assuming failure.
8/23 12:32:55 (7.0) (3652): ERROR "Can no longer talk to condor_starter on execute machine (192.38.89.192)" at line 63 in file ..\src\condor_shadow.V6.1\NTreceivers.C
SchedLog example:
8/23 12:32:56 Started shadow for job 7.0 on "<192.38.89.192:1052>", (shadow pid = 3512)
8/23 12:32:57 Sent ad to central manager for sch@xxxxxxxxxxxxxxxxxxxxxxx
8/23 12:34:21 DaemonCore: Command received via UDP from host <192.38.89.192:2102>
8/23 12:34:21 DaemonCore: received command 60001 (DC_PROCESSEXIT), calling handler (HandleProcessExitCommand())
8/23 12:34:21 Shadow pid 3512 for job 7.0 exited with status 4