I've installed the latest version of Condor on my PC,
and it runs fine under Linux (Red Hat 9). The
problem appears when I submit a job: after
running for a few seconds a shadow exception occurs,
and then the job starts again until another shadow
exception stops it. I don't know what's happening; if I
run the program on the PC without Condor, it works
perfectly. Looking at the logs, I first saw this in my
program's log:
000 (003.000.000) 09/21 13:40:49 Job submitted from host: <193.147.240.233:36284>
...
001 (003.000.000) 09/21 13:40:52 Job executing on host: <193.147.240.233:36286>
...
006 (003.000.000) 09/21 13:41:00 Image size of job updated: 1332
...
007 (003.000.000) 09/21 13:41:42 Shadow exception!
    Can no longer talk to condor_starter <193.147.240.233:36286>
    0  -  Run Bytes Sent By Job
    14829  -  Run Bytes Received By Job
*****************************************************
So I looked in the StarterLog, and this is what I got:
9/21 13:40:52 ******************************************************
9/21 13:40:52 ** condor_starter (CONDOR_STARTER) STARTING UP
9/21 13:40:52 ** /home/condor/condor-6.7.1/sbin/condor_starter
9/21 13:40:52 ** $CondorVersion: 6.7.1 Aug 10 2004 $
9/21 13:40:52 ** $CondorPlatform: I386-LINUX_RH9 $
9/21 13:40:52 ** PID = 15935
9/21 13:40:52 ******************************************************
9/21 13:40:52 Using config file: /home/condor/condor-6.7.1/etc/condor_config
9/21 13:40:52 Using local config files: /home/condor/condor-6.7.1/local.golem/condor_config.local
9/21 13:40:52 DaemonCore: Command Socket at <193.147.240.233:36308>
9/21 13:40:52 Done setting resource limits
9/21 13:40:52 Communicating with shadow <193.147.240.233:36306>
9/21 13:40:52 Submitting machine is "golem.imim.es"
9/21 13:40:52 File transfer completed successfully.
9/21 13:40:52 Starting a VANILLA universe job with ID: 3.0
9/21 13:40:52 IWD: /home/condor/condor-6.7.1/local.golem/execute/dir_15935
9/21 13:40:52 Output file: /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.out
9/21 13:40:52 Error file: /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.err
9/21 13:40:52 About to exec /home/condor/condor-6.7.1/local.golem/execute/dir_15935/condor_exec.exe
9/21 13:40:52 Create_Process succeeded, pid=15937
9/21 13:41:42 Process exited, pid=15937, status=0
9/21 13:41:42 ReliSock: put_file: Failed to open file /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.log, errno = 2.
9/21 13:41:42 ERROR "DoUpload: Failed to send file /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.log, exiting at 1408 " at line 1407 in file file_transfer.C
9/21 13:41:42 ShutdownFast all jobs.
*****************************************************
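As a side note, the errno in the StarterLog can be decoded with a few lines of Python. This is only a minimal sketch: the two log lines are copied from the StarterLog above, and the helper name failed_opens and the regular expression are my own (hypothetical), not part of Condor.

```python
import errno
import os
import re

# Two lines copied from the StarterLog above.
STARTER_LOG = """\
9/21 13:41:42 Process exited, pid=15937, status=0
9/21 13:41:42 ReliSock: put_file: Failed to open file /home/condor/condor-6.7.1/local.golem/execute/dir_15935/2program.log, errno = 2.
"""

# Hypothetical pattern for "Failed to open file <path>, errno = <n>" entries.
PATTERN = re.compile(r"Failed to open file (?P<path>\S+), errno = (?P<code>\d+)")


def failed_opens(log_text):
    """Return (path, errno number, symbolic errno name) for each failed open."""
    results = []
    for match in PATTERN.finditer(log_text):
        code = int(match.group("code"))
        name = errno.errorcode.get(code, "UNKNOWN")
        results.append((match.group("path"), code, name))
    return results


if __name__ == "__main__":
    for path, code, name in failed_opens(STARTER_LOG):
        # os.strerror gives the system's human-readable message for the code.
        print(f"{os.path.basename(path)}: errno {code} ({name}: {os.strerror(code)})")
```

On Linux, errno 2 is ENOENT (no such file or directory), which suggests that 2program.log was not present in the execute directory when the starter tried to send it back.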
And the ShadowLog:
9/21 13:40:52 ******************************************************
9/21 13:40:52 ** condor_shadow (CONDOR_SHADOW) STARTING UP
9/21 13:40:52 ** /home/condor/condor-6.7.1/sbin/condor_shadow
9/21 13:40:52 ** $CondorVersion: 6.7.1 Aug 10 2004 $
9/21 13:40:52 ** $CondorPlatform: I386-LINUX_RH9 $
9/21 13:40:52 ** PID = 15934
9/21 13:40:52 ******************************************************
9/21 13:40:52 Using config file: /home/condor/condor-6.7.1/etc/condor_config
9/21 13:40:52 Using local config files: /home/condor/condor-6.7.1/local.golem/condor_config.local
9/21 13:40:52 DaemonCore: Command Socket at <193.147.240.233:36306>
9/21 13:40:52 Initializing a VANILLA shadow for job 3.0
9/21 13:40:52 (3.0) (15934): Request to run on <193.147.240.233:36286> was ACCEPTED
9/21 13:41:42 (3.0) (15934): ERROR "Can no longer talk to condor_starter <193.147.240.233:36286>" at line 93 in file NTreceivers.C
*********************
Does anyone know where the problem is?
BTW, I have only one machine, and every time a job is submitted it starts running immediately. If you need more info, let me know.
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
http://lists.cs.wisc.edu/mailman/listinfo/condor-users