 
[Condor-users] Transfer problem?
- Date: Wed, 21 Jul 2004 14:53:27 +0200
- From: "LOPEZ BARNES, GERAR" <glopez1@xxxxxxx>
- Subject: [Condor-users] Transfer problem?
 
We have a problem running Condor that we can't figure out how to solve.
When we submit a new job to the queue, the job runs for a short time and
then goes back to idle status.
Checking the log files, we found this:
StarterLog.vm2:
7/21 14:25:32 ******************************************************
7/21 14:25:32 ** condor_starter (CONDOR_STARTER) STARTING UP
7/21 14:25:32 ** $CondorVersion: 6.6.5 May  3 2004 $
7/21 14:25:32 ** $CondorPlatform: I386-LINUX-RH9 $
7/21 14:25:32 ** PID = 6863
7/21 14:25:32 ******************************************************
7/21 14:25:32 Using config file: /home/condor/condor-6.6.5/etc/condor_config
7/21 14:25:32 Using local config files: /home/condor/condor-6.6.5/local.thymus/condor_config.local
7/21 14:25:32 DaemonCore: Command Socket at <193.147.240.191:50621>
7/21 14:25:32 Done setting resource limits
7/21 14:25:32 Starter communicating with condor_shadow <193.147.240.196:4582>
7/21 14:25:32 Submitting machine is "adonis.imim.es"
7/21 14:25:32 File transfer completed successfully.
7/21 14:25:32 Starting a VANILLA universe job with ID: 3.0
7/21 14:25:32 IWD: /home/condor/condor-6.6.5/local.thymus/execute/dir_6863
7/21 14:25:32 Output file: /home/condor/condor-6.6.5/local.thymus/execute/dir_6863/2program.out
7/21 14:25:32 Error file: /home/condor/condor-6.6.5/local.thymus/execute/dir_6863/2program.err
7/21 14:25:32 About to exec /home/condor/condor-6.6.5/local.thymus/execute/dir_6863/condor_exec.exe
7/21 14:25:32 Create_Process succeeded, pid=6865
7/21 14:25:35 Process exited, pid=6865, status=0
7/21 14:25:35 ReliSock: put_file: Failed to open file /home/condor/condor-6.6.5/local.thymus/execute/dir_6863/2program.log, errno = 2.
7/21 14:25:35 ERROR "DoUpload: Failed to send file /home/condor/condor-6.6.5/local.thymus/execute/dir_6863/2program.log, exiting at 1379" at line 1378 in file file_transfer.C
7/21 14:25:35 ShutdownFast all jobs.
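For reference, the "errno = 2" in the put_file line is a standard POSIX error number. A minimal Python sketch to decode it, assuming it is run on a Linux/POSIX host like the I386-LINUX-RH9 execute machine (the variable names are illustrative only):

```python
import errno
import os

# errno value reported in the starter log line
# "ReliSock: put_file: Failed to open file ..., errno = 2."
reported = 2

# Map the number to its symbolic name and the C library's message text.
name = errno.errorcode[reported]
message = os.strerror(reported)
print(f"errno {reported} = {name}: {message}")
```

On Linux this reports ENOENT ("No such file or directory"), which suggests the starter did not find 2program.log in the job's execute directory (dir_6863) when it tried to send it back to the submit machine.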
And the job's log file shows:
2program.log:
000 (003.000.000) 07/21 14:24:38 Job submitted from host: <193.147.240.196:4543>
...
001 (003.000.000) 07/21 14:28:59 Job executing on host: <193.147.240.191:50474>
...
007 (003.000.000) 07/21 14:29:03 Shadow exception!
        Can no longer talk to condor_starter on execute machine (193.147.240.191)
        0  -  Run Bytes Sent By Job
        14829  -  Run Bytes Received By Job
...
001 (003.000.000) 07/21 14:29:04 Job executing on host: <193.147.240.191:50474>
...
007 (003.000.000) 07/21 14:29:07 Shadow exception!
        Can no longer talk to condor_starter on execute machine (193.147.240.191)
        0  -  Run Bytes Sent By Job
        14829  -  Run Bytes Received By Job
Any idea where the problem is?