
[Condor-users] Dagman and databases



Dear All,

There are two nodes in my setup (node0 and node1).

I have a simple DAG of the form:
Job condor_script1 /path/condor_script1.sub
Script POST condor_script1 /path/data2db.py $RETURN $JOB
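
For context, here is a minimal sketch of how a POST script such as data2db.py can read these two arguments. It assumes DAGMan substitutes the job's exit code for $RETURN and the node name for $JOB. The function names are hypothetical and the database write is left out, so this is not the actual script.

```python
import sys


def parse_post_args(argv):
    """Return (exit_code, node_name) from the arguments passed by DAGMan.

    argv[1] is assumed to be the value of $RETURN and argv[2] the value of $JOB.
    """
    if len(argv) != 3:
        raise ValueError("usage: data2db.py <return-code> <node-name>")
    return int(argv[1]), argv[2]


def main(argv):
    exit_code, node_name = parse_post_args(argv)
    # The real script would store the job's results in the database here (omitted).
    print(f"node {node_name} finished with exit code {exit_code}")
    # Propagate failure: a non-zero status from a POST script marks the node as failed.
    return 0 if exit_code == 0 else 1
```

As a script entry point this would be called as sys.exit(main(sys.argv)).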

The DAG runs perfectly on node1 as long as the executable in
condor_script1.sub does not try to connect to a postgres DB on node0.

When the executable does connect to the DB, submitting it directly with
'condor_submit condor_script1.sub' still runs the job correctly.

If I use the DAG, the job can't run on node1, although it will run on node0.

Since the job runs with a simple submit but not through the DAG, I think
the problem is Condor related.

Note: I'm using the Debian package.

Thanks for your help

Colin



The error messages I'm getting are:

StarterLog on Node0
 
******************************************************
10/28 15:20:04 ** condor_starter (CONDOR_STARTER) STARTING UP
10/28 15:20:04 ** /usr/sbin/condor_starter
10/28 15:20:04 ** $CondorVersion: 6.7.1 Aug 10 2004 $
10/28 15:20:04 ** $CondorPlatform: I386-LINUX_RH9 $
10/28 15:20:04 ** PID = 9988
10/28 15:20:04 ******************************************************
10/28 15:20:04 Using config file: /home/condor/condor_config
10/28 15:20:04 Using local config files: /home/condor/hosts/b01/condor_config.local
10/28 15:20:04 DaemonCore: Command Socket at <X:36338>
10/28 15:20:04 Done setting resource limits
10/28 15:20:04 Communicating with shadow <X:47005>
10/28 15:20:04 Submitting machine is "basis.basis.prv"
10/28 15:20:04 Starting a VANILLA universe job with ID: 53.0
10/28 15:20:04 IWD: /var/lib/postgres/data/base/17149
10/28 15:20:04 Output file: /path/condor_script1.output
10/28 15:20:04 Error file: /path/condor_script1.error
10/28 15:20:04 About to exec /path/simulate -r 1 -n 5 -t 5.0
10/28 15:20:04 Create_Process: Cannot access specified cwd "/var/lib/postgres/data/base/17149": errno = 2 (No such file or directory)
10/28 15:20:04 ERROR "Create_Process(/path/simulate,condor_exec.exe -r 1 -n 5 -t 5.0, ...) failed" at line 403 in file os_proc.C
10/28 15:20:04 ShutdownFast all jobs.
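
The log shows the starter being handed an IWD and then failing to change into it. A rough sketch of how that check could be repeated by hand on each node, assuming the "IWD: <path>" line format shown above; the helper names are hypothetical and not part of Condor:

```python
import os
import re

# Matches the "IWD: <path>" line written by condor_starter in the log above.
IWD_LINE = re.compile(r"\bIWD: (\S+)")


def find_iwd(log_text):
    """Return the last IWD path reported in a StarterLog, or None if absent."""
    matches = IWD_LINE.findall(log_text)
    return matches[-1] if matches else None


def iwd_exists(log_text):
    """True if the reported IWD is a directory on the machine running this check."""
    iwd = find_iwd(log_text)
    return iwd is not None and os.path.isdir(iwd)
```

Running iwd_exists() on the text of the StarterLog on each node would show whether the working directory is visible there.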