Mailing List Archives
Re: [Condor-users] Not fully able to start jobs - permissions?
- Date: Mon, 16 May 2005 11:03:14 +0200
- From: Jaime Frey <jfrey@xxxxxxxxxxx>
- Subject: Re: [Condor-users] Not fully able to start jobs - permissions?
On May 12, 2005, at 3:29 PM, Rob Pieké wrote:
I'm having some weird problems where jobs aren't starting fully.
The StarterLog file makes it look like it's trying to start but
then chokes. Specifically, it seems to be looking for log files
that it can write to. The directory it's looking for doesn't exist,
but the directory one level up is writable (i.e., that dir COULD
be created if Condor wanted to do it). If I manually
create the dir, Condor roars ahead and creates the logs and runs
the job.
Now, what's interesting to me (and maybe should be a clue to me as
to how to solve this problem) is that the same directory IS being
created on the master server automatically. Is it possible that
Condor assumes that this directory is network accessible and not
per-machine? (I'm kinda grasping at straws here).
Cheers!
5/11 11:35:41 ******************************************************
5/11 11:35:41 ** condor_starter (CONDOR_STARTER) STARTING UP
5/11 11:35:41 ** /mnt/pike/gorn/Applications/condor-6.6.9-linux_x86_64/sbin/condor_starter
5/11 11:35:41 ** $CondorVersion: 6.6.9 Mar 10 2005 $
5/11 11:35:41 ** $CondorPlatform: I386-LINUX_RH9 $
5/11 11:35:41 ** PID = 25629
5/11 11:35:41 ******************************************************
5/11 11:35:41 Using config file: /mnt/condor/accounts/condor/condor_config
5/11 11:35:41 Using local config files: /mnt/condor/accounts/condor/hosts/loaner1/condor_config.local
5/11 11:35:41 DaemonCore: Command Socket at <216.94.116.106:33946>
5/11 11:35:41 Done setting resource limits
5/11 11:35:41 Starter communicating with condor_shadow <216.94.116.89:49266>
5/11 11:35:41 Submitting machine is "tamari.coredp.com"
5/11 11:35:41 Starting a VANILLA universe job with ID: 33.0
5/11 11:35:41 IWD: /var/adm/condor/spool/cluster33.proc0.subproc0
5/11 11:35:41 Failed to open standard output file '/var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.out': No such file or directory (errno 2)
5/11 11:35:41 Output file: /var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.out
5/11 11:35:41 Failed to open standard error file '/var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.error': No such file or directory (errno 2)
5/11 11:35:41 Error file: /var/adm/condor/spool/cluster33.proc0.subproc0/condor.42811141-0.0.error
5/11 11:35:41 Failed to open some/all of the std files...
5/11 11:35:41 Aborting OsProc::StartJob.
5/11 11:35:41 Failed to start job, exiting
5/11 11:35:41 ShutdownFast all jobs.
5/11 11:35:41 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
The starter on your execute machine is trying to open files in the
spool directory of your submit machine. I'm guessing your pool is
configured to have a shared filesystem and you submitted the job with
the -r or -s argument to condor_submit.
By default on Unix, if you tell Condor that you have a shared
filesystem (by setting FILESYSTEM_DOMAIN), Condor assumes all of your
jobs' files are on that shared filesystem and the execute machine
tries to open them directly. If you run condor_submit with -r or -s,
all of the job's files are placed under the SPOOL directory on the
submit machine (that is, the machine running the schedd you're
submitting to). If that SPOOL directory isn't on the shared
filesystem, the execute machine will fail to open the job's files.
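To make the chain of conditions above concrete, here is a minimal Python sketch that models when the starter can open a job's files. It is a simplified, hypothetical model of the behavior described in this reply, not Condor's actual logic: the `JobSetup` fields are illustrative names, and `should_transfer_files` is reduced to a yes/no flag (YES versus anything else).

```python
from dataclasses import dataclass


@dataclass
class JobSetup:
    """Hypothetical, simplified model of the settings discussed in this thread."""
    same_filesystem_domain: bool  # submit and execute machines share a FILESYSTEM_DOMAIN
    spooled: bool                 # job was submitted with condor_submit -r or -s
    spool_on_shared_fs: bool      # submit machine's SPOOL is visible from the execute machine
    should_transfer_files: bool   # submit file sets should_transfer_files = YES


def execute_side_opens_directly(job: JobSetup) -> bool:
    """True if the starter opens the job's files in place instead of getting transferred copies."""
    return job.same_filesystem_domain and not job.should_transfer_files


def starter_can_open_files(job: JobSetup) -> bool:
    """Predict, under this model, whether the starter can open the job's files."""
    if not execute_side_opens_directly(job):
        # Files are transferred to the execute machine, so where they live
        # on the submit side does not matter.
        return True
    if job.spooled:
        # Spooled files live under SPOOL on the submit machine; opening them
        # in place only works if that directory is on the shared filesystem.
        return job.spool_on_shared_fs
    # Condor assumes non-spooled files are already on the shared filesystem.
    return True
```

The situation in this thread corresponds to `JobSetup(True, True, False, False)`, which the model predicts will fail; flipping `should_transfer_files` to `True` makes it succeed regardless of where SPOOL lives.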
The easiest way to fix this is to set should_transfer_files to YES in
your submit file. This tells Condor to always transfer a job's files
between the submit and execute machines, rather than assuming they're
accessible via a shared filesystem.
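As a rough sketch of the fix, the Python function below builds a vanilla-universe submit description with file transfer turned on. The executable and file names passed in are hypothetical placeholders. It also sets `when_to_transfer_output = ON_EXIT`, on the assumption that Condor 6.6 requires that command whenever `should_transfer_files` is YES; check the condor_submit manual page for your version.

```python
def make_submit_description(executable: str, output: str, error: str, log: str) -> str:
    """Return a submit description that always transfers the job's files."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"output = {output}",
        f"error = {error}",
        f"log = {log}",
        # Transfer files between submit and execute machines instead of
        # assuming they are reachable on a shared filesystem.
        "should_transfer_files = YES",
        # Assumption: required alongside should_transfer_files = YES in 6.6.
        "when_to_transfer_output = ON_EXIT",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

The returned text can be written to a file and handed to condor_submit as usual; the only lines that matter for this problem are the two file-transfer commands.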
+----------------------------------+---------------------------------+
| Jaime Frey | Public Split on Whether |
| jfrey@xxxxxxxxxxx | Bush Is a Divider |
| http://www.cs.wisc.edu/~jfrey/ | -- CNN Scrolling Banner |
+----------------------------------+---------------------------------+