
Re: [Condor-users] standard output: Permission denied (errno 13)



Hi Chance -

Long time since we last saw each other, hope things are going well for you.

Re the problem below:

The job can write to the log but cannot open stdout because the submit side (shadow) writes the log events, while stdout is opened on the execute side (starter).  So the starter is likely not running as the user you think it is.  Two possible causes leap to mind:

1. The condor_master on the execute node was not launched as root 

or

2.  In the config file(s), the UID_DOMAIN setting is incorrect.  If the UID_DOMAIN of the submit machine differs from that of the execute machine, Condor will launch the job as user "nobody", which certainly cannot open your stdout file.  Note that by default, Condor wants to verify that your UID_DOMAIN setting is indeed a right-anchored subset of the actual real DNS domain.  For example, if I have a submit node
    head.dept.foo.edu
a legit UID_DOMAIN would be dept.foo.edu.  If UID_DOMAIN is cluster.edu, Condor will not believe it (for a little extra security) and your job will still run as nobody.  You can disable this check via TRUST_UID_DOMAIN=true iirc. See section 3.3.7 of the manual for additional info/settings related to uid_domain stuff.
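
To make the "right-anchored subset" idea concrete, here is a rough shell sketch of that kind of test. It is only an approximation of the check described above, not Condor's actual code, and is_right_anchored is a made-up helper name rather than a Condor tool.

```shell
#!/bin/sh
# Hypothetical helper (not a Condor command): succeeds when uid_domain equals
# the host's fully qualified name or is a dot-separated suffix of it.
is_right_anchored() {
    fqdn=$1
    uid_domain=$2
    case "$fqdn" in
        "$uid_domain" | *."$uid_domain") return 0 ;;
        *) return 1 ;;
    esac
}

# The two cases from the example above.
for d in dept.foo.edu cluster.edu; do
    if is_right_anchored head.dept.foo.edu "$d"; then
        echo "$d: accepted"
    else
        echo "$d: rejected, job would run as nobody"
    fi
done
```

On a real pool the two inputs would presumably come from `hostname -f` and `condor_config_val UID_DOMAIN` on each machine.  For the first cause, running `ps -o user= -C condor_master` on the execute node should show which account owns the master process.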

Hope the above helps; please let us know how it turns out.

Regards
Todd
---
Todd Tannenbaum
University of Wisconsin-Madison
<-- Sent from a Palm Treo 680 phone -->

-----Original Message-----

From:  Chance Reschke <Reschke@xxxxxxxxxxxx>
Subj:  [Condor-users] standard output: Permission denied (errno 13)
Date:  Tue Apr 24, 2007 8:37 pm
Size:  2K
To:  condor-users@xxxxxxxxxxx

Hi,

I just set up a new Condor cluster and have been having trouble  
running even simple jobs.  Jobs are matched and begin to run, but  
immediately switch to state HOLD while complaining about failure to  
open the output file.  Strangely, the log file, which is located in  
the same directory as the output file is updated just fine (contents  
included below).  I have no trouble writing to files in the output  
directory as the user in question.  This is a diskless cluster and  
all filesystems are NFS mounted, but NFS lock support is enabled.

Any help fixing this would be great - details below.

Thanks,

Chance

The Config file:

executable   = test.sh
universe     = vanilla

Log          = test.log.$(Process)
Output       = test.out.$(Process)
Error        = test.err.$(Process)
Arguments    = firstrun
queue


The executable:

#!/bin/sh

MyOutput=$1
typeset -i N
N=0
while [ $N -lt 6 ]; do
     # send something to stdout and stderr
     time echo $MyOutput
     sleep 10
     N=$N+1
done



The log file:

..
000 (016.000.000) 04/24 18:24:11 Job submitted from host: <192.168.99.254:44966>
..
007 (016.000.000) 04/24 18:24:16 Shadow exception!
         Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
         0  -  Run Bytes Sent By Job
         0  -  Run Bytes Received By Job
..
012 (016.000.000) 04/24 18:24:16 Job was held.
         Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
         Code 7 Subcode 13
..



The Master's ShadowLog:

4/24 18:24:16 ******************************************************
4/24 18:24:16 ** condor_shadow (CONDOR_SHADOW) STARTING UP
4/24 18:24:16 ** /work1/condor/sbin/condor_shadow
4/24 18:24:16 ** $CondorVersion: 6.8.4 Feb  1 2007 $
4/24 18:24:16 ** $CondorPlatform: I386-LINUX_RHEL3 $
4/24 18:24:16 ** PID = 10520
4/24 18:24:16 ** Log last touched 4/24 18:11:55
4/24 18:24:16 ******************************************************
4/24 18:24:16 Using config source: /work1/condor/condor_config
4/24 18:24:16 Using local config sources:
4/24 18:24:16    /work1/condor/hosts/syd/condor_config.local
4/24 18:24:16 DaemonCore: Command Socket at <192.168.99.254:45033>
4/24 18:24:16 Initializing a VANILLA shadow for job 16.0
4/24 18:24:16 (16.0) (10520): Request to run on <192.168.99.1:32780> was ACCEPTED
4/24 18:24:16 (16.0) (10520): Job 16.0 going into Hold state (code 7,13): Error from starter on vm1@xxxxxxxxxxxx: Failed to open '/work1/possu/test.out.0' as standard output: Permission denied (errno 13)
4/24 18:24:16 (16.0) (10520): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 112

--
Chance Reschke
Department of Biochemistry
University of Washington


