[HTCondor-users] Personal Condor halts job
- Date: Sun, 10 Nov 2013 16:10:14 -0500
- From: Jan Balewski <janstar1122@xxxxxxxxx>
- Subject: [HTCondor-users] Personal Condor halts job
Hi,
I just started learning how to install condor.
My base system is SL6.4:
# uname -a
Linux reuse-stack05 2.6.32-358.23.2.el6.x86_64 #1 SMP Wed Oct 16 11:13:47 CDT 2013 x86_64 x86_64 x86_64 GNU/Linux
I used yum to install Condor:
$ condor_version
$CondorVersion: 8.0.4 Oct 19 2013 BuildID: 189770 $
$CondorPlatform: x86_64_RedHat6 $
I modified my local /etc/condor/condor_config.local so that the machine points to itself:
CONDOR_HOST = 19x.12x.16x.5x
...
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), 19x.12x.16x.5x
ALLOW_READ = *
In /etc/condor/condor_config I have enabled:
USE_CKPT_SERVER = True
I have disabled my firewall:
# service iptables stop
iptables: Flushing firewall rules: [ OK ]
iptables: Setting chains to policy ACCEPT: filter [ OK ]
iptables: Unloading modules: [ OK ]
Next, I switched to a non-root user, balewski, and created a test job with this content:
$ cat first.job
cmd = /bin/cat
args = /proc/self/status
output = first.job.$(cluster).$(process).out
error = first.job.$(cluster).$(process).err
log = first.job.log
queue 2
I then submitted it:
$ condor_submit first.job
Submitting job(s)..
2 job(s) submitted to cluster 8.
After a few seconds I see that both jobs have been put on hold (status H):
[balewski@reuse-stack05 condor3]$ condor_q
----------
-- Submitter: reuse-stack05 : <198.125.163.55:55169> : reuse-stack05
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
8.0 balewski 11/10 16:08 0+00:00:00 H 0 0.0 cat /proc/self/sta
8.1 balewski 11/10 16:08 0+00:00:00 H 0 0.0 cat /proc/self/sta
2 jobs; 0 completed, 0 removed, 0 idle, 0 running, 2 held, 0 suspended
The job log gives the following reason:
------
$ tail -f first.job.log
0 - Run Bytes Received By Job
...
012 (008.000.000) 11/10 16:08:01 Job was held.
Error from slot1@reuse-stack05: Failed to open '/home/balewski/condor3/first.job.8.0.out' as standard output: Permission denied (errno 13)
Code 7 Subcode 13
...
012 (008.001.000) 11/10 16:08:01 Job was held.
Error from slot2@reuse-stack05: Failed to open '/home/balewski/condor3/first.job.8.1.out' as standard output: Permission denied (errno 13)
Code 7 Subcode 13
...
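One check that may help narrow down the errno 13 above: "Permission denied" on open can come from any parent directory that lacks the search (execute) bit for the user doing the open, not only from the target file itself. The sketch below prints the mode, owner, and group of every component of a path, from / downward. The function name walk_perms is hypothetical, `stat -c` assumes GNU coreutils, and whether directory permissions are the actual cause here is only an assumption.

```shell
#!/bin/sh
# walk_perms (hypothetical helper): print mode, owner, group, and name for
# each component of a path, starting at the top and ending at the path itself.
walk_perms() {
    case $1 in
        /|.) ;;                                  # reached the top, stop recursing
        *) walk_perms "$(dirname "$1")" ;;       # report parent directories first
    esac
    if [ -e "$1" ]; then
        # GNU stat format: %A = mode string, %U = owner, %G = group, %n = name
        stat -c '%A %U %G %n' "$1"
    else
        # -e also fails when a parent directory cannot be searched
        echo "(missing or not accessible) $1"
    fi
}

# Directory taken from the hold message in the job log.
walk_perms /home/balewski/condor3
```

Running this as the submitting user shows which directory modes apply along the path; comparing them against the user that the slot actually runs the job as may show where the open fails. Once the cause is corrected, held jobs can be released with condor_release followed by the cluster number (here, 8).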
Can you please help me to identify & fix the cause of my problem?
Thanks
Jan