Re: [Condor-users] Condor 7.6 - Windows Parallel Universe problems
- Date: Thu, 03 May 2012 21:03:57 +0200
- From: Felix Wolfheimer <f.wolfheimer@xxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Condor 7.6 - Windows Parallel Universe problems
I recently investigated a problem that I reported on this list a while
ago without getting a reply (so I was probably the only one experiencing
it ;-). However, I want to report the solution here just in case someone
else stumbles across it.
The problematic setup:
I use a pool of Windows machines dedicated to Condor. The pool runs a
Central Manager, a Schedd, and a credd, and the users all submit from
external Windows client machines to the pool's schedd using the
"-remote" option for condor_submit. I've enabled PASSWORD authentication
on the pool, which might be part of the problem.
As long as the "vanilla" universe is used, everything works nicely. But
if one submits a job to the "parallel" universe, the job is started, but
after it has finished the shadow gives the error message
01/20/12 16:39:36 (80.0) (4436): SetEffectiveOwner(FelixWolfheimer)
failed with errno=13: Permission denied.
01/20/12 16:39:36 (80.0) (4436): Failed to perform final update to job queue!
The job is then rescheduled, runs into the same problem at the end, is
rescheduled again, etc.
Solution: I found out that I had to add the dummy(?) account
"condor_pool" to the QUEUE_SUPER_USERS setting in the condor config file
on the machine running the schedd of the pool.
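For anyone who wants to try the same change, a minimal sketch of the
config entry might look like the following. It assumes that
QUEUE_SUPER_USERS is already defined earlier in the configuration, so
that the self-reference keeps the existing entries; I have not checked
what the default value is on every version, so please verify it on your
own schedd machine first.

```
## Sketch only: local config file on the machine running the schedd.
## Assumption: QUEUE_SUPER_USERS already has a value at this point;
## $(QUEUE_SUPER_USERS) keeps the existing names and appends one more.
QUEUE_SUPER_USERS = $(QUEUE_SUPER_USERS), condor_pool
```

Running "condor_config_val QUEUE_SUPER_USERS" on that machine should
show the resulting list. The change probably needs a condor_reconfig, or
possibly a restart of the schedd, before it takes effect.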
Actually, this does not seem very obvious to me, and I wonder whether
this is the intended behaviour?!
Anyway, now the parallel jobs run fine and just do what they are supposed to do. :-)