Re: [Condor-users] jobs stuck in queue
- Date: Mon, 22 Aug 2011 18:16:28 +0000
- From: "Koller, Garrett" <kollerg14@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] jobs stuck in queue
Mr. Cannini,
Oh, I think I'm beginning to see the problem. Look at the StartLog and note the authentication errors:
> 08/19/11 17:21:30 PERMISSION DENIED to unauthenticated@unmapped from host
> 172.17.8.121 for command 442 (REQUEST_CLAIM), access level DAEMON: reason:
> DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used
The "unauthenticated@unmapped" part means that the request was never authenticated, which usually means that authentication is not configured correctly. First, which authentication methods are you trying to use? Run 'condor_config_val -v SEC_CLIENT_AUTHENTICATION_METHODS' and 'condor_config_val -v SEC_DEFAULT_AUTHENTICATION_METHODS' to find out.
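To make the reading of that log line concrete, here is a small Python sketch that pulls the identity, host, command, and access level out of a PERMISSION DENIED line. The regular expression is an assumption inferred only from the lines quoted in this thread, not from any documented Condor log format, and the names `Denial` and `parse_denial` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Pattern inferred from the PERMISSION DENIED lines quoted in this thread only;
# it is an assumption, not an official Condor log-format specification.
DENIED_RE = re.compile(
    r"PERMISSION DENIED to (?P<identity>\S+) from host (?P<host>\S+) "
    r"for command (?P<number>\d+) \((?P<command>[A-Z_]+)\), "
    r"access level (?P<level>[A-Z]+)"
)


class Denial(NamedTuple):
    identity: str
    host: str
    command_number: int
    command: str
    access_level: str

    @property
    def unauthenticated(self) -> bool:
        # "unauthenticated@unmapped" means the connection was never
        # authenticated, so there is no real user name to check.
        return self.identity == "unauthenticated@unmapped"


def parse_denial(line: str) -> Optional[Denial]:
    """Return the fields of a PERMISSION DENIED line, or None for other lines."""
    m = DENIED_RE.search(line)
    if m is None:
        return None
    return Denial(
        identity=m["identity"],
        host=m["host"],
        command_number=int(m["number"]),
        command=m["command"],
        access_level=m["level"],
    )


line = ("08/19/11 17:21:30 PERMISSION DENIED to unauthenticated@unmapped from host "
        "172.17.8.121 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: ...")
d = parse_denial(line)
print(d.identity, d.host, d.command, d.access_level, d.unauthenticated)
# prints: unauthenticated@unmapped 172.17.8.121 REQUEST_CLAIM DAEMON True
```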
Typical choices are FS, FS_REMOTE, and PASSWORD. To learn more about how they work, see http://servo.cs.wlu.edu/dokuwiki/doku.php/condor/administration/authentication . For an example Condor configuration that uses authentication, see http://condor.cs.wlu.edu/condor/config/condor_config_global (press Ctrl-F and search for "Authentication").
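The two condor_config_val commands show which methods each side is willing to use. A minimal sketch, assuming that authentication can only succeed when the client's list and the receiving daemon's list share at least one method: the Python below compares two such lists. The example values are hypothetical, the function names are made up for illustration, and only the value part of the condor_config_val output should be passed in (the -v flag also prints where the value was defined).

```python
import re
from typing import List


def parse_methods(value: str) -> List[str]:
    """Split a list value such as "FS, PASSWORD" on commas and/or whitespace."""
    return [m.upper() for m in re.split(r"[,\s]+", value) if m]


def common_methods(client_value: str, daemon_value: str) -> List[str]:
    """Methods that appear in both lists, kept in the daemon list's order.

    The ordering is only a presentational choice in this sketch; the point
    is that an empty result means the two sides share no method at all.
    """
    client = set(parse_methods(client_value))
    return [m for m in parse_methods(daemon_value) if m in client]


# Hypothetical values standing in for the two condor_config_val results.
print(common_methods("FS, PASSWORD", "PASSWORD, FS_REMOTE"))  # ['PASSWORD']
print(common_methods("FS", "PASSWORD"))                       # []
```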
Once authentication is configured correctly, each daemon will identify itself to Condor as "<username>@<hostname>". For example, if Condor runs as the user 'condor' (or as 'root' pretending to be 'condor') on the computer 'condor.cs.wlu.edu', you need to add "condor@xxxxxxxxxxxxxxxxx" to the ALLOW_DAEMON configuration variable so that the daemons can communicate.
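A deliberately simplified Python sketch of why "unauthenticated@unmapped" is refused while an authenticated "condor@..." identity listed in ALLOW_DAEMON would pass. It treats each entry as a glob pattern on the user@domain part and ignores the host part that real entries can carry (user@domain/host), so it is not Condor's actual matching logic. The domain internal.domain is borrowed from the log above as a hypothetical value, and the helper names are made up.

```python
import fnmatch
import re
from typing import List


def allow_entries(value: str) -> List[str]:
    """Split a hypothetical ALLOW_DAEMON value on commas and/or whitespace."""
    return [e for e in re.split(r"[,\s]+", value) if e]


def identity_allowed(identity: str, allow_value: str) -> bool:
    """Simplified check: does any entry's user@domain part match the identity?

    Only entries of the form user@domain (optionally followed by /host) are
    handled; the host part is dropped, unlike in real Condor.
    """
    for entry in allow_entries(allow_value):
        user_part = entry.split("/", 1)[0]
        if fnmatch.fnmatchcase(identity, user_part):
            return True
    return False


allow = "condor@internal.domain, root@internal.domain"  # hypothetical value
print(identity_allowed("unauthenticated@unmapped", allow))  # False
print(identity_allowed("condor@internal.domain", allow))    # True
```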
Does this make sense? If so, does this help?
Best Regards,
- Garrett Heath Koller
condor.cs.wlu.edu
________________________________________
From: condor-users-bounces@xxxxxxxxxxx [condor-users-bounces@xxxxxxxxxxx] on behalf of Fabricio Cannini [fcannini@xxxxxxxxx]
Sent: Monday, August 22, 2011 1:58 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] jobs stuck in queue
On Friday, 19 August 2011, at 19:09:54, Koller, Garrett wrote:
> Mr. Cannini,
>
> I'm not yet familiar with running MPI jobs on Condor, but I think I've come
> across a similar situation. First of all, run 'condor_q -better-analyze'
> to figure out if the job's requirements are causing it to not be scheduled
> in the first place. If it says "not yet considered by matchmaker" or
> something, it usually means that it is being run but encounters an error
> shortly thereafter and so is continuously put back on the queue. Check
> the MatchLog. If it keeps saying that the same job is "Matched", it means
> that the job was successfully scheduled but something goes wrong on the
> execute machine. Check which slot and which machine the job is assigned
> to. Go to the log files of that machine and look for the StarterLog for
> that slot. The bottom of that log should tell you what error your program
> encountered that caused it to exit. Let me/us know if this doesn't help
> you diagnose and solve the problem.
>
> Best Regards,
> - Garrett
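The triage order in the quoted advice can be summarized as a small Python decision function. This is only a sketch: the summary-line wording is taken from the condor_q output in this thread, the "not yet considered by matchmaker" phrase is matched loosely because the advice quotes it approximately, and `times_matched` is assumed to be counted by hand from the MatchLog. The function names are hypothetical.

```python
import re

# Loose pattern, since the exact condor_q wording is an assumption here.
MATCHMAKER_RE = re.compile(
    r"not yet (?:been )?considered by (?:the )?matchmaker", re.IGNORECASE
)


def count_for(analysis: str, phrase: str) -> int:
    """Return N from the first 'N <phrase>' in the analysis text, or 0."""
    m = re.search(r"(\d+)\s+" + re.escape(phrase), analysis)
    return int(m.group(1)) if m else 0


def next_step(analysis: str, times_matched: int) -> str:
    """Suggest where to look next, in the order given in the quoted advice."""
    total = count_for(analysis, "machines")
    rejected = count_for(analysis, "are rejected by your job's requirements")
    # Step 1: are the job's own requirements ruling out every machine?
    if total > 0 and rejected == total:
        return "fix the job's requirements: every machine rejects it"
    # Step 3: the same job matched repeatedly, so it starts but then fails.
    if times_matched > 1:
        return "read the bottom of the StarterLog for the matched slot"
    # Step 2: not considered yet, so look for repeated matches.
    if MATCHMAKER_RE.search(analysis):
        return "check the MatchLog for repeated 'Matched' entries"
    return "inconclusive: gather more logs"
```

The checks are ordered so that a requirements problem is reported first, since a job that no machine accepts will never reach the execute side.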
Hi.
'condor_q -better-analyze 35' says this:
-- Submitter: master.internal.domain : <172.17.8.121:42584> : master.internal.domain
===============================
---
035.000: Run analysis summary. Of 0 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
0 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 match but are currently offline
0 are available to run your job
WARNING: Be advised:
No resources matched request's constraints
WARNING: Be advised: Request 35.0 did not match any resource's constraints
===============================
The StartLog of both nodes has messages like this:
+++++++++++++++++++++++++++++++
08/19/11 17:21:30 slot3: match_info called
08/19/11 17:21:30 slot3: Received match <172.17.8.51:56215>#1313779372#3#...
08/19/11 17:21:30 slot3: State change: match notification protocol successful
08/19/11 17:21:30 slot3: Changing state: Unclaimed -> Matched
08/19/11 17:21:30 PERMISSION DENIED to unauthenticated@unmapped from host 172.17.8.121 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: DAEMON authorization policy contains no matching ALLOW entry for this request; identifiers used for this host: 172.17.8.121,master,master.internal.domain,internal.domain
08/19/11 17:21:51 slot4: match_info called
08/19/11 17:21:51 slot4: Received match <172.17.8.51:56215>#1313779372#4#...
08/19/11 17:21:51 slot4: State change: match notification protocol successful
08/19/11 17:21:51 slot4: Changing state: Unclaimed -> Matched
08/19/11 17:21:51 PERMISSION DENIED to unauthenticated@unmapped from host 172.17.8.121 for command 442 (REQUEST_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
+++++++++++++++++++++++++++++++
TIA
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/