Mailing List Archives
Re: [Condor-users] Setting Condor Job Owner in Windows
- Date: Wed, 17 Oct 2007 16:18:00 -1000
- From: "diane" <diane@xxxxxxxxxxxxxxxxxx>
- Subject: Re: [Condor-users] Setting Condor Job Owner in Windows
Todd,
I was using your suggestion to run:
condor_submit -n winxp-dev-01 condor_submit_file
to start jobs from my Java app, where they run as SYSTEM when examined via condor_q.
However, the job itself starts another set of jobs that run as 'condor-reuse-vm2'
when examined with the Windows Task Manager. I'm not sure why that is,
but the bottom line is that I need those jobs to run as user 'diane'.
Therefore, I think I need to get the +Owner = "diane" feature in my condor_submit
file working. I'm hoping that will make the children of the job in the queue
run as 'diane' and not 'condor-reuse-vm2'. Does anyone have info on that?
When I include the +Owner = statement there, it does start the job in the queue as 'diane',
but then the job just hangs and accumulates run time in the queue.
The only indication of a problem is in the Condor logs included below.
Note that I have QUEUE_ALL_USERS_TRUSTED = True in my condor_config file.
Can you or anyone give me any more advice on how to get this to work?
Here are snippets of the condor logs:
SHADOWLOG:
10/17 15:48:39 ** condor_shadow (CONDOR_SHADOW) STARTING UP
10/17 15:48:39 ** C:\condor\bin\condor_shadow.exe
10/17 15:48:39 ** $CondorVersion: 6.8.5 May 17 2007 $
10/17 15:48:39 ** $CondorPlatform: INTEL-WINNT50 $
10/17 15:48:39 ** PID = 4852
10/17 15:48:39 ** Log last touched 10/17 15:41:06
10/17 15:48:39 ******************************************************
10/17 15:48:39 Using config source: C:\condor\condor_config
10/17 15:48:39 Using local config sources:
10/17 15:48:39 C:\condor/condor_config.local
10/17 15:48:39 DaemonCore: Command Socket at <192.168.2.105:4273>
10/17 15:48:44 Initializing a VANILLA shadow for job 556.0
10/17 15:48:54 (556.0) (4852): attempt to connect to <192.168.2.105:9620> failed: timed out after 10 seconds.
10/17 15:48:54 (556.0) (4852): ERROR: Could not locate valid credential for user 'diane@NT AUTHORITY'
10/17 15:48:54 (556.0) (4852): init_user_ids() failed!
SCHEDLOG:
10/17 15:48:34 (pid:2208) DaemonCore: Command received via UDP from host <192.168.2.105:4257>
10/17 15:48:34 (pid:2208) DaemonCore: received command 421 (RESCHEDULE), calling handler (reschedule_negotiator)
10/17 15:48:34 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:34 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:48:34 (pid:2208) Called reschedule_negotiator()
10/17 15:48:35 (pid:2208) Activity on stashed negotiator socket
10/17 15:48:35 (pid:2208) Negotiating for owner: diane@winxp-dev-01
10/17 15:48:35 (pid:2208) Checking consistency running and runnable jobs
10/17 15:48:35 (pid:2208) Tables are consistent
10/17 15:48:35 (pid:2208) Out of jobs - 1 jobs matched, 0 jobs idle, flock level = 0
10/17 15:48:35 (pid:2208) Activity on stashed negotiator socket
10/17 15:48:35 (pid:2208) Negotiating for owner: diane@winxp-dev-01
10/17 15:48:35 (pid:2208) Checking consistency running and runnable jobs
10/17 15:48:35 (pid:2208) Tables are consistent
10/17 15:48:35 (pid:2208) Out of servers - 0 jobs matched, 1 jobs idle, 0 jobs rejected
10/17 15:48:39 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:39 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:48:39 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:48:39 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:48:39 (pid:2208) Starting add_shadow_birthdate(556.0)
10/17 15:48:44 (pid:2208) Started shadow for job 556.0 on "<192.168.2.105:1032>", (shadow pid = 4852)
10/17 15:48:44 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:48:44 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
10/17 15:49:04 (pid:2208) DaemonCore: Command received via UDP from host <192.168.2.105:4288>
10/17 15:49:04 (pid:2208) DaemonCore: received command 60011 (DC_NOP), calling handler (handle_nop())
10/17 15:49:04 (pid:2208) Shadow pid 4852 for job 556.0 exited with status 4
10/17 15:49:04 (pid:2208) ERROR: Shadow exited with job exception code!
10/17 15:49:04 (pid:2208) ERROR: Shadow exited with job exception code!
10/17 15:49:06 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:49:06 (pid:2208) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/17 15:49:06 (pid:2208) Starting add_shadow_birthdate(556.0)
10/17 15:49:10 (pid:2208) Started shadow for job 556.0 on "<192.168.2.105:1032>", (shadow pid = 3492)
10/17 15:49:10 (pid:2208) Sent ad to central manager for diane@winxp-dev-01
10/17 15:49:10 (pid:2208) Sent ad to 1 collectors for diane@winxp-dev-01
COLLECTORLOG:
10/17 15:48:34 (Sending 1 ads in response to query)
10/17 15:48:34 (Sending 7 ads in response to query)
10/17 15:48:34 Got QUERY_STARTD_PVT_ADS
10/17 15:48:34 (Sending 2 ads in response to query)
10/17 15:48:49 NegotiatorAd : Inserting ** "< winxp-dev-01 >"
STARTLOG:
10/17 15:48:35 DaemonCore: Command received via UDP from host <192.168.2.105:4267>
10/17 15:48:35 DaemonCore: received command 440 (MATCH_INFO), calling handler (command_match_info)
10/17 15:48:35 vm2: match_info called
10/17 15:48:35 vm2: Received match <192.168.2.105:1032>#1192108044#159#...
10/17 15:48:35 vm2: State change: match notification protocol successful
10/17 15:48:35 vm2: Changing state: Unclaimed -> Matched
10/17 15:48:35 DaemonCore: Command received via TCP from host <192.168.2.105:4268>
10/17 15:48:35 DaemonCore: received command 442 (REQUEST_CLAIM), calling handler (command_request_claim)
10/17 15:48:35 vm2: Request accepted.
10/17 15:48:35 vm2: Remote owner is diane@winxp-dev-01
10/17 15:48:35 vm2: State change: claiming protocol successful
10/17 15:48:35 vm2: Changing state: Matched -> Claimed
10/17 15:50:51 DaemonCore: Command received via UDP from host <192.168.2.105:4325>
10/17 15:50:51 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
10/17 15:50:51 vm2: State change: received RELEASE_CLAIM command
10/17 15:50:51 vm2: Changing state and activity: Claimed/Idle -> Preempting/Vacating
10/17 15:50:51 vm2: State change: No preempting claim, returning to owner
10/17 15:50:51 vm2: Changing state and activity: Preempting/Vacating -> Owner/Idle
10/17 15:50:51 vm2: State change: IS_OWNER is false
10/17 15:50:51 vm2: Changing state: Owner -> Unclaimed
10/17 15:50:51 DaemonCore: Command received via UDP from host <192.168.2.105:4326>
10/17 15:50:51 DaemonCore: received command 443 (RELEASE_CLAIM), calling handler (command_release_claim)
10/17 15:50:51 Warning: can't find resource with ClaimId (<192.168.2.105:1032>#1192108044#159#...)
NEGOTIATORLOG:
10/17 15:48:34 ---------- Started Negotiation Cycle ----------
10/17 15:48:34 Phase 1: Obtaining ads from collector ...
10/17 15:48:34 Getting all public ads ...
10/17 15:48:34 Sorting 7 ads ...
10/17 15:48:34 Getting startd private ads ...
10/17 15:48:34 Got ads: 7 public and 2 private
10/17 15:48:34 Public ads include 2 submitter, 2 startd
10/17 15:48:35 Phase 2: Performing accounting ...
10/17 15:48:35 Phase 3: Sorting submitter ads by priority ...
10/17 15:48:35 Phase 4.1: Negotiating with schedds ...
10/17 15:48:35 Negotiating with diane@winxp-dev-01 at <192.168.2.105:1031>
10/17 15:48:35 0 seconds so far
10/17 15:48:35 Request 00556.00000:
10/17 15:48:35 Matched 556.0 diane@winxp-dev-01 <192.168.2.105:1031> preempting none <192.168.2.105:1032> vm2@winxp-dev-01
10/17 15:48:35 Successfully matched with vm2@winxp-dev-01
10/17 15:48:35 Got NO_MORE_JOBS; done negotiating
10/17 15:48:35 Phase 4.2: Negotiating with schedds ...
10/17 15:48:35 Negotiating with diane@winxp-dev-01 at <192.168.2.105:1031>
10/17 15:48:35 0 seconds so far
10/17 15:48:35 Got NO_MORE_JOBS; done negotiating
10/17 15:48:35 ---------- Finished Negotiation Cycle ----------
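The few error lines that matter in logs like these can be picked out mechanically. What follows is only a rough Python sketch: the helper names unwrap and scan_log and the hint texts are illustrative assumptions, not part of Condor. For reference, err=1332 is the Windows error code ERROR_NONE_MAPPED ("No mapping between account names and security IDs was done"), which suggests the name 'diane' could not be resolved to an account on that machine.

```python
import re

# Failure signatures copied from the log excerpts in this message.
# The hints are interpretations, not official Condor diagnostics.
SIGNATURES = [
    (re.compile(r"Could not locate valid credential for user '([^']+)'"),
     "shadow found no stored credential for this user"),
    (re.compile(r"init_user_ids\(\) failed"),
     "shadow could not switch to the job owner's identity"),
    (re.compile(r"Lookup Account Name (\S+) failed \(err=(\d+)\)"),
     "schedd could not resolve the account name"),
]

TIMESTAMP = re.compile(r"\d{1,2}/\d{1,2} \d{2}:\d{2}:\d{2}")


def unwrap(text):
    """Rejoin log lines that a mail client wrapped.

    A line that does not start with an MM/DD HH:MM:SS timestamp is
    treated as a continuation of the previous line.
    """
    joined = []
    for line in text.splitlines():
        if TIMESTAMP.match(line) or not joined:
            joined.append(line.strip())
        else:
            joined[-1] += " " + line.strip()
    return joined


def scan_log(text):
    """Return (hint, matched_text) pairs for every known signature found."""
    findings = []
    for line in unwrap(text):
        for pattern, hint in SIGNATURES:
            match = pattern.search(line)
            if match:
                findings.append((hint, match.group(0)))
    return findings
```

Calling scan_log on the pasted SHADOWLOG and SCHEDLOG text would list the credential and account-lookup failures without the surrounding noise.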
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Diane
Sent: Thursday, October 04, 2007 12:30 PM
To: 'Condor-Users Mail List'
Subject: Re: [Condor-users] Setting Condor Job Owner in Windows
Thanks Todd,
Your last suggestion finally worked!
I used the -n winxp-dev-01 option on the condor_submit command and the job ran
(as SYSTEM).
Note that I did not use the setting +Owner = "diane"
because when I did use 'diane' it seemed to get hung checking things (with
run time increasing) and SchedLog contained:
10/4 12:10:22 (pid:2152) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
10/4 12:10:22 (pid:2152) perm::init: Lookup Account Name diane failed (err=1332), using Everyone
I'll look at that some more, but for now at least Condor is working, running
as SYSTEM.
Also, I tried to set the creds for SYSTEM with a dummy pw as you suggest
below (from a SYSTEM cmd window), but the condor_store_cred command failed
because it couldn't find the host, even though HOSTALLOW_WRITE=*. I believe
it's looking for SYSTEM@NT AUTHORITY, not WINXP-DEV-01. Here is the exchange:
C:\WINDOWS\system32>\condor\bin\condor_store_cred add
Account: SYSTEM@NT AUTHORITY
Enter password:
Operation failed.
Make sure your HOSTALLOW_WRITE setting includes this host.
C:\WINDOWS\system32>
I'll look at the above some more too, but setting that may not be necessary.
Thanks so much again,
Diane
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
Sent: Thursday, October 04, 2007 10:57 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Setting Condor Job Owner in Windows
Diane wrote:
> Hi Todd,
>
> Your idea sounded great! However, I tried it without success (the job still
> starts as SYSTEM even though the condor.submit file says +Owner = "diane",
> and I had reconfigured condor to disable Queue access checks). The job
> never gets into the queue and returns with condor.error:
>
> ERROR: No credential stored for SYSTEM@NT AUTHORITY
>
> Correct this by running:
> condor_store_cred add
>
> In hopes of figuring this out, I have included here the relevant parts of
> the condor logs (in particular SchedLog showing queue access checks
> disabled), and my condor.submit file.
>
> If you have any insights that would be great.
OK, the formula in my previous post tells you how to set up Condor so
User A (in your case, SYSTEM) can submit a job as User B (diane).
What I failed to say is how to disable the (normally helpful) check that
condor_submit makes to be certain a password is stored for the user
running condor_submit. After all, you don't care that SYSTEM does not
have a password stored since the job will run as diane ... but
condor_submit isn't smart enough to know that.
However, if you use the "-n <schedd-name>" argument to condor_submit, it
will not do this "see if a password is stored" check. So to get it to
work, try
condor_submit -n winxp-dev-01 Condor.submit
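For anyone launching this from a program rather than a shell (Diane mentions starting jobs from a Java app), the invocation could be wrapped roughly as below. This is a minimal Python sketch under stated assumptions: the helper names build_submit_command and submit are hypothetical, condor_submit is assumed to be on the PATH, and only the "-n <schedd-name>" argument comes from this thread.

```python
import subprocess


def build_submit_command(schedd_name, submit_file, condor_submit="condor_submit"):
    """Return the argv list for: condor_submit -n <schedd-name> <submit-file>."""
    if not schedd_name or not submit_file:
        raise ValueError("schedd_name and submit_file are required")
    return [condor_submit, "-n", schedd_name, submit_file]


def submit(schedd_name, submit_file):
    """Run condor_submit and return (exit_code, combined_output).

    Passing an argv list (not a shell string) avoids quoting problems
    with paths that contain spaces.
    """
    result = subprocess.run(
        build_submit_command(schedd_name, submit_file),
        capture_output=True,
        text=True,
    )
    return result.returncode, result.stdout + result.stderr
```

A caller would use submit("winxp-dev-01", "Condor.submit") and check the exit code and output for the "No credential stored" error.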
Another idea that may be even easier: As user "SYSTEM", run
condor_store_cred add
and just give it a bogus password. Condor won't ever use it, but
condor_submit will be happy when it looks to see that one is stored.
If you don't know how to open up a command window as user SYSTEM, see
http://blogs.msdn.com/adioltean/articles/271063.aspx
which gives one way to do it (personally, I made a service that does it).
Good Luck! Let me know how it goes...
If it helps, below is a screenshot of a successful test I did:
C:\temp\test>whoami
SYSTEM
C:\temp\test>hostname
tannenbaum-t23
C:\temp\test>condor_submit -n tannenbaum-t23 test.sub
Submitting job(s).
1 job(s) submitted to cluster 37.
C:\temp\test>condor_q
-- Submitter: tannenbaum-t23 : <127.0.0.1:1357> : tannenbaum-t23
 ID      OWNER    SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
 37.0    diane    10/4 15:41    0+00:00:00  H   0    9.8   test.sub
1 jobs; 0 idle, 0 running, 1 held
C:\temp\test>type test.sub
executable = test.sub
hold = true
+Owner = "diane"
universe = vanilla
queue
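The submit description in the transcript above could also be generated by the submitting program. A minimal sketch, assuming Python: render_submit_description is a hypothetical helper, and it emits only the keys that appear in test.sub (executable, hold, +Owner, universe, queue).

```python
def render_submit_description(executable, owner, hold=False, universe="vanilla"):
    """Build a submit description with the same keys as the test.sub above."""
    if '"' in owner or "\n" in owner:
        raise ValueError("owner must not contain quotes or newlines")
    lines = [f"executable = {executable}"]
    if hold:
        # Submit the job on hold, as in the test above, so it never starts.
        lines.append("hold = true")
    # A leading '+' inserts the attribute directly into the job ad;
    # string values must be double-quoted.
    lines.append(f'+Owner = "{owner}"')
    lines.append(f"universe = {universe}")
    lines.append("queue")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file and passing that file to condor_submit reproduces the test setup; whether the job then runs as that owner still depends on the queue-trust and credential settings discussed in this thread.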
--
Todd Tannenbaum University of Wisconsin-Madison
Condor Project Research Department of Computer Sciences
tannenba@xxxxxxxxxxx 1210 W. Dayton St. Rm #4257
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/