Ok, so this is not an ALLOW_WRITE issue in the CREDD or we would not have gotten this far.
10/10/18 08:59:25 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from
calibration@lgs-net <194.11.95.204:59824>
But the next message is this
10/10/18 09:00:30 store_cred: Failed to send/recv user.
10/10/18 09:00:30 store_cred: code_store_cred failed.
Which indicates that the CREDD was unable to read the username off of the wire. That would be something outside of HTCondor - some sort of firewall or antivirus or something
interfering with the communication on the wire.
Otherwise, the only explanation I can think of would be a version mismatch between the CREDD and the execute node. Are the CREDD and the execute node both running a version of HTCondor
before 8.5.8, or both a version after it? In 8.5.8 we changed the STORE_CRED command a bit, and that might be causing an issue here.
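One rough way to check this, sketched in Python (the helper names are made up for illustration and are not part of HTCondor): pull the version number out of the $CondorVersion$ banner that each daemon writes at the top of its log, and see whether both fall on the same side of 8.5.8. The sketch assumes 8.5.8 itself counts as "after", since that is the release that carried the change.

```python
import re

# Release in which, per the note above, the STORE_CRED command changed.
STORE_CRED_CHANGE = (8, 5, 8)


def parse_condor_version(banner):
    """Return (major, minor, patch) from a $CondorVersion$ banner or a bare "8.6.10"."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"no version number found in {banner!r}")
    return tuple(int(part) for part in match.groups())


def same_side_of_change(credd_banner, client_banner):
    """True if both versions are older than 8.5.8, or both are 8.5.8 or newer."""
    credd_is_new = parse_condor_version(credd_banner) >= STORE_CRED_CHANGE
    client_is_new = parse_condor_version(client_banner) >= STORE_CRED_CHANGE
    return credd_is_new == client_is_new


# Illustrative inputs only: a log banner and a bare version number.
print(same_side_of_change("$CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $", "8.4.10"))  # False
```

A result of False means one side is older than 8.5.8 and the other is not, which is the mismatch case described above.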
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
Sent: Wednesday, October 10, 2018 4:36 AM
To: htcondor-users@xxxxxxxxxxx
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
This is what I see in the CREDD log
(first I entered the password on the submitter aherdskbld03 with IP 194.11.95.204, then on the execute node aherdskbld04 with IP 194.11.95.205):
10/10/18 08:59:25 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from
calibration@lgs-net <194.11.95.204:59824>
10/10/18 08:59:25 Return from HandleReq <store_cred_handler> (handler: 0.056644s, sec: 0.000s, payload: 0.000s)
10/10/18 09:00:30 Calling Handler <DaemonCommandProtocol::WaitForSocketData> (2)
10/10/18 09:00:30 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from
calibration@lgs-net <194.11.95.205:62489>
10/10/18 09:00:30 store_cred: Failed to send/recv user.
10/10/18 09:00:30 store_cred: code_store_cred failed.
10/10/18 09:00:30 Return from HandleReq <store_cred_handler> (handler: 0.000408s, sec: 0.031s, payload: 0.000s)
10/10/18 09:00:30 Return from Handler <DaemonCommandProtocol::WaitForSocketData> 0.026974s
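For longer logs, a small script can pick out which client's STORE_CRED request failed. This is only a sketch in Python, not an HTCondor tool: the function name is hypothetical, and the patterns are taken from the line formats in the excerpt above (including the client address wrapping onto the next line), so they may need adjusting for other debug levels.

```python
import re

# A STORE_CRED request header; the client address may wrap onto the following line.
REQUEST = re.compile(
    r"Calling HandleReq <store_cred_handler> \(\d+\) for command 479 \(STORE_CRED\) from\s+"
    r"(?P<user>\S+)\s+<(?P<ip>[\d.]+):\d+>"
)
RETURN = re.compile(r"Return from HandleReq <store_cred_handler>")
FAILURE = re.compile(r"store_cred: .*(?:Failed|failed)")


def failed_store_cred_clients(log_text):
    """Return (user, ip) for every STORE_CRED request that logged a failure before returning."""
    failures = []
    for request in REQUEST.finditer(log_text):
        # Only inspect the text between this request and its "Return from HandleReq" line.
        rest = log_text[request.end():]
        end = RETURN.search(rest)
        body = rest[:end.start()] if end else rest
        if FAILURE.search(body):
            failures.append((request.group("user"), request.group("ip")))
    return failures


SAMPLE = """\
10/10/18 08:59:25 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from
calibration@lgs-net <194.11.95.204:59824>
10/10/18 08:59:25 Return from HandleReq <store_cred_handler> (handler: 0.056644s, sec: 0.000s, payload: 0.000s)
10/10/18 09:00:30 Calling HandleReq <store_cred_handler> (0) for command 479 (STORE_CRED) from
calibration@lgs-net <194.11.95.205:62489>
10/10/18 09:00:30 store_cred: Failed to send/recv user.
10/10/18 09:00:30 store_cred: code_store_cred failed.
10/10/18 09:00:30 Return from HandleReq <store_cred_handler> (handler: 0.000408s, sec: 0.031s, payload: 0.000s)
"""

print(failed_store_cred_clients(SAMPLE))  # [('calibration@lgs-net', '194.11.95.205')]
```

On the excerpt above it reports only the request from 194.11.95.205 (the execute node); the one from the submitter returned without an error.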
-----------------------
-----------------------
Sent: Tuesday, 09 October 2018 at 18:03
From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
In order for condor_store_cred to store a password, it must send a command to a daemon. For the pool password, it
uses the condor_master daemon. But for a user password, it must be able to contact either a condor_schedd or condor_credd daemon from that machine.
So on an execute node that does not have a SCHEDD running, it would be normal to be able to use condor_store_cred to
store the pool password, but not a user password unless the execute node is configured to use a CREDD.
So the problem must be that the CREDD is not responding to this host. And this message
10/08/18 17:21:03 store_cred: failed to recv answer.
Operation failed.
Make sure your ALLOW_WRITE setting includes this host.
seems to back that up.
What does the CreddLog show at the time when you tried to run condor_store_cred on the execute node?
-tj
Yes, CREDD is running on the Pool machine (ahersrvbld28).
Not on this node, though.
As I wrote before I was able to specify a Pool Password on Node, Pool and submitter.
This is the Condor_config from the node (aherdskbld04)
CONDOR_HOST = 194.11.95.125
UID_DOMAIN = lgs-net.com
CONDOR_ADMIN = calibration@LGS-NET
SMTP_SERVER =
ALLOW_READ = *
ALLOW_WRITE = *
ALLOW_ADMINISTRATOR = *
JAVA = C:\PROGRA~1\Java\JRE18~2.0_1\bin\java.exe
use POLICY : ALWAYS_RUN_JOBS
WANT_VACATE = FALSE
WANT_SUSPEND = TRUE
DAEMON_LIST = MASTER STARTD
NUM_SLOTS = $(detected_Memory)/16000
FILESYSTEM_DOMAIN = lgs-net.com
TRUST_UID_DOMAIN = true
SOFT_UID_DOMAIN = true
STARTER_ALLOW_RUNAS_OWNER = true
CREDD_HOST = AHERSRVBLD28.lgs-net.com
CREDD_CACHE_LOCALLY = True
ALLOW_CONFIG = *
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
This is the condor config for the pool master (ahersrvbld28)
CONDOR_HOST = 194.11.95.125
COLLECTOR_NAME = HxMap_IT
UID_DOMAIN = lgs-net.com
CONDOR_ADMIN = Calibration@xxxxxxxxxxx
SMTP_SERVER =
ALLOW_READ = *
ALLOW_WRITE = *
ALLOW_ADMINISTRATOR = *
START = FALSE
WANT_VACATE = FALSE
WANT_SUSPEND = TRUE
DAEMON_LIST = MASTER SCHEDD COLLECTOR NEGOTIATOR CREDD
NUM_SLOTS_Type1 = 1
FILESYSTEM_DOMAIN = lgs-net.com
TRUST_UID_DOMAIN = true
SOFT_UID_DOMAIN = true
STARTER_ALLOW_RUNAS_OWNER = true
CREDD_HOST = ahersrvbld28.lgs-net.com
CREDD_CACHE_LOCALLY = True
SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
ALLOW_CONFIG = *
SEC_CONFIG_NEGOTIATION = REQUIRED
SEC_CONFIG_AUTHENTICATION = REQUIRED
SEC_CONFIG_ENCRYPTION = REQUIRED
SEC_CONFIG_INTEGRITY = REQUIRED
CREDD_LOG = $(LOG)/CreddLog
CREDD_DEBUG = D_COMMAND
MAX_CREDD_LOG = 50000000
This is what I get with your suggestion:
C:\Users\calibration>condor_store_cred -debug add
Account: calibration@LGS-NET
Enter password:
10/08/18 17:21:03 STORE_CRED: In mode 'add'
10/08/18 17:21:03 ZKM: First potential block in store_cred, DC==0
10/08/18 17:21:03 store_cred: failed to recv answer.
Operation failed.
Make sure your ALLOW_WRITE setting includes this host.
-----------------------
-----------------------
Sent: Monday, 08 October 2018 at 16:35
From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
yes.
10/08/18 11:47:27 (pid:3112) ERROR: Could not locate valid credential for user
'calibration@LGS-NET'
is definitely a problem. If you are using a CREDD, then we need to look at the credd configuration
for this node, and possibly the ALLOW_* permissions in the credd's configuration.
If you are not using a credd, then you need to run this command on the execute node
condor_store_cred -debug -add -u calibration@LGS-NET
The -debug option is there so that, if it fails, we get additional error messages.
Alternatively, you could log in to the execute node as calibration@LGS-NET
and then just run
condor_store_cred -debug -add
-tj
I just ran another job, so the timing does not correspond to your request. But the entries are always the same, so you
would have seen the same at 05:33.
10/08/18 11:47:27 slot1: Request accepted.
10/08/18 11:47:27 WARNING: forward resolution of ahercaxhdx32.lgs-net.com doesn't match 194.11.95.204!
10/08/18 11:47:27 slot1: Remote owner is calibration@xxxxxxxxxxx
10/08/18 11:47:27 slot1: State change: claiming protocol successful
10/08/18 11:47:27 slot1: Changing state: Unclaimed -> Claimed
10/08/18 11:47:27 slot1: Got activate_claim request from shadow (194.11.95.204)
10/08/18 11:47:27 slot1: Remote job ID is 7213.0
10/08/18 11:47:27 slot1: Got universe "VANILLA" (5) from request classad
10/08/18 11:47:27 slot1: State change: claim-activation protocol successful
10/08/18 11:47:27 slot1: Changing activity: Idle -> Busy
10/08/18 11:47:27 condor_read() failed: recv(fd=1808) returned -1, errno = 10054 , reading 5 bytes from <127.0.0.1:62227>.
10/08/18 11:47:27 IO: Failed to read packet header
10/08/18 11:47:27 Starter pid 3112 exited with status 1
10/08/18 11:47:27 slot1: State change: starter exited
10/08/18 11:47:27 slot1: Changing activity: Busy -> Idle
10/08/18 11:47:27 Aborting CA_LOCATE_STARTER
10/08/18 11:47:27 ClaimId (<194.11.95.205:9618>#1538471621#8118#[Encryption="NO";Integrity="NO";CryptoMethods="3DES";]f68f93d04b3e507e18ee0978b7b4ad0c2a1b58e7) and GlobalJobId ( AHERDSKBLD03.lgs-net.com#7213.0#1538984881 ) not found
10/08/18 11:47:27 slot1: State change: received RELEASE_CLAIM command
10/08/18 11:47:27 slot1: Changing state and activity: Claimed/Idle -> Preempting/Vacating
10/08/18 11:47:27 slot1: State change: No preempting claim, returning to owner
10/08/18 11:47:27 slot1: Changing state and activity: Preempting/Vacating -> Owner/Idle
10/08/18 11:47:27 slot1: State change: IS_OWNER is false
10/08/18 11:47:27 slot1: Changing state: Owner -> Unclaimed
10/08/18 11:47:27 (pid:3112) ******************************************************
10/08/18 11:47:27 (pid:3112) ** condor_starter (CONDOR_STARTER) STARTING UP
10/08/18 11:47:27 (pid:3112) ** C:\condor\bin\condor_starter.exe
10/08/18 11:47:27 (pid:3112) ** SubsystemInfo: name=STARTER type=STARTER(8) class=DAEMON(1)
10/08/18 11:47:27 (pid:3112) ** Configuration: subsystem:STARTER local:<NONE> class:DAEMON
10/08/18 11:47:27 (pid:3112) ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
10/08/18 11:47:27 (pid:3112) ** $CondorPlatform: x86_64_Windows10 $
10/08/18 11:47:27 (pid:3112) ** PID = 3112
10/08/18 11:47:27 (pid:3112) ** Log last touched 10/8 11:46:29
10/08/18 11:47:27 (pid:3112) ******************************************************
10/08/18 11:47:27 (pid:3112) Using config source: C:\condor\condor_config
10/08/18 11:47:27 (pid:3112) Using local config sources:
10/08/18 11:47:27 (pid:3112) C:\condor\condor_config.local
10/08/18 11:47:27 (pid:3112) config Macros = 67, Sorted = 66, StringBytes = 1547, TablesBytes = 2460
10/08/18 11:47:27 (pid:3112) CLASSAD_CACHING is OFF
10/08/18 11:47:27 (pid:3112) Daemon Log is logging: D_ALWAYS D_ERROR
10/08/18 11:47:27 (pid:3112) SharedPortEndpoint: listener already created.
10/08/18 11:47:27 (pid:3112) DaemonCore: command socket at <194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12572_410a_4061>
10/08/18 11:47:27 (pid:3112) DaemonCore: private command socket at <194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12572_410a_4061>
10/08/18 11:47:27 (pid:3112) GLEXEC_JOB not supported on this platform; ignoring
10/08/18 11:47:27 (pid:3112) Communicating with shadow <194.11.95.204:52107?addrs=194.11.95.204-52107>
10/08/18 11:47:27 (pid:3112) Submitting machine is "194.11.95.204"
10/08/18 11:47:27 (pid:3112) setting the orig job name in starter
10/08/18 11:47:27 (pid:3112) setting the orig job iwd in starter
10/08/18 11:47:27 (pid:3112) condor_read() failed: recv(fd=852) returned -1, errno = 10054 , reading 21 bytes from credd ahersrvbld28.lgs-net.com.
10/08/18 11:47:27 (pid:3112) IO: Failed to read packet header
10/08/18 11:47:27 (pid:3112) ERROR: Could not locate valid credential for user
'calibration@LGS-NET'
10/08/18 11:47:27 (pid:3112) Could not initialize user_priv as "LGS-NET\calibration".
Make sure this account's password is securely stored with condor_store_cred.
10/08/18 11:47:27 (pid:3112) ERROR: Failed to determine what user to run this job as, aborting
10/08/18 11:47:27 (pid:3112) Failed to initialize JobInfoCommunicator, aborting
10/08/18 11:47:27 (pid:3112) Unable to start job.
10/08/18 11:47:27 (pid:3112) SharedPortEndpoint: Destructor: Problem in thread shutdown notification: 0
10/08/18 11:47:27 (pid:3112) **** condor_starter (condor_STARTER) pid 3112 EXITING WITH STATUS 1
(has only entries from 5 days ago)
--> tried to add credentials on my processing node aherdskbld04, but failed.
--> it was possible to add credentials on submitter and pool, but not on the node
--> it was possible to add a pool PW on all machines.
-----------------------
-----------------------
Sent: Friday, 05 October 2018 at 18:19
From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
What do the StartLog, the StarterLog and StarterLog.slot1 on AHERDSKBLD04.lgs-net.com say at time
05:33? (Actually, it is best to look at the time of the *first* disconnection.)
The messages you see in the job's log file indicate that the job did match and at least
attempt to start, but that something went wrong. This could be an HTCondor configuration issue, or a problem with your firewall, or some problem with starting the job itself on that machine. The StartLog or StarterLog or StarterLog.slot1 will give a clearer
indication of what the problem is.
-tj
Hello TJ,
thanks for the response.
I tried both:
deleting the entry "load_profile = True" and setting it to "load_profile = False".
Neither helped.
Here is an extract from my Submission:
Universe = vanilla
Notification = Error
Notify_user = user@xxxxxxxxxxx
# OS requirements
Requirements = ( (OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61") || ((OpSys == "WINDOWS" || OpSys == "LINUX") && Arch == "X86_64") )
Rank = kflops + memory*1024 - (Machine =?= LastRemoteHost)*500000
# Be sure to copy files back and forth to the node (linux disables this by default)
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
RunAsOwner = true
load_profile = False
Executable = hxmap_condor_runner_$$(OpSys)_$$(Arch).bat
Output = 181005053650_20180820084007_ingest____________create_.out
Log = 181005053650_20180820084007_ingest____________create_.log
Error = 181005053650_20180820084007_ingest____________create_.err
This is again what I get when running
C:\Users\calibration>Condor_status -af:h Name OpSys Arch LocalCredd HasWindowsRunAsOwner
Remark: only aherdskbld04.lgs-net.com is configured to run jobs as owner.
I can see that this machine is selected by the schedd, but the job is not sent to the machine.
Here is an extract from the log (the same entry repeats endlessly in the log):
022 (7209.000.000) 10/05 05:33:14 Job disconnected, attempting to reconnect
Socket between submit and execute hosts closed unexpectedly
Trying to reconnect to
slot1@xxxxxxxxxxxxxxxxxxxxxxxx
<194.11.95.205:9618?addrs=194.11.95.205-9618&noUDP&sock=12560_40bc_3>
...
024 (7209.000.000) 10/05 05:33:14 Job reconnection failed
Job not found at execution machine
Can not reconnect to
slot1@xxxxxxxxxxxxxxxxxxxxxxxx,
rescheduling job
Not sure what I have set incorrectly. It must be a small setting somewhere that I am missing.
-----------------------
-----------------------
> Sent: Wednesday, 03 October 2018 at 20:53
> From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
>
> Either run_as_owner or RunAsOwner will work. And yes, load_profile conflicts with run_as_owner.
> You must set one or the other, but you cannot set both.
>
> -tj
>
> -----Original Message-----
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> Sent: Tuesday, October 2, 2018 5:00 AM
> To: htcondor-users@xxxxxxxxxxx
> Subject: Re: [HTCondor-users] Fwd: Re: Cannot sent jobs as Owner in WindowsOS
>
>
> Hi TJ
>
> Sorry for the delay, I was on PTO the past couple of days.
> For your question, please see the attachment.
> Only the machines AherSRVBLD28 (Pool), AherDSKBLD03 (submitter) and AHERDSKBLD04 (Node) were configured to run jobs as owner.
>
> a)
> Do I need to specify in the Submission file
> Run_As_owner or RunAsOwner?
>
> b)
> by default we have
> load_profile = True
> in the submission file.
> Does this conflict with "Run_as_owner"?
>
>
> Best regards,
> Robert
>
>
>
>
> -----------------------
>
> -----------------------
>
>
> > Sent: Thursday, 27 September 2018 at 23:49
> > From: "John M Knoeller" <johnkn@xxxxxxxxxxx>
> > To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
> > Subject: Re: [HTCondor-users] Fwd: Aw: Re: Cannot sent jobs as Owner in WindowsOS
> >
> > This part of the condor_q -analyze output
> >
> > 1 ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" || OpSys == "LINUX" ) && Arch == "X86_64" ) ) )
> > 0 REMOVE
> > 2 ( TARGET.HasWindowsRunAsOwner && ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )
> >
> >
> > is saying that there are no machines in your pool that are ARCH == X86_64 and also support WindowsRunAsOwner and are using the necessary value for LocalCredd
> >
> >
> > What Does
> >
> > condor_status -af:h Name OpSys Arch LocalCredd HasWindowsRunAsOwner
> >
> >
> > show?
> >
> > -tj
> >
> >
> >
> >
> > From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of rb
> > Sent: Thursday, September 27, 2018 8:40 AM
> > To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> > Subject: [HTCondor-users] Fwd: Aw: Re: Cannot sent jobs as Owner in WindowsOS
> >
> >
> > From: rb
> > Date: 19 September 2018 at 11:02
> > To: "Todd Tannenbaum"
> > Subject: Aw: Re: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS
> >
> >
> >
> > Hello Todd,
> >
> > thanks for the additional hints.
> > I was able to move a bit forward, but was not yet successful.
> > E.g. I was able to specify a condor pool PW. Jobs are now picked up by condor; however, none of them are picked up by the nodes, as it seems the requirements are not matching.
> > (Remark: Jobs are matching and running when using the default temp user from condor)
> >
> >
> > I attach the condor config files I created now. One for master, one submitter, one node.
> > The submission files contain a line: "Run_as_owner = true"
> >
> > a) Basically I copied the content of the ..\etc\condor_config.local.credd into the condor config file of the pool manager running CREDD
> > b) copied
> > CREDD_HOST = credd.cs.wisc.edu
> > CREDD_CACHE_LOCALLY = True
> >
> > STARTER_ALLOW_RUNAS_OWNER = True
> >
> > ALLOW_CONFIG = Administrator@*
> > SEC_CLIENT_AUTHENTICATION_METHODS = NTSSPI, PASSWORD
> > SEC_CONFIG_NEGOTIATION = REQUIRED
> > SEC_CONFIG_AUTHENTICATION = REQUIRED
> > SEC_CONFIG_ENCRYPTION = REQUIRED
> > SEC_CONFIG_INTEGRITY = REQUIRED
> > into all processing and submitter machines.
> >
> >
> > Now, when running jobs, they are stuck in the queue.
> > Running condor_q -analyze gives the following message:
> >
> > WARNING: Be advised:
> > No resources matched request's constraints
> > The Requirements expression for your job is:
> > ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" ||
> > OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" ||
> > OpSys == "LINUX" ) && Arch == "X86_64" ) ) ) &&
> > ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
> > ( TARGET.HasFileTransfer ) && ( TARGET.HasWindowsRunAsOwner &&
> > ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )
> >
> > Suggestions:
> > Condition Machines Matched Suggestion
> > --------- ---------------- ----------
> > 1 ( ( ( OpSys == "WINNT51" || OpSys == "WINNT52" || OpSys == "WINNT60" || OpSys == "WINNT61" ) || ( ( OpSys == "WINDOWS" || OpSys == "LINUX" ) && Arch == "X86_64" ) ) )
> > 0 REMOVE
> > 2 ( TARGET.HasWindowsRunAsOwner && ( TARGET.LocalCredd is "AHERSRVBLD28.lgs-net.com" ) )
> > 0 REMOVE
> > 3 ( TARGET.Disk >= 3 ) 18
> > 4 ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,0) )
> > 18
> > 5 ( TARGET.HasFileTransfer ) 18
> > ---
> > 7163.000: Request is running.
> >
> >
> >
> >
> >
> >
> > Some questions:
> >
> > -Would this depend on the version of condor? I am running 8.4.10 on all machines.
> >
> > -My user is known in the domain. Would I need to add this user to the local users of each processing machine?
> >
> > -In the user manual in 7.2.5 "Condor_credd Daemon" a variable called "Local_credd" is mentioned. However, I cannot find this variable in any of the examples. Is it necessary to specify this variable in the config file?
> >
> > - Do I need to use a pool PW? Or is it enough to use the suggestion from "7.2.6 Executing Jobs with the User's Profile Loaded" and just set "load_profile = True" in the submission file?
> >
> > - In user manual 3.8.13.2 I find the following sentence: "Under Windows, HTCondor by default runs jobs under a dynamically created local account that exists for the duration of the job, but it can optionally run the job as the user account that owns the job if STARTER_ALLOW_RUNAS_OWNER is True and the job contains RunAsOwner=True."
> > Is it RunAsOwner = true or Run_As_Owner = true?
> >
> >
> > Btw:
> > whoami is giving: calibration@xxxxxxxxxxx.
> > This is correct. I would like to have this user running jobs in the condor environment.
> >
> >
> > Best regards,
> > Robert
> >
> >
> >
> > -----------------------
> >
> > -----------------------
> >
> >
> > > Sent: Thursday, 13 September 2018 at 22:31
> > > From: "Todd Tannenbaum" <tannenba@xxxxxxxxxxx>
> > > To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>, rb <robertbosch@xxxxxx>
> > > Subject: Re: [HTCondor-users] Cannot sent jobs as Owner in WindowsOS
> > >
> > > On 9/12/2018 5:02 AM, rb wrote:
> > > > I would like to send and process the job as "owner".
> > > > The job should be processed not by the default "condor-slot user", but by the person who is logged on to the submitter and is sending the job.
> > > >
> > > > For this we created a user "calibration". This user is registered in our domain and has admin permission on all machines (all Win 10) connected to the pool.
> > > >
> > > > For this I edited the config file on Submitter and Executing nodes:
> > > >
> > > > [...]
> > > > FILESYSTEM_DOMAIN = lgs-net.com
> > > > UID_DOMAIN = lgs-net.com
> > > > TRUST_UID_DOMAIN = true
> > > > SOFT_UID_DOMAIN = true
> > > > STARTER_ALLOW_RUNAS_OWNER = true
> > > > [...]
> > > >
> > > >
> > > > The submission files additionally contain the following entry
> > > > [...]
> > > > Run_As_Owner = true
> > > > [...]
> > > >
> > > >
> > > > I also used "condor_store_cred add" on submitter and pool to store PW for user "calibration"
> > > >
> > > > Still it's not working!
> > > > Jobs are created, and so are the .err and .out files. But they are not picked up by the scheduler. Using "condor_q": No jobs in queue.
> > > >
> > > >
> > > > Can someone give some hints?
> > > >
> > >
> > > Did you do a condor_reconfig or restart HTCondor after changing the config settings on your execute and submit hosts?
> > >
> > > Also I don't see anything in your config re your CREDD_HOST etc, as described in the Microsoft Windows chapter in the HTCondor Manual for executing jobs as the Submitting User... specifically I am looking at this section:
> > > http://htcondor.org/manual/v8.7/MicrosoftWindows.html#x75-5750008.2.4
> > > Perhaps you want to re-read and follow the configuration examples in that part of the Manual.
> > >
> > > Some additional ideas / suggestions:
> > >
> > > Are you running condor_submit as user "calibration" ? What does "whoami" report before submitting the job?
> > >
> > > Try submitting a very simple job and see if that runs as user "calibration". I would suggest running "whoami.exe" with a job event log and see what happens. For example --
> > > executable = whoami.exe
> > > output = test.out
> > > error = test.err
> > > log = test.log
> > > run_as_owner = true
> > > queue
> > >
> > > and then take a look at test.out, test.err, test.log.
> > >
> > > You say the job is successfully submitted but condor_q says no jobs in the queue... ??? what does "condor_q -allusers" say? Or is that because the job is quickly completing... what does condor_history say?
> > >
> > > Re the below observations: I am not the Windows expert, but I believe you should only need to run 'condor_store_cred add' on the submit node, which will then send the password (encrypted) and securely store it on the host running the condor_credd daemons. The execute node will securely fetch the password as needed.
> > >
> > > Hope the above helps,
> > > Todd
> > >
> > >
> > > > I made two observations:
> > > > 1) I cannot use "condor_store_cred add" on executing machines. It returns an error "operation failed". Make sure you have WRITE permission onto this node. Although "WRITE = *" is set in all config files.
> > > > 2) By default our Software adds "load_profile = true" in all submission files. Could this be a potential problem?
> > > >
> > > >
> > > >
> > > > Best regards,
> > > > Robert
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > -----------------------
> > > >
> > > > -----------------------
> > > >
> > > > _______________________________________________
> > > > HTCondor-users mailing list
> > > > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > > > subject: Unsubscribe
> > > > You can also unsubscribe by visiting
> > > > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > > >
> > > > The archives can be found at:
> > > > https://lists.cs.wisc.edu/archive/htcondor-users/
> > > >
> > >
> > >
> > > --
> > > Todd Tannenbaum <tannenba@xxxxxxxxxxx>    University of Wisconsin-Madison
> > > Center for High Throughput Computing    Department of Computer Sciences
> > > HTCondor Technical Lead                 1210 W. Dayton St. Rm #4257
> > > Phone: (608) 263-7132                   Madison, WI 53706-1685
> > >