
Re: [HTCondor-users] Authentication Issue between HTCondorCE Schedd and Batch Schedd



Thank you, Jaime!

That indeed does seem to be the issue. Removing that from the config, restarting, chowning some files, and repeating eventually resulted in the condor-ce jobs making it into the batch scheduler and running successfully.

Hmmm, git blames me for that line, but in my defence it seems to be in the initial commit of the sample config files I got from another site when I was first setting up ArcCEs/Condor 13 years ago.

It just goes to show that you shouldn't just accept config from people without understanding it thoroughly.

Thanks again, just the VOMS issue to fix now, but people keep telling me that I don't need VOMS auth anyway.

Chris.

On 11/05/2026, 20:36, "Jaime Frey" <jfrey@xxxxxxxxxxx> wrote:

Hmm. It looks like your HTCondor daemons are configured to always run with effective uid 0 instead of spending most of their time as effective uid of the condor user. Are you setting CONDOR_IDS=0.0? That will break the Job Router's attempts to authenticate, in addition to being highly unrecommended.

 - Jaime
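As a rough illustration of the check Jaime suggests, the sketch below scans configuration text for a CONDOR_IDS assignment and flags a value whose uid part is 0. It is only a sketch under stated assumptions: the helper names `find_condor_ids` and `runs_as_root` and the sample fragment are hypothetical, and the parser handles plain `NAME = value` lines only, not includes, conditionals or macro expansion. On a live host, `condor_config_val CONDOR_IDS` should be the more reliable way to see the effective value.

```python
import re

def find_condor_ids(config_text):
    """Return the last CONDOR_IDS value assigned in config_text, or None.

    Assumption: only plain `NAME = value` lines are handled; real HTCondor
    configuration has more syntax than this sketch understands.
    """
    value = None
    for line in config_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comment lines
        match = re.match(r"CONDOR_IDS\s*=\s*(\S+)", stripped, flags=re.IGNORECASE)
        if match:
            value = match.group(1)  # a later assignment overrides an earlier one
    return value

def runs_as_root(condor_ids):
    """True if a `uid.gid` string names uid 0."""
    if condor_ids is None:
        return False
    return condor_ids.split(".", 1)[0] == "0"

# Hypothetical fragment for illustration; only the UID_DOMAIN value comes from the thread.
sample = """
UID_DOMAIN = pp.rl.ac.uk
CONDOR_IDS = 0.0
"""
value = find_condor_ids(sample)
print(value, runs_as_root(value))
```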

On May 8, 2026, at 5:50 AM, Chris Brew - STFC UKRI <chris.brew@xxxxxxxxxx> wrote:

Hi Jaime,

The batch condor has it as pp.rl.ac.uk. For the condor-ce I've tried both leaving it at the default 'users.htcondor.org' and setting it to the same as the batch condor; it doesn't appear to make any difference, and the Job_Router always seems to get authenticated as 'condor@xxxxxxxxxxx'.

I'm just starting condor-ce with systemd and as far as I know I've not changed anything about how it would run, though 'ps' does show:

$ ps -efH | grep condor
brew      968693  823194  0 11:03 pts/0    00:00:00           grep --color=auto condor
root      967698  967468  0 10:31 pts/3    00:00:00                 less /var/log/condor-ce/JobRouterLog
root      967056       1  0 10:21 ?        00:00:00   /usr/sbin/condor_master -f
root      967096  967056  0 10:21 ?        00:00:00     condor_procd -A /var/run/condor/procd_pipe -L /var/log/condor/ProcLog -R 1000000 -S 60 -C 0
root      967097  967056  0 10:21 ?        00:00:00     condor_shared_port -p 9618
root      967098  967056  0 10:21 ?        00:00:00     condor_schedd
condor    967609       1  0 10:31 ?        00:00:00   condor_master
root      967672  967609  0 10:31 ?        00:00:00     condor_procd -A /var/run/condor-ce/procd_pipe -L /var/log/condor-ce/ProcLog -R 1000000 -S 60 -C 564
condor    967673  967609  0 10:31 ?        00:00:00     condor_shared_port
condor    967675  967609  0 10:31 ?        00:00:00     condor_collector
condor    967677  967609  0 10:31 ?        00:00:01     condor_schedd
condor    967678  967609  0 10:31 ?        00:00:02     condor_job_router

Is this going to be some EL10 weirdness?

Thanks,
Chris.
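To make the ps listing in this message easier to read, here is a minimal sketch that groups the condor daemons by their parent condor_master and records the user each one runs as. The function name `daemons_by_master` is hypothetical, and the input is a subset of the lines quoted above, assuming the usual `ps -ef` column order (UID, PID, PPID, C, STIME, TTY, TIME, CMD).

```python
from collections import defaultdict

# Subset of the ps listing quoted in the message above.
PS_LINES = """\
root      967056       1  0 10:21 ?        00:00:00   /usr/sbin/condor_master -f
root      967097  967056  0 10:21 ?        00:00:00     condor_shared_port -p 9618
root      967098  967056  0 10:21 ?        00:00:00     condor_schedd
condor    967609       1  0 10:31 ?        00:00:00   condor_master
condor    967677  967609  0 10:31 ?        00:00:01     condor_schedd
condor    967678  967609  0 10:31 ?        00:00:02     condor_job_router
"""

def daemons_by_master(ps_text):
    """Map each condor_master PID to (daemon name, user) pairs for its children."""
    rows = []
    for line in ps_text.splitlines():
        fields = line.split(None, 7)  # 8 columns; the last one is the full command
        if len(fields) < 8:
            continue
        user, pid, ppid = fields[0], int(fields[1]), int(fields[2])
        name = fields[7].split()[0].rsplit("/", 1)[-1]  # drop path and arguments
        rows.append((user, pid, ppid, name))
    masters = {pid for _, pid, _, name in rows if name == "condor_master"}
    tree = defaultdict(list)
    for user, pid, ppid, name in rows:
        if ppid in masters:
            tree[ppid].append((name, user))
    return dict(tree)

for master, children in daemons_by_master(PS_LINES).items():
    print(master, children)
```

Read this way, the batch condor_master (PID 967056) has its condor_schedd running as root, while the condor-ce daemons under PID 967609 run as condor.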

On 07/05/2026, 21:44, "Jaime Frey" <jfrey@xxxxxxxxxxx> wrote:

What is config knob UID_DOMAIN set to in the regular Condor and CE configurations?

The Job Router reads the job_queue.log file to get updates on the job status, instead of doing frequent condor_q commands, so that is normal. The errno 13 (EACCES) is not. The error implies that the Job Router wasn't started as root. The Job Router needs root access for its normal operation.

 - Jaime
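For readers unfamiliar with the numeric code, the following sketch shows one way to turn an errno from a failed open into its symbolic name, which is how errno=13 maps to EACCES (permission denied). The helper name `explain_open_failure` is hypothetical; the path is the one from the JobRouterLog message, and the result depends on the host and on the user running the script.

```python
import errno
import os

def explain_open_failure(path):
    """Try to open path for reading.

    Returns None on success, or a short description of the errno on failure.
    """
    try:
        with open(path, "rb"):
            return None
    except OSError as exc:
        name = errno.errorcode.get(exc.errno, "UNKNOWN")
        return f"errno={exc.errno} ({name}): {os.strerror(exc.errno)}"

# Symbolic name for the code reported in the JobRouterLog.
print(errno.errorcode[13])

# What this process sees when it tries to read the batch Schedd's queue log.
print(explain_open_failure("/var/lib/condor/spool/job_queue.log"))
```

Running it as the same user as the condor-ce Job Router would show whether that user can read the file at all.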

On May 7, 2026, at 9:00 AM, Chris Brew - STFC UKRI via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Hi,

I've still not got anywhere with the VOMS authentication (I'll post some more info soon), but token auth seems to be working, in that jobs get into the condor-ce Schedd and are visible with 'condor_ce_q'. However, they don't make it as far as the Schedd for the batch system.

I just copied the config of that from the config of the Schedd on our existing ArcCEs, so it's possible it's missing some necessary config for accepting jobs from the Job_Router.

I've got three recurring errors. One in the /var/log/condor-ce/JobRouterLog:

05/07/26 14:44:29 Failed to commit job submission :
05/07/26 14:44:29 JobRouter failure (src="" failed to submit job

Which is matched with this one in /var/log/condor/SchedLog:

05/07/26 14:44:29 (pid:923597) (bt:ccbf:13) SetEffectiveOwner: UserRec lookup for owner condor@xxxxxxxxxxx found no match
05/07/26 14:44:29 (pid:923597) Owner condor@xxxxxxxxxxx has no JobQueueUserRec
05/07/26 14:44:29 (pid:923597) Creating pending JobQueueUserRec for owner condor@xxxxxxxxxxx
05/07/26 14:44:29 (pid:923597) Error: MakeUserRec with illegal identifiers: user=condor@xxxxxxxxxxx, os_user=condor
05/07/26 14:44:29 (pid:923597) NewCluster(): failed to create new User record for condor@xxxxxxxxxxx

And then another more frequent one every ten seconds in /var/log/condor-ce/JobRouterLog:

05/07/26 14:47:09 Failed to open /var/lib/condor/spool/job_queue.log: errno=13

Which looks to me like the JobRouter is trying to put jobs into the queue as (the illegal) user condor rather than the accounts the tokens are mapped to in the condor-ce Schedd (they show up there as the correctly mapped local user).

Does anyone have any idea where I should be looking?

Thanks,
Chris.