Hi Chris,
Thanks for the additional details.
Looking at your update, before we dive deeper into the mutual authentication/FS auth issue, we should probably check the OS-level crypto policies on EL10 first.
As you noted, the TLS handshake is failing early and yielding an "unauthenticated" status. In EL10, the default security profiles are extremely strict and will silently reject older cryptographic algorithms, specifically SHA-1 signatures or shorter key lengths (which are still surprisingly common in some WLCG CA certificates, CRLs, or VOMS server responses).
If any certificate in the trust chain relies on SHA-1, EL10's OpenSSL will block the handshake entirely before HTCondor can even attempt to map the DN.
To verify if this is the culprit, I highly recommend checking and temporarily relaxing the system-wide crypto policy to see if VOMS authentication suddenly starts working.
You can check and change the policy using these commands on your EL10 machine:
# 1. Check the current active policy
update-crypto-policies --show
# 2. Set the policy to LEGACY to allow older algorithms like SHA-1
sudo update-crypto-policies --set LEGACY
If switching to LEGACY resolves the "unauthenticated" error, it confirms that the issue is due to the modern OS security standards blocking your VOMS/Grid certificates.
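If the LEGACY test does change things, it may be worth finding out which certificate is responsible rather than leaving the relaxed policy in place. Here is a minimal sketch, assuming the CA certificates are stored as *.pem files under /etc/grid-security/certificates (the usual grid layout, but an assumption here) and that the openssl command-line tool is installed. The function names are made up for illustration.

```shell
# Return success if an OpenSSL signature algorithm name looks weak (SHA-1 or MD5).
is_weak_sigalg() {
    case "$1" in
        *sha1*|*SHA1*|*md5*|*MD5*) return 0 ;;
        *) return 1 ;;
    esac
}

# Print "<algorithm><TAB><file>" for every *.pem certificate in a directory
# whose signature algorithm looks weak. The default directory is an assumption.
scan_ca_dir() {
    dir="${1:-/etc/grid-security/certificates}"
    for cert in "$dir"/*.pem; do
        [ -f "$cert" ] || continue
        # The first "Signature Algorithm:" line of the text dump names the algorithm.
        alg=$(openssl x509 -in "$cert" -noout -text 2>/dev/null |
              awk -F': ' '/Signature Algorithm/ {print $2; exit}')
        if is_weak_sigalg "$alg"; then
            printf '%s\t%s\n' "$alg" "$cert"
        fi
    done
}

# Usage: scan_ca_dir                 (scans the assumed default directory)
#        scan_ca_dir /path/to/certs  (scans another directory)
```

A hit is only a lead, not proof. As far as I know, OpenSSL does not enforce the signature algorithm level on the trust anchor itself, so intermediate certificates, host certificates, and CRLs are the more likely culprits.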
Please let us know how this test goes!
Best regards,
From: Chris Brew - STFC UKRI <chris.brew@xxxxxxxxxx>
To: "Geonmo" <geonmo@xxxxxxxxxxx>, "htcondor-users@xxxxxxxxxxx" <htcondor-users@xxxxxxxxxxx>
Date: 2026-05-08 (Fri) 18:43:06
Subject: Re: [HTCondor-users] Authentication Issue between HTCondorCE Schedd and Batch Schedd
Thanks for this.
On the Job Router config, we've got the first two set and have tried setting JOB_ROUTER_SCHEDD2_POOL to both $(FULL_HOSTNAME):9618 and our batch system's Central Manager as suggested by the deployment documentation; neither changes this behaviour.
And we've got QUEUE_SUPER_USER_MAY_IMPERSONATE = .* in the config of our batch system Schedd. Indeed, when we first started we got errors about a non-super user condor in the SchedLog and added "condor" to QUEUE_SUPER_USERS in the Schedd config. That got rid of the non-super user error in the SchedLog but left the other errors.
On the VOMS mapping, it appears to be more subtle than that. As far as I can tell it isn't passing the <DN>,<VOMS ATTRIBUTES> to the mapping comparison at all, just the string "unauthenticated", which suggests something farther up the chain isn't accepting the certificate. However, if I just match ".*" to an account, the resulting job has the correct X509* ClassAds set, which suggests that at some level I've got the /etc/grid-security VOMS information set correctly (it's set by puppet and identical to all our other grid systems).
It may be we don't need VOMS authentication, but a reasonable fraction of the jobs to our existing ArcCEs use VOMS authentication, and since it should "just work" I wasn't going to make them transition.
Is anyone else running this on EL10? Am I being too ambitious trying to go there?
We've tried adding D_SECURITY to SCHEDD_DEBUG on the condor-ce but that didn't seem to log anything about the certificate decoding.
On the batch system Schedd it shows the Job_Router authenticating but no sign of it attempting to impersonate, just the failure to create the user record for the illegal condor user. The Job_Router is using FS to authenticate with the batch Schedd but I don't think that should make any difference.
While I don't think it's the cause of the failures, the:
Failed to open /var/lib/condor/spool/job_queue.log: errno=13
error does seem to point to some sort of mismatch between the condor-ce and condor batch setup: the batch setup has that file as root:root with mode 600, and as far as I can tell condor-ce is trying to read it as "condor". Setting the mode to 644 gets rid of the error but changes nothing else. SELinux is disabled.
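For reference, errno=13 is EACCES (permission denied). The check described above can be sketched as follows, assuming GNU coreutils stat and working sudo; the helper names are hypothetical.

```shell
# Print "owner:group mode" for a path, e.g. "root:root 600" (GNU stat syntax).
describe_perms() {
    stat -c '%U:%G %a' -- "$1"
}

# Return success if the given account can read the path.
# Needs sudo rights; the account must also be able to traverse the parent directories.
readable_by() {
    sudo -u "$1" test -r "$2"
}

# Usage:
#   describe_perms /var/lib/condor/spool/job_queue.log
#   readable_by condor /var/lib/condor/spool/job_queue.log && echo readable
```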
So I'm a bit stuck at the moment.
Thanks,
Chris.
On 08/05/2026, 01:49, "Geonmo" <geonmo@xxxxxxxxxxx> wrote:
Hi Chris,
To address the identity mismatch and Job Router errors you're seeing, I would suggest verifying the following configuration points.
1. Job Router and Permissions
First, please ensure that the Job Router on your HTCondor-CE is correctly pointing to the local Batch Schedd and has the necessary permissions to hand off jobs. Check if these are explicitly set in your condor-ce/conf.d:
# HTCondor-CE side
JOB_ROUTER_SCHEDD2_SPOOL = /var/lib/condor/spool
JOB_ROUTER_SCHEDD2_NAME = $(FULL_HOSTNAME)
JOB_ROUTER_SCHEDD2_POOL = $(FULL_HOSTNAME):9618
And on the Local Batch (HTCondor) side, ensure the Schedd allows the CE's routing process to impersonate the job owner:
# Local Batch side (required for mapping condor@<domain> -> local user such as sgmcms55; this account must exist on the CE, LRMS and WN)
QUEUE_SUPER_USER_MAY_IMPERSONATE = .*
2. SSL/VOMS Mapping (HTCondor v23 vs. v24)
Regarding the VOMS authentication issues, it is important to note that the way HTCondor handles SSL/Certificate mapping changed significantly between v23 and v24+.
In newer versions, the default behavior for mapping X.509 certificates has become more strict.
It often requires comparing not just the DN, but also additional attributes like VOMS roles.
This change often requires adding commas or specific formatting in your mapfiles that wasn't necessary before.
You can find the detailed requirements and the new mapping logic in this EGI documentation: HTCondor and SSL authentication
Please check if your certificate DNs and roles are being mapped correctly under the new version's rules.
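To illustrate why a DN-only pattern can stop matching once the VOMS attributes are appended after a comma, here is a small sketch using grep -E. It only demonstrates the anchoring effect: HTCondor map files have their own line syntax and use PCRE, and the DN, VO name, and helper function below are all invented for the example.

```shell
# Hypothetical identity string in the "<DN>,<VOMS attributes>" form.
identity='/DC=org/DC=example/CN=Jane Doe,/examplevo/Role=NULL/Capability=NULL'

# Return success if the extended regex $1 matches the string $2.
matches() {
    printf '%s\n' "$2" | grep -Eq -- "$1"
}

# A pattern anchored at the end of the DN, as an older map file entry might be.
dn_only='^/DC=org/DC=example/CN=Jane Doe$'
# A pattern that also accounts for the comma and the VOMS attributes.
dn_and_fqan='^/DC=org/DC=example/CN=Jane Doe,/examplevo/.*$'

matches "$dn_only" "$identity" && echo "dn_only: match" || echo "dn_only: no match"
# prints "dn_only: no match"
matches "$dn_and_fqan" "$identity" && echo "dn_and_fqan: match" || echo "dn_and_fqan: match failed"
# prints "dn_and_fqan: match"
```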
3. Authentication Method
Lastly, out of curiosity, is there a specific reason your site is prioritizing VOMS/SSL over SCITOKENS for this setup?
Since many grid infrastructures are migrating toward tokens, knowing your requirements might help us suggest a more streamlined authentication path.
Hope this helps!
Best regards,
-- Geonmo
------ Original Message ------
From: Chris Brew - STFC UKRI via HTCondor-users <htcondor-users@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Chris Brew - STFC UKRI <chris.brew@xxxxxxxxxx>
Date: 2026-05-07 (Thu) 23:07:29
Subject: [HTCondor-users] Authentication Issue between HTCondorCE Schedd and Batch Schedd
Hi,
I've still not got anywhere with the VOMS authentication (I'll post some more info soon), but token auth seems to be working in that jobs get into the condor-ce Schedd and are visible with condor_ce_q; however, they don't make it as far as the Schedd for the batch system.
I just copied the config of that from the config of the Schedd on our existing ArcCEs, so it's possible it's missing some necessary config for accepting jobs from the Job_Router.
I've got three recurring errors. One in the /var/log/condor-ce/JobRouterLog:
05/07/26 14:44:29 Failed to commit job submission :
05/07/26 14:44:29 JobRouter failure (src="" failed to submit job
Which is matched with this one in /var/log/condor/SchedLog:
05/07/26 14:44:29 (pid:923597) (bt:ccbf:13) SetEffectiveOwner: UserRec lookup for owner condor@xxxxxxxxxxx found no match
05/07/26 14:44:29 (pid:923597) Owner condor@xxxxxxxxxxx has no JobQueueUserRec
05/07/26 14:44:29 (pid:923597) Creating pending JobQueueUserRec for owner condor@xxxxxxxxxxx
05/07/26 14:44:29 (pid:923597) Error: MakeUserRec with illegal identifiers: user=condor@xxxxxxxxxxx, os_user=condor
05/07/26 14:44:29 (pid:923597) NewCluster(): failed to create new User record for condor@xxxxxxxxxxx
And then another more frequent one every ten seconds in /var/log/condor-ce/JobRouterLog:
05/07/26 14:47:09 Failed to open /var/lib/condor/spool/job_queue.log: errno=13
Which looks to me like the JobRouter is trying to put jobs into the queue as (the illegal) user condor rather than the accounts the tokens are mapped to in the condor-ce Schedd (they show up there as the correctly mapped local user).
Does anyone have any idea where I should be looking?
Thanks,
Chris.