
Re: [HTCondor-users] schedd error: cannot allocate a cluster id



The reason is at the bottom of the log snippet:

09/16/24 14:01:31 (pid:1560370) Error: MakeUserRec with illegal identifiers: user=condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, owner=condor, ntdomain=(null)
09/16/24 14:01:31 (pid:1560370) NewCluster(): failed to create new User record for condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
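If you hit the same symptom on your own AP, a quick way to confirm the cause is to scan the ScheddLog for the MakeUserRec line above. The following is a minimal Python sketch, assuming the line keeps exactly the `user=..., owner=..., ntdomain=...` layout shown here. `find_illegal_user` and the `example.org` domain are hypothetical, and other HTCondor versions may word the message differently.

```python
import re
from typing import Dict, Optional

# Built from the one ScheddLog line quoted above (HTCondor 23.9.6); the exact
# wording is an assumption for any other version.
ILLEGAL_IDS = re.compile(
    r"MakeUserRec with illegal identifiers: "
    r"user=(?P<user>\S+), owner=(?P<owner>\S+), ntdomain=(?P<ntdomain>\S+)"
)


def find_illegal_user(log_text: str) -> Optional[Dict[str, str]]:
    """Return user/owner/ntdomain from the first MakeUserRec error, or None."""
    for line in log_text.splitlines():
        match = ILLEGAL_IDS.search(line)
        if match:
            return match.groupdict()
    return None


# Hypothetical excerpt: the real domain is elided in the archive.
SAMPLE = (
    "09/16/24 14:01:31 (pid:1560370) Error: MakeUserRec with illegal identifiers: "
    "user=condor@example.org, owner=condor, ntdomain=(null)\n"
    "09/16/24 14:01:31 (pid:1560370) NewCluster(): failed to create new User "
    "record for condor@example.org\n"
)

print(find_illegal_user(SAMPLE))
# {'user': 'condor@example.org', 'owner': 'condor', 'ntdomain': '(null)'}
```

An `owner` of `condor` is exactly the case described here. To check a real pool, feed the function the contents of the file reported by `condor_config_val SCHEDD_LOG`.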

You cannot submit a job as the condor user. We used to allow this, but that was a bug. Submitting a job as condor is equivalent to submitting a job as root: it gives the job the power to run arbitrary code with high privilege.
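Because the client only sees `Cannot allocate a cluster id`, one submit-side workaround is to refuse to submit as `condor` before contacting the schedd at all. A minimal Python sketch, assuming the submitting account name can be read with `getpass.getuser()`; `submitter_problem`, `check_current_user` and `FORBIDDEN_OWNERS` are hypothetical names, not part of HTCondor's tools or Python bindings.

```python
import getpass
from typing import Optional

# Accounts the schedd rejects as job owners. Only "condor" is confirmed in this
# thread; extending the set is left to local policy (assumption).
FORBIDDEN_OWNERS = {"condor"}


def submitter_problem(user: str) -> Optional[str]:
    """Return an error message if `user` may not own jobs, else None.

    Accepts either a bare account name or the user@domain form that
    appears in the ScheddLog.
    """
    owner = user.split("@", 1)[0]
    if owner in FORBIDDEN_OWNERS:
        return (f"refusing to submit as {owner!r}: the schedd will reject it; "
                "submit from an ordinary user account instead")
    return None


def check_current_user() -> None:
    """Exit with a readable message when run as a forbidden account."""
    problem = submitter_problem(getpass.getuser())
    if problem:
        raise SystemExit(problem)
```

Calling `check_current_user()` at the top of a wrapper script, before it runs `condor_submit`, turns the opaque cluster-id error into a message that names the actual cause.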

We need to figure out a way to make the error message better. Tokens make that hard, because only the AP (access point) knows the real reason, and it has no way to tell condor_submit what that reason is.


-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Ben Tovar via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Monday, September 16, 2024 1:11 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Ben Tovar <btovar@xxxxxx>
Subject: [HTCondor-users] schedd error: cannot allocate a cluster id
 
Hi all,

We are trying to install a new condor pool from scratch using get_htcondor. We are testing with two machines, installed with --central-manager and --submit respectively. When we try to submit a job with condor_submit, we get:

Submitting job(s)
ERROR: Cannot allocate a cluster id

The IDTOKENS seem OK (checked with condor_token_list), and the condor config files have the default use statements for role:get_htcondor_central_manager and role:get_htcondor_submit respectively. I tried adding a UID_DOMAIN line to this default configuration, but it didn't seem to have any effect.

Any help is appreciated,

Ben

condor_version
$CondorVersion: 23.9.6 2024-08-08 BuildID: 748275 PackageID: 23.9.6-1 GitSHA: dfdd9eaa $
$CondorPlatform: x86_64_AlmaLinux8 $

Possibly relevant lines from the ScheddLog follow:

09/16/24 14:01:29 (pid:1560370) Received a superuser command
09/16/24 14:01:29 (pid:1560370) Number of Active Workers 0
09/16/24 14:01:31 (pid:1560370) Received a superuser command
09/16/24 14:01:31 (pid:1560370) Owner condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx has no JobQueueUserRec
09/16/24 14:01:31 (pid:1560370) (bt:cbe7:13) Creating pending JobQueueUserRec for owner condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
        Backtrace bt:cbe7:13 is
        condor_schedd(_ZN9Scheduler16insert_ownerinfoEPKc+0x149) [0x55eb3edba509]
        condor_schedd(_Z10NewClusterP11CondorError+0x3fb) [0x55eb3ed5dbcb]
        condor_schedd(_Z12do_Q_requestR9QmgmtPeer+0x2913) [0x55eb3ed85413]
        condor_schedd(_Z8handle_qiP6Stream+0x8c) [0x55eb3ed5333c]
        /lib64/libcondor_utils_23_9_6.so(_ZN10DaemonCore18CallCommandHandlerEiP6Streambbff+0x298) [0x7f24984330c8]
        /lib64/libcondor_utils_23_9_6.so(_ZN10DaemonCore21HandleReqPayloadReadyEP6Stream+0x11c) [0x7f249843343c]
        /lib64/libcondor_utils_23_9_6.so(_ZN10DaemonCore24CallSocketHandler_workerEibP6Stream+0x1e0) [0x7f249842a500]
        /lib64/libcondor_utils_23_9_6.so(_ZN10DaemonCore35CallSocketHandler_worker_demarshallEPv+0x21) [0x7f249842a741]
        /lib64/libcondor_utils_23_9_6.so(_ZN13CondorThreads8pool_addEPFvPvES0_PiPKc+0x3c) [0x7f24981e3bfc]
        /lib64/libcondor_utils_23_9_6.so(_ZN10DaemonCore6DriverEv+0xdce) [0x7f249842e4ce]
        /lib64/libcondor_utils_23_9_6.so(_Z7dc_mainiPPc+0x17ff) [0x7f249844cb3f]
        /lib64/libc.so.6(__libc_start_main+0xe5) [0x7f24960e87e5]
        condor_schedd(_start+0x2e) [0x55eb3ed1fdfe]
09/16/24 14:01:31 (pid:1560370) Error: MakeUserRec with illegal identifiers: user=condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx, owner=condor, ntdomain=(null)
09/16/24 14:01:31 (pid:1560370) NewCluster(): failed to create new User record for condor@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx