Hi Enrico,
What does condor_q &lt;Job ID&gt; -af Scheduler say? This value is the one the dedicated schedd matches against when searching for EPs that advertise the DedicatedScheduler attribute.
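A quick way to compare the two sides (a sketch, using the job ID from your SchedLog; exact output depends on your pool):

    # What the job's Scheduler attribute says:
    condor_q 21084581.0 -af Scheduler

    # What the EPs advertise:
    condor_status -const '!isUndefined(DedicatedScheduler)' -af Machine DedicatedScheduler

The string printed by condor_q must match the advertised DedicatedScheduler value exactly, including everything after the '@'.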
-Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Enrico Morelli <morelli@xxxxxxxxxxxxx>
Sent: Friday, May 23, 2025 5:40 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Parallel Jobs

Dear all,
I'm trying to configure my Linux HTCondor 10.x cluster to run parallel jobs. My scheduler server for every job (sequential or parallel) is ce.condor.net. Following the guide, I wrote the following configuration on three nodes:

    EXECUTE = /home/condor/execute/
    Filesystem_Domain = condor.net
    Uid_Domain = condor.net
    TRUST_UID_DOMAIN = True
    ## Make a single partitionable slot
    SLOT_TYPE_1 = cpus=100%,ram=100%
    SLOT_TYPE_1_PARTITIONABLE = True
    NUM_SLOTS_TYPE_1 = 1
    DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxx"
    STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
    START = True
    SUSPEND = False
    CONTINUE = True
    PREEMPT = False
    KILL = False
    WANT_SUSPEND = False
    WANT_VACATE = False
    RANK = Scheduler =?= $(DedicatedScheduler)

If from the schedd I run the command:

    # condor_status -const '!isUndefined(DedicatedScheduler)' -format "%s\t" Machine -format "%s\n" DedicatedScheduler

I get the following output:

    wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
    wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx

But submitting the following test job:

    universe = parallel
    executable = /bin/sleep
    arguments = 30
    machine_count = 2
    log = log
    should_transfer_files = IF_NEEDED
    when_to_transfer_output = ON_EXIT
    request_cpus = 1
    request_memory = 1024M
    request_disk = 10240K
    queue

in the SchedLog I see:

    (D_ALWAYS:2) Trying to find 2 resource(s) for dedicated job 21084581.0
    (D_ALWAYS) Skipping job 21084581.0 because it requests more nodes (2) than exist in the pool (0)

Where am I wrong?

Thanks

-- 
-----------------------------------------------------------
Enrico Morelli
System Administrator | Programmer | Web Developer

CERM - Polo Scientifico
via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
------------------------------------------------------------
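One thing worth verifying (an assumption on my part, not a confirmed diagnosis): the value configured on the EPs must be exactly "DedicatedScheduler@" followed by the full name the schedd advertises, here presumably something ending in ce.condor.net. A sketch of how to compare the two sides (wn1.condor.net is one of the worker nodes from the message above):

    # Name the schedd actually advertises:
    condor_status -schedd ce.condor.net -af Name

    # DedicatedScheduler value configured on one of the EPs:
    condor_config_val -name wn1.condor.net -startd DedicatedScheduler

If the part after the '@' in the second command's output does not match the first command's output, the dedicated scheduler will see zero usable nodes, which is consistent with the "than exist in the pool (0)" log line.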