
Re: [HTCondor-users] Parallel Jobs



Hi Enrico,

What does condor_q <Job ID> -af Scheduler say? That value is the one the dedicated schedd uses when searching for EPs that advertise a matching DedicatedScheduler attribute.
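
To make the check concrete, here is a minimal sketch of the comparison, not HTCondor's actual matching code. It assumes the job's Scheduler value has to equal the DedicatedScheduler value advertised by each slot, and it uses an exact string comparison, which may be stricter than what HTCondor does internally. The function name slots_for_job and the host names are hypothetical; in practice the two values come from condor_q and condor_status.

```python
def slots_for_job(job_scheduler, slot_ads):
    """Return the slot ads whose DedicatedScheduler equals the job's Scheduler.

    job_scheduler: the string printed by `condor_q <Job ID> -af Scheduler`.
    slot_ads: a list of dicts holding each slot's advertised attributes.
    Exact string equality is an assumption made for this illustration.
    """
    return [ad for ad in slot_ads
            if ad.get("DedicatedScheduler") == job_scheduler]


# Hypothetical values, for illustration only.
slot_ads = [
    {"Machine": "node1.example", "DedicatedScheduler": "DedicatedScheduler@sched-a.example"},
    {"Machine": "node2.example", "DedicatedScheduler": "DedicatedScheduler@sched-a.example"},
    {"Machine": "node3.example"},  # slot that does not advertise the attribute
]

usable = slots_for_job("DedicatedScheduler@sched-a.example", slot_ads)
print([ad["Machine"] for ad in usable])
```

If the two strings differ in any way (for example a short host name on one side and a fully qualified name on the other), no slot is selected under this assumption, which would be consistent with a "than exist in the pool (0)" message.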

-Cole Bollig

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Enrico Morelli <morelli@xxxxxxxxxxxxx>
Sent: Friday, May 23, 2025 5:40 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Parallel Jobs
 
Dear all,

I'm trying to configure my Linux HTCondor 10.x cluster to be able to run parallel jobs.
My scheduler server for every job (sequential or parallel) is ce.condor.net. Following the guide, I wrote the following configuration
on the three nodes:

EXECUTE = /home/condor/execute/
Filesystem_Domain = condor.net
Uid_Domain = condor.net
TRUST_UID_DOMAIN = True
## Make a single partitionable slot
SLOT_TYPE_1 = cpus=100%,ram=100%
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1 = 1
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxx"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
START = True
SUSPEND   = False
CONTINUE  = True
PREEMPT   = False
KILL      = False
WANT_SUSPEND   = False
WANT_VACATE    = False
RANK      = Scheduler =?= $(DedicatedScheduler)

If I run the following command from the scheduler:

# condor_status -const '!isUndefined(DedicatedScheduler)' -format "%s\t" Machine -format "%s\n" DedicatedScheduler

I get the following output:

wn1.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net  DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net  DedicatedScheduler@xxxxxxxxxxxxx


But when I submit the following test job:

universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
log = log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
request_cpus   = 1
request_memory = 1024M
request_disk   = 10240K

queue


In the SchedLog I get:

(D_ALWAYS:2) Trying to find 2 resource(s) for dedicated job 21084581.0
(D_ALWAYS) Skipping job 21084581.0 because it requests more nodes (2) than exist in the pool (0)

Where am I going wrong?

Thanks
--
-----------------------------------------------------------
  Enrico Morelli
  System Administrator | Programmer | Web Developer

  CERM - Polo Scientifico
  via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
------------------------------------------------------------
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

Join us in June at Throughput Computing 25: https://urldefense.com/v3/__https://osg-htc.org/htc25__;!!Mak6IKo!I1QogcW7xe2VYVVyFS96nz1EbVWcu3oyjT-g3RgZdJ8dySrznMbrgxy38_73rus8wsduXuMpzurStKrBzYAkRw$

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/