[HTCondor-users] Parallel Jobs
- Date: Fri, 23 May 2025 12:40:16 +0200
- From: Enrico Morelli <morelli@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] Parallel Jobs
Dear all,
I'm trying to configure my Linux HTCondor 10.x cluster to run parallel jobs.
My scheduler server for every job (sequential or parallel) is ce.condor.net. Following the guide, I added the following configuration
on three nodes:
EXECUTE = /home/condor/execute/
Filesystem_Domain = condor.net
Uid_Domain = condor.net
TRUST_UID_DOMAIN = True
## Make a single partitionable slot
SLOT_TYPE_1 = cpus=100%,ram=100%
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1 = 1
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxx"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
START = True
SUSPEND = False
CONTINUE = True
PREEMPT = False
KILL = False
WANT_SUSPEND = False
WANT_VACATE = False
RANK = Scheduler =?= $(DedicatedScheduler)
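For reference, the dedicated-scheduling section of the HTCondor manual ties this knob to the full name of the submitting schedd. A minimal sketch of the execute-node fragment, assuming the schedd's name is ce.condor.net (the masked value above may actually differ, so this is only illustrative):

```
## Sketch, not confirmed against this pool: the value should be
## "DedicatedScheduler@" followed by the schedd's full name.
## ce.condor.net is assumed here from the text above.
DedicatedScheduler = "DedicatedScheduler@ce.condor.net"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```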
If, from the scheduler, I run the command:
# condor_status -const '!isUndefined(DedicatedScheduler)' -format "%s\t" Machine -format "%s\n" DedicatedScheduler
I get the following output:
wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
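One way to cross-check that the value the startds advertise really matches the schedd's name is to compare the two directly (a command sketch; the exact names on your pool may differ from what is shown masked above):

```
# Print the schedd's advertised Name, then the attribute each startd advertises:
condor_status -schedd -autoformat Name
condor_status -const '!isUndefined(DedicatedScheduler)' \
    -autoformat Machine DedicatedScheduler
```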
But when I submit the following test job:
universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
log = log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
request_cpus = 1
request_memory = 1024M
request_disk = 10240K
queue
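After submitting, the schedd's match analysis can sometimes show why no dedicated resources are found (a command sketch; the job ID below is taken from the log lines further down and is only illustrative):

```
# Ask the schedd why the parallel job is not matching any slots:
condor_q -better-analyze 21084581.0
```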
In the SchedLog I see:
(D_ALWAYS:2) Trying to find 2 resource(s) for dedicated job 21084581.0
(D_ALWAYS) Skipping job 21084581.0 because it requests more nodes (2) than exist in the pool (0)
Where am I going wrong?
Thanks
--
-----------------------------------------------------------
Enrico Morelli
System Administrator | Programmer | Web Developer
CERM - Polo Scientifico
via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
------------------------------------------------------------