[HTCondor-users] Parallel Jobs



Dear all,

I'm trying to configure my Linux HTCondor 10.x cluster to run parallel jobs.
The scheduler for every job (sequential or parallel) is ce.condor.net. Following the guide, I added the following configuration
on three nodes:

EXECUTE = /home/condor/execute/
Filesystem_Domain = condor.net
Uid_Domain = condor.net
TRUST_UID_DOMAIN = True
## Make a single partitionable slot
SLOT_TYPE_1 = cpus=100%,ram=100%
SLOT_TYPE_1_PARTITIONABLE = True
NUM_SLOTS_TYPE_1 = 1
DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxx"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
START = True
SUSPEND   = False
CONTINUE  = True
PREEMPT   = False
KILL      = False
WANT_SUSPEND   = False
WANT_VACATE    = False
RANK      = Scheduler =?= $(DedicatedScheduler)

If I run the following command from the scheduler:

# condor_status -const '!isUndefined(DedicatedScheduler)' -format "%s\t" Machine -format "%s\n" DedicatedScheduler

I get the following output:

wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn1.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn2.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
wn3.condor.net	DedicatedScheduler@xxxxxxxxxxxxx
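
As a quick tally of the output above, the same query can be piped through a small helper that counts advertised slots per machine. This is only a minimal sketch: `count_slots` is a hypothetical helper name, not an HTCondor tool, and it assumes one "machine, scheduler" pair per line as printed above.

```shell
# count_slots: hypothetical helper (not part of HTCondor).
# Reads "machine<TAB>scheduler" lines on stdin and prints "machine count",
# one line per machine, sorted by machine name.
count_slots() {
    awk '{ n[$1]++ } END { for (m in n) print m, n[m] }' | sort
}

# Usage on the scheduler, reusing the query shown above:
# condor_status -const '!isUndefined(DedicatedScheduler)' \
#     -format "%s\t" Machine -format "%s\n" DedicatedScheduler | count_slots
```

Applied to the listing above, this would report 5 slots for wn1.condor.net, 5 for wn2.condor.net and 4 for wn3.condor.net, so 14 slots in total advertise the DedicatedScheduler attribute.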


But when I submit the following test job:

universe = parallel
executable = /bin/sleep
arguments = 30
machine_count = 2
log = log
should_transfer_files = IF_NEEDED
when_to_transfer_output = ON_EXIT
request_cpus   = 1
request_memory = 1024M
request_disk   = 10240K

queue


I see the following in the SchedLog:

(D_ALWAYS:2) Trying to find 2 resource(s) for dedicated job 21084581.0
(D_ALWAYS) Skipping job 21084581.0 because it requests more nodes (2) than exist in the pool (0)

Where am I going wrong?

Thanks
-- 
-----------------------------------------------------------
  Enrico Morelli
  System Administrator | Programmer | Web Developer

  CERM - Polo Scientifico
  via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
------------------------------------------------------------