Re: [HTCondor-users] Parallel Jobs
- Date: Fri, 23 May 2025 16:19:38 +0200
- From: Enrico Morelli <morelli@xxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Parallel Jobs
On Fri, 23 May 2025 13:30:49 +0000
Cole Bollig via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> Hi Enrico,
>
> What does condor_q <Job ID> -af Scheduler say? This value is the one
> that the dedicated Schedd will use when searching for EPs assigned
> the DedicatedScheduler attribute.
>
Thank you, I solved it. The problem was that the scheduler machine has two network interfaces: one connected to the private condor.net network and the other connected to the public network.
The CE machine also has a public name registered in DNS, and when I submitted the job, the public name was used for the DedicatedScheduler, so it did not match the name configured on the execute nodes. After setting NETWORK_HOSTNAME = ce.condor.net, everything started working.
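
For anyone who lands on this thread with the same symptom, the fix amounts to roughly the following. This is a minimal sketch, not a verified configuration: the host name ce.condor.net is the one from this thread, and <scheduler-name> is a hypothetical placeholder for whatever condor_q <Job ID> -af Scheduler reports once the host name is set.

```
# On the scheduler (CE) host, which has both a private and a public interface:
# use the private name instead of the public DNS name.
NETWORK_HOSTNAME = ce.condor.net

# On each execute node (same settings as in the original configuration).
# <scheduler-name> is a placeholder: it has to equal the string that
# "condor_q <Job ID> -af Scheduler" prints for a parallel job.
DedicatedScheduler = "<scheduler-name>"
STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
```

If the two strings differ, the dedicated scheduler finds no matching nodes, which is consistent with the "than exist in the pool (0)" message in the SchedLog.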
> -Cole Bollig
> ________________________________
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Enrico Morelli <morelli@xxxxxxxxxxxxx>
> Sent: Friday, May 23, 2025 5:40 AM
> To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] Parallel Jobs
>
> Dear all,
>
> I'm trying to configure my Linux HTCondor 10.x cluster to be able to
> run parallel jobs. My scheduler server for every job (sequential or
> parallel) is ce.condor.net. Following the guide, I put the following
> configuration on three nodes:
>
> EXECUTE = /home/condor/execute/
> Filesystem_Domain = condor.net
> Uid_Domain = condor.net
> TRUST_UID_DOMAIN = True
> ## Make a single partitionable slot
> SLOT_TYPE_1 = cpus=100%,ram=100%
> SLOT_TYPE_1_PARTITIONABLE = True
> NUM_SLOTS_TYPE_1 = 1
> DedicatedScheduler = "DedicatedScheduler@xxxxxxxxxxxxx"
> STARTD_ATTRS = $(STARTD_ATTRS), DedicatedScheduler
> START = True
> SUSPEND = False
> CONTINUE = True
> PREEMPT = False
> KILL = False
> WANT_SUSPEND = False
> WANT_VACATE = False
> RANK = Scheduler =?= $(DedicatedScheduler)
>
> If I run this command from the scheduler:
>
> # condor_status -const '!isUndefined(DedicatedScheduler)' -format
> "%s\t" Machine -format "%s\n" DedicatedScheduler
>
> I get the following output:
>
> wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn1.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn2.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
> wn3.condor.net DedicatedScheduler@xxxxxxxxxxxxx
>
>
> But when I submit the following test job:
>
> universe = parallel
> executable = /bin/sleep
> arguments = 30
> machine_count = 2
> log = log
> should_transfer_files = IF_NEEDED
> when_to_transfer_output = ON_EXIT
> request_cpus = 1
> request_memory = 1024M
> request_disk = 10240K
>
> queue
>
>
> In the SchedLog I see:
>
> (D_ALWAYS:2) Trying to find 2 resource(s) for dedicated job 21084581.0
> (D_ALWAYS) Skipping job 21084581.0 because it requests more nodes (2)
> than exist in the pool (0)
>
> Where am I going wrong?
>
> Thanks
> --
> -----------------------------------------------------------
> Enrico Morelli
> System Administrator | Programmer | Web Developer
>
> CERM - Polo Scientifico
> via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
> ------------------------------------------------------------
--
-----------------------------------------------------------
Enrico Morelli
System Administrator | Programmer | Web Developer
CERM - Polo Scientifico
via Sacconi, 6 - 50019 Sesto Fiorentino (FI) - ITALY
------------------------------------------------------------