Re: [HTCondor-users] How to ask htcondor to wait till all jobs finished in Vanilla Universe



Hi Guys,
          Please advise: is this achievable in the vanilla universe, or will I need to switch to the parallel universe?

Thanks,
Gagan

On Thu, Dec 29, 2022 at 12:12 PM gagan tiwari <gagan.tiwari@xxxxxxxxxxxxxxxxxx> wrote:
Hi Guys,
        I have an execute server with 8 cores, and I am trying to run MPI jobs in the vanilla universe on that server, with one job on each core.
I have been able to start them successfully on that execute server by using the following settings in its condor config:

NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = True

But the issue is that condor doesn't wait for all jobs to finish: as soon as one of the jobs finishes, it kills all the jobs still running on the other cores of that single execute server.

I have tried using +ParallelShutdownPolicy = "WAIT_FOR_ALL" in the job submit file, but that didn't help either.
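
For illustration only, a submit file of the shape described above might look roughly like the sketch below. The executable name, the output/error/log file names, and the use of "queue 8" are hypothetical placeholders chosen to match "one job on each core" on an 8-core machine; they are not taken from the actual setup.

```
# Hypothetical vanilla-universe submit file (all names are placeholders)
universe     = vanilla
executable   = run_mpi_job.sh

# One core per job, so a partitionable slot with 8 cores can run 8 jobs
request_cpus = 1

# The attribute that was tried, added as a custom job ClassAd attribute
+ParallelShutdownPolicy = "WAIT_FOR_ALL"

output = job.$(Process).out
error  = job.$(Process).err
log    = job.log

# Submit 8 independent jobs
queue 8
```

The leading "+" makes condor_submit insert the attribute into the job ClassAd as written; whether anything acts on it depends on the universe the job runs in.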

Could someone please help me fix this issue? It's a bit urgent.

Thanks,
Gagan