
Re: [HTCondor-users] Is it possible to immediately suspend jobs of a DAGman job?



In my case, the jobs have quite different resource requirements. For example, job A is followed by job B; A needs 8 GB of memory and 1 slot, but B needs only 1 GB and 1 slot. If the resources can't be reclaimed immediately after A finishes, the 8 GB slot will most likely be assigned to run B, which leaves 7 GB of memory unused. That is why I want Condor to reclaim resources immediately. Does it make sense to set keep_claim_idle to zero? Thanks.

hufh

On Fri, Jan 4, 2019 at 12:30 AM Greg Thain <gthain@xxxxxxxxxxx> wrote:
On 1/3/19 10:20 AM, Michael Pelletier wrote:


The "keep_claim_idle" setting in a job submission has to do with avoiding negotiator overhead for matching jobs. Increasing it just means that the claim can be reused without having to go through returning it to the schedd and having it reassigned: the start daemon can just be directly asked to run another job.
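To make the throughput side of this concrete, here is a toy Python model (not HTCondor code) of how long the next job waits to start. It assumes, hypothetically, that a job which cannot reuse a held claim waits for the next negotiation cycle, modeled as a fixed interval starting when the previous job finished; `start_delay_s` and `negotiator_interval_s` are illustrative names, and the 60-second interval is an assumption, not a claim about any particular pool.

```python
import math


def start_delay_s(gap_s: float, keep_claim_idle_s: float,
                  negotiator_interval_s: float = 60.0) -> float:
    """Toy model: seconds between the next job's submission and its start.

    gap_s: time from the previous job finishing to the next job being
    submitted. If the claim is still held (gap inside the keep_claim_idle
    window), the schedd hands the job straight to the startd, modeled as
    zero delay. Otherwise the job waits for the next negotiation cycle,
    assumed (hypothetically) to run every negotiator_interval_s seconds
    starting at t=0.
    """
    if gap_s < keep_claim_idle_s:
        return 0.0  # claim reused directly, no negotiation needed
    next_cycle = math.ceil(gap_s / negotiator_interval_s) * negotiator_interval_s
    return next_cycle - gap_s
```

For example, with a 5-second gap, `start_delay_s(5, 20)` is `0.0` (claim reused), while `start_delay_s(5, 0)` is `55.0` under the assumed 60-second cycle.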



Exactly. And, by default, every job submitted by DAGMan has keep_claim_idle = 20 added to it. This allows the schedd to hold onto the claim for subsequent jobs from the user, even if they haven't been submitted yet. In many cases, this dramatically improves DAGMan's throughput. It also means that when a job is put on hold, the slot that was running it will stay in Claimed/Idle for a while (keep_claim_idle is in seconds, so up to 20 seconds with this default).
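The memory concern from the original question can be sketched the same way. This is a simplified Python model, not HTCondor's actual matching logic: it assumes that a job arriving while the claim is still held, and whose memory request fits, reuses the existing slot, and that otherwise the claim is released and the job gets a slot sized exactly to its request (roughly what a partitionable slot would carve out). `Slot` and `placement` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    memory_mb: int  # memory allotted to the currently claimed slot


def placement(slot: Slot, request_mb: int, gap_s: float,
              keep_claim_idle_s: float) -> tuple[str, int]:
    """Return (where the next job runs, idle memory in MB on that slot).

    Assumption: if the next job is submitted while the claim is still
    held (gap_s < keep_claim_idle_s) and its request fits, it reuses the
    slot and the surplus memory sits unused. Otherwise the claim is
    treated as released and the job is matched to a right-sized slot.
    """
    if gap_s < keep_claim_idle_s and request_mb <= slot.memory_mb:
        return "reused", slot.memory_mb - request_mb
    return "new match", 0


# Job A held an 8 GB slot; job B needs 1 GB and is submitted 5 s later.
print(placement(Slot(8192), 1024, gap_s=5, keep_claim_idle_s=20))  # DAGMan default
print(placement(Slot(8192), 1024, gap_s=5, keep_claim_idle_s=0))
```

Under these assumptions the first call reports `('reused', 7168)`, i.e. 7 GB idle, and the second reports `('new match', 0)`: setting keep_claim_idle to zero trades the memory waste for an extra trip through the negotiator.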


-greg

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/