A suspended job retains its claim. Suspension uses the standard Linux process-suspend mechanism via signals. Just as a process suspended with "kill -STOP" stays in the process table and keeps its memory allocation, a suspended HTCondor job keeps its claim.
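As a rough illustration of the signal mechanism described above (a sketch of plain Linux behavior, not of HTCondor's internals), the following shell snippet stops an ordinary background process and shows that it is still in the process table until it is resumed. The "sleep" command is only a stand-in for a job.

```shell
#!/bin/sh
# Start a stand-in "job" in the background.
sleep 300 &
pid=$!

# Suspend it. SIGSTOP cannot be caught or ignored by the process.
kill -STOP "$pid"
sleep 1   # give the kernel a moment to update the process state

# The process is still in the process table; its state begins with "T" (stopped).
state_stopped=$(ps -o stat= -p "$pid" | tr -d ' ' | cut -c1)
echo "after SIGSTOP: $state_stopped"

# Resume it and look again; the state is no longer "T".
kill -CONT "$pid"
sleep 1
state_resumed=$(ps -o stat= -p "$pid" | tr -d ' ' | cut -c1)
echo "after SIGCONT: $state_resumed"

# Clean up the stand-in job.
kill "$pid"
wait "$pid" 2>/dev/null || true
```

The stopped process keeps its PID and its memory the whole time, which is the analogy with a suspended job keeping its claim.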

In the past I've used a suspend policy on desktop workstations which run jobs when idle. Certain jobs didn't like to be simply killed, so I tried to make sure that they were evicted as infrequently as possible. I would have the job immediately suspend when the user returned so as to avoid any impact on their use of the machine, and then unsuspend when the machine went idle again, or it would vacate after a certain amount of time spent suspended if there were other matching machines available.

In order to release a claim, the job has to be vacated or removed.

The "keep_claim_idle" setting in a job submission has to do with avoiding negotiator overhead for matching jobs. Increasing it just means that the claim can be reused without having to go through returning it to the schedd and having it reassigned: the start daemon can just be directly asked to run another job.
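For illustration, keep_claim_idle is given in the submit description file as a number of seconds. The sketch below writes a hypothetical submit file; the file name, the executable, and the value of 300 are assumptions chosen only for this example.

```shell
#!/bin/sh
# Write a hypothetical submit description file (name and contents are illustrative).
cat > keep_claim_example.sub <<'EOF'
executable      = /bin/sleep
arguments       = 60
# Ask the schedd to hold on to the claim for up to 300 seconds after the job
# finishes, so a follow-on job can reuse it without a new negotiation cycle.
keep_claim_idle = 300
queue
EOF

# On a machine with HTCondor installed, submit it with:
# condor_submit keep_claim_example.sub
```

Note that this setting does not release the claim held by a suspended job; as stated above, that requires the job to be vacated or removed.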
Michael V. Pelletier
Information Technology
Digital Transformation & Innovation
Integrated Defense Systems
Raytheon Company
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of hufh
Sent: Thursday, January 3, 2019 12:45 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [External] Re: [HTCondor-users] Is it possible to immediately suspend jobs of a DAGman job?
Mark,

Thanks for your reply! It works now. But it looks like the slot is still claimed, not released. Is that expected, or do we need to set a config such as "keep_claim_idle" to release it?

hufh

On Thu, Jan 3, 2019 at 2:22 AM Mark Coatsworth <coatsworth@xxxxxxxxxxx> wrote:
Hello,

The behavior you're seeing is as expected. Running condor_hold on a running DAGMan will only hold DAGMan itself, not any jobs running under it.

If you want to suspend the jobs running under DAGMan, you have to do this manually:

condor_hold <DAGManJobId>
condor_hold -constraint "DAGManJobId == <DAGManJobId>"

Later, to release them all again:

condor_release <DAGManJobId>
condor_release -constraint "DAGManJobId == <DAGManJobId>"

Hope this helps,

Mark

On Wed, Jan 2, 2019 at 9:35 AM hufh <hufh2004@xxxxxxxxx> wrote:
Hi all,

I am using DAGMan to run jobs and want to suspend it, but I found that condor_hold does not stop the running jobs immediately; it only takes effect for the next ones. I have tried condor_suspend, but it looks like it doesn't work for DAGMan jobs. Could you tell me whether a DAGMan job can be suspended immediately? Thanks a lot!

hufh
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
+1 608 206 4703