@Bob: I also issue the command from the central manager.
@Todd: I have no MaxJobRetirementTime defined (I found nothing containing "retire" or "time" in condor_config*, neither on the node, the scheduler, nor the central manager).
condor_status | grep node
slot1@node   LINUX X86_64 Unclaimed Idle 0.230 63507 0+00:00:04
slot1_1@node LINUX X86_64 Claimed   Busy 0.000  1024 0+00:00:03
After condor_off -peaceful -daemon startd node, condor_status no longer lists the node (within 1 second, as fast as I can type).
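To tell the two outcomes apart (the slots vanishing at once, versus the Claimed/Retiring activity Todd reports), the condor_status text could be checked with something like the rough sketch below. It is not part of HTCondor: the function names are made up for illustration, and it assumes the default column order Name OpSys Arch State Activity and that the node name passed in matches what follows the "@" in the slot name.

```python
# Hypothetical helper, not an HTCondor API: classify a node's slots from
# plain `condor_status` text.
# Assumption: default column order is Name OpSys Arch State Activity ...

def slot_states(status_text, node):
    """Return {slot_name: (state, activity)} for slots named *@node."""
    slots = {}
    for line in status_text.splitlines():
        fields = line.split()
        # A slot line needs at least the first five columns.
        if len(fields) >= 5 and fields[0].endswith("@" + node):
            slots[fields[0]] = (fields[3], fields[4])
    return slots


def shutdown_phase(status_text, node):
    """Summarise the node as 'gone', 'retiring', 'running' or 'idle'."""
    slots = slot_states(status_text, node)
    if not slots:
        return "gone"        # no slot of this node is listed any more
    if any(activity == "Retiring" for _, activity in slots.values()):
        return "retiring"    # peaceful shutdown is waiting for jobs
    if any(state == "Claimed" for state, _ in slots.values()):
        return "running"     # jobs are running, no shutdown in progress
    return "idle"


sample = """\
slot1@node LINUX X86_64 Unclaimed Idle 0.230 63507 0+00:00:04
slot1_1@node LINUX X86_64 Claimed Busy 0.000 1024 0+00:00:03
"""
print(shutdown_phase(sample, "node"))  # prints "running"
```

With a working peaceful shutdown I would expect "retiring" until the jobs finish and only then "gone"; what I see is "gone" immediately.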
We use

CLAIM_WORKLIFE = 120
STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler

NUM_SLOTS = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = true
NUM_SLOTS_TYPE_1 = 1

Any help is welcome.
Harald
On Thursday 18 August 2016 18:29:04 Todd Tannenbaum wrote:
> As another data point, it also seemed to work for me running a
> pre-release of HTCondor v8.5.7 on Scientific Linux 6.8.
> Behold the simple test below; note the node went from Claimed/Busy to
> Claimed/Retiring, which is expected. "Retiring" activity is defined in
> the Manual (from https://is.gd/mi7mVk ):
>
> Retiring
> When an active claim is about to be preempted for any reason, it enters
> retirement, while it waits for the current job to finish. The
> MaxJobRetirementTime expression determines how long to wait (counting
> since the time the job started). Once the job finishes or the retirement
> time expires, the Preempting state is entered.
>
> Perhaps you have a MaxJobRetirementTime defined ?
>
> regards,
> Todd
>
> [tannenba@localhost test]$ condor_status
> Name             OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime
>
> slot1@localhost  LINUX  X86_64  Claimed    Busy      0.000   330  0+00:00:04
> slot2@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:05
> slot3@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:06
>
>               Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
>
> X86_64/LINUX      3      0        1          2        0           0         0      0
>
>        Total      3      0        1          2        0           0         0      0
>
> [tannenba@localhost test]$ condor_off -peaceful -daemon startd
> Sent "Set-Peaceful-Shutdown" command to local startd
> Sent "Kill-Daemon-Peacefully" command to local master
>
> [tannenba@localhost test]$ condor_status
> Name             OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime
>
> slot1@localhost  LINUX  X86_64  Claimed    Retiring  0.000   330  0+00:00:03
> slot2@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:02:49
> slot3@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:06
>
>               Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
>
> X86_64/LINUX      3      0        1          2        0           0         0      0
>
>        Total      3      0        1          2        0           0         0      0
>
> On 8/18/2016 11:11 AM, Bob Ball wrote:
> > Just as a data point, I do, from our central manager machine,
> > condor_off -peaceful -daemon startd -name $publicName
> > and it runs just fine. All our jobs are vanilla. HTCondor is version
> > 8.4.6 on Scientific Linux.
> >
> > bob
> >
> > On 8/18/2016 11:54 AM, Harald van Pee wrote:
> >> Hi,
> >>
> >> I want to set a job running node offline, but only after all running
> >> jobs have finished. Of course until then no new jobs should be
> >> accepted on that node.
> >>
> >> I tried the command:
> >>
> >> condor_off -peaceful -daemon startd node
> >>
> >> and got the message:
> >>
> >> Sent "Set-Peaceful-Shutdown" command to startd node
> >>
> >> Sent "Kill-Daemon-Peacefully" command to master node
> >>
> >> On node I see in StartLog
> >>
> >> 08/18/16 17:20:49 Got SIGTERM. Performing graceful shutdown.
> >>
> >> 08/18/16 17:20:49 shutdown graceful
> >>
> >> And indeed all jobs running in vannilla universe (we have no others)
> >> are killed directly and started from the beginning. This is what a
> >> graceful shutdown will do with vanilla jobs. But I want to have a
> >> peaceful shutdown.
> >>
> >> Is a peaceful shutdown not possible for vanilla jobs?
> >>
> >> Do I have to change the configuration? We use:
> >>
> >> PREEMPT = FALSE
> >> PREEMPTION_REQUIREMENTS = False
> >> KILL = FALSE
> >> WANT_SUSPEND = false
> >> WANT_VACATE = false
> >>
> >> Or can I use just a different command?
> >>
> >> We use condor 8.4.8 on debian 8 (AMD64).
> >>
> >> Thanks
> >>
> >> Harald
> >>
> >> _______________________________________________
> >> HTCondor-users mailing list
> >> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> >> with a subject: Unsubscribe
> >> You can also unsubscribe by visiting
> >> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >>
> >> The archives can be found at:
> >> https://lists.cs.wisc.edu/archive/htcondor-users/

-- Harald van Pee
Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn Nussallee 14-16 - 53115 Bonn - Tel +49-228-732213 - Fax +49-228-732505 mail: pee@xxxxxxxxxxxxxxxxx