08/18/16 19:43:03 slot1_1: Changing state and activity: Claimed/Retiring -> Preempting/Vacating
08/18/16 19:43:03 PERMISSION DENIED to submit-side@matchsession from host 192.168.xxx.xxx for command 403 (DEACTIVATE_CLAIM), access level DAEMON: reason: cached result for DAEMON; see first case for the full reason
08/18/16 19:43:03 slot1_1: Got DEACTIVATE_CLAIM while in Preempting state, ignoring.
08/18/16 19:43:03 Starter pid 6873 exited with status 0
08/18/16 19:43:03 slot1_1: State change: starter exited
08/18/16 19:43:03 slot1_1: State change: No preempting claim, returning to owner
08/18/16 19:43:03 slot1_1: Changing state and activity: Preempting/Vacating -> Owner/Idle
08/18/16 19:43:03 slot1_1: State change: IS_OWNER is false
08/18/16 19:43:03 slot1_1: Changing state: Owner -> Unclaimed
08/18/16 19:43:03 slot1_1: Changing state: Unclaimed -> Delete
08/18/16 19:43:03 slot1_1: Resource no longer needed, deleting
08/18/16 19:43:03 Deleting cron job manager
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 CronJobList: Deleting all jobs
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 CronJobList: Deleting all jobs
08/18/16 19:43:03 Deleting benchmark job mgr
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 Killing job mips
08/18/16 19:43:03 Killing job kflops
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 Killing job mips
08/18/16 19:43:03 Killing job kflops
08/18/16 19:43:03 CronJobList: Deleting all jobs
08/18/16 19:43:03 CronJobList: Deleting job 'mips'
08/18/16 19:43:03 CronJob: Deleting job 'mips' (/usr/lib/condor/libexec/condor_mips), timer -1
08/18/16 19:43:03 CronJobList: Deleting job 'kflops'
08/18/16 19:43:03 CronJob: Deleting job 'kflops' (/usr/lib/condor/libexec/condor_kflops), timer -1
08/18/16 19:43:03 Cron: Killing all jobs
08/18/16 19:43:03 CronJobList: Deleting all jobs
08/18/16 19:43:03 All resources are free, exiting.
08/18/16 19:43:03 **** condor_startd (condor_STARTD) pid 6818 EXITING WITH STATUS 0
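The Retiring activity discussed throughout this thread is governed by the MaxJobRetirementTime expression, evaluated on the execute node. As an illustration only (the 24-hour figure below is an arbitrary example, not a value taken from this thread), a startd configuration that grants running jobs a retirement window might look like:

```
# Hypothetical execute-node config sketch (e.g. condor_config.local).
# Give a job on a claim that is being preempted, or peacefully shut
# down, up to 24 hours, counted from when the job started, before
# the Preempting state is entered and the job is killed.
MaxJobRetirementTime = 24 * 60 * 60
```

Whether such a value is actually in effect on a node can be checked against the running daemon, e.g. with `condor_config_val -name node -startd MaxJobRetirementTime`; an unset knob should be reported as not defined.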
On Thursday 18 August 2016 19:12:55 Harald van Pee wrote:
> @Bob: I also issue the command from the central manager.
>
> @Todd:
> I have no MaxJobRetirementTime defined (nothing matching "retire" or
> "time" found in condor_config* on the node, the scheduler, or the
> central manager).
>
> condor_status| grep node
> slot1@node LINUX X86_64 Unclaimed Idle 0.230 63507 0+00:00:04
> slot1_1@node LINUX X86_64 Claimed Busy 0.000 1024 0+00:00:03
>
> after
> condor_off -peaceful -daemon startd node
> condor_status no longer shows the node at all (within 1 second, as
> fast as I can type).
>
> We use
>
> CLAIM_WORKLIFE = 120
> STARTD_EXPRS = $(STARTD_EXPRS), DedicatedScheduler
>
> NUM_SLOTS = 1
> SLOT_TYPE_1 = 100%
> SLOT_TYPE_1_PARTITIONABLE = true
> NUM_SLOTS_TYPE_1 = 1
>
> Any help is welcome.
>
> Harald
>
> On Thursday 18 August 2016 18:29:04 Todd Tannenbaum wrote:
> > As another data point, it also seemed to work for me running a
> > pre-release of HTCondor v8.5.7 on Scientific Linux 6.8.
> > Behold the simple test below; note the node went from Claimed/Busy to
> > Claimed/Retiring, which is expected. "Retiring" activity is defined in
> >
> > the Manual (from https://is.gd/mi7mVk ):
> > Retiring
> >
> >     When an active claim is about to be preempted for any reason, it
> >     enters retirement, while it waits for the current job to finish.
> >     The MaxJobRetirementTime expression determines how long to wait
> >     (counting since the time the job started). Once the job finishes
> >     or the retirement time expires, the Preempting state is entered.
> >
> > Perhaps you have a MaxJobRetirementTime defined ?
> >
> > regards,
> > Todd
> >
> > [tannenba@localhost test]$ condor_status
> > Name             OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime
> >
> > slot1@localhost  LINUX  X86_64  Claimed    Busy      0.000   330  0+00:00:04
> > slot2@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:05
> > slot3@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:06
> >
> >               Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
> >
> > X86_64/LINUX      3      0        1          2        0           0         0      0
> >
> >        Total      3      0        1          2        0           0         0      0
> >
> > [tannenba@localhost test]$ condor_off -peaceful -daemon startd
> > Sent "Set-Peaceful-Shutdown" command to local startd
> > Sent "Kill-Daemon-Peacefully" command to local master
> >
> > [tannenba@localhost test]$ condor_status
> > Name             OpSys  Arch    State      Activity  LoadAv  Mem  ActvtyTime
> >
> > slot1@localhost  LINUX  X86_64  Claimed    Retiring  0.000   330  0+00:00:03
> > slot2@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:02:49
> > slot3@localhost  LINUX  X86_64  Unclaimed  Idle      0.000   330  0+00:00:06
> >
> >               Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill  Drain
> >
> > X86_64/LINUX      3      0        1          2        0           0         0      0
> >
> >        Total      3      0        1          2        0           0         0      0
> >
> > On 8/18/2016 11:11 AM, Bob Ball wrote:
> > > Just as a data point, I do, from our central manager machine,
> > > condor_off -peaceful -daemon startd -name $publicName
> > > and it runs just fine. All our jobs are vanilla. HTCondor is version
> > > 8.4.6 on Scientific Linux.
> > >
> > > bob
> > >
> > > On 8/18/2016 11:54 AM, Harald van Pee wrote:
> > >> Hi,
> > >>
> > >> I want to set a job running node offline, but only after all running
> > >> jobs have finished. Of course until then no new jobs should be
> > >> accepted on that node.
> > >>
> > >> I tried the command:
> > >>
> > >> condor_off -peaceful -daemon startd node
> > >>
> > >> and got the message:
> > >>
> > >> Sent "Set-Peaceful-Shutdown" command to startd node
> > >> Sent "Kill-Daemon-Peacefully" command to master node
> > >>
> > >> On node I see in StartLog:
> > >>
> > >> 08/18/16 17:20:49 Got SIGTERM. Performing graceful shutdown.
> > >> 08/18/16 17:20:49 shutdown graceful
> > >>
> > >> And indeed all jobs running in the vanilla universe (we have no
> > >> others) are killed immediately and restarted from the beginning.
> > >> This is what a graceful shutdown does with vanilla jobs. But I want
> > >> a peaceful shutdown.
> > >>
> > >> Is a peaceful shutdown not possible for vanilla jobs?
> > >>
> > >> Do I have to change the configuration? We use:
> > >>
> > >> PREEMPT = FALSE
> > >> PREEMPTION_REQUIREMENTS = False
> > >> KILL = FALSE
> > >> WANT_SUSPEND = false
> > >> WANT_VACATE = false
> > >>
> > >> Or can I use just a different command?
> > >>
> > >> We use condor 8.4.8 on debian 8 (AMD64).
> > >>
> > >> Thanks
> > >>
> > >> Harald
> > >>
> > >>
> > >>
> > >> _______________________________________________
> > >> HTCondor-users mailing list
> > >> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> > >> with a subject: Unsubscribe
> > >> You can also unsubscribe by visiting
> > >> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> > >>
> > >> The archives can be found at:
> > >> https://lists.cs.wisc.edu/archive/htcondor-users/
> > >
--
Harald van Pee
Helmholtz-Institut fuer Strahlen- und Kernphysik der Universitaet Bonn
Nussallee 14-16 - 53115 Bonn - Tel +49-228-732213 - Fax +49-228-732505
mail: pee@xxxxxxxxxxxxxxxxx