Daniel,

Thanks for the reply.
Your analysis here is wrong. The STARTER gets SIGTERM. It then kills the job.
Note that the SIGQUIT comes after the job has exited. This is part of the normal termination of the STARTER by the STARTD after the job has finished. The STARTER doesn't know why the job exited, only that it did.
I see. So regardless of the job's exit status, the starter only knows that the job has exited, and the startd then terminates the starter?
Why are administrators killing Condor jobs? Note that I don't say "sending them SIGQUIT", because that isn't what is happening; they are killing the jobs outside of Condor. Why aren't they using condor_vacate or condor_vacate_job for this purpose? Otherwise there is no way for Condor to know why the job exited.
The problem is that the majority of our machine owners are also local administrators on those machines, and the pool is too big and varied to instruct everyone on condor_vacate and suspension-policy settings. So what sometimes happens is that a machine owner logs in and kills a suspended condor_exec process to reclaim resources.
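What we would really like an owner to run in that situation is something along these lines (the hostname and job ID below are invented for illustration):

    # Gracefully evict every Condor job on the machine in question:
    condor_vacate execnode01.example.com

    # Or evict just the offending job by its cluster.proc ID:
    condor_vacate_job 1234.0

Either of those would at least let Condor see a normal eviction instead of an unexplained kill, but getting that message out across the whole pool hasn't been practical.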
We could default to a want_suspend = false policy, eliminating the need for local administrators to reclaim resources, but since most jobs do not checkpoint, we'd prefer to keep suspension where possible.
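For concreteness, the change we're weighing is roughly this in the startd configuration (a sketch only; our real policy expressions are more involved):

    # Sketch: disable suspension entirely, so an interactive owner causes
    # the job to be evicted rather than left suspended on the machine.
    WANT_SUSPEND = FALSE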
If we can't "catch" jobs that are being killed outside Condor, I suppose the only option is to re-queue them after reviewing the logs for non-zero return values?
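Alternatively, rather than grepping the logs by hand, perhaps something like this in the submit description would re-queue them automatically (a sketch, assuming our jobs can safely be rerun from the beginning):

    # Sketch: leave the job in the queue (so it is rerun) unless it
    # exited on its own with status 0; a job killed out from under
    # Condor should come back with a signal or non-zero exit status.
    on_exit_remove = (ExitBySignal == False) && (ExitCode == 0)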
Thanks,
Rob de Graaf