[Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM
- Date: Tue, 18 Dec 2007 16:27:17 +0100 (CET)
- From: rob@xxxxxxxxxxxxxxxxxx
- Subject: [Condor-users] administrator SIGQUIT vs condor_vacate SIGTERM
Hello,
On Windows, when a local user evicts a job from their system using
condor_vacate, the job gets a SIGTERM, shuts down gracefully, and is
re-queued:
12/18 13:54:20 Create_Process succeeded, pid=2332
12/18 13:56:28 Got SIGTERM. Performing graceful shutdown.
12/18 13:56:28 ShutdownGraceful all jobs.
12/18 13:56:28 Process exited, pid=2332, status=-1073741510
12/18 13:56:28 Last process exited, now Starter is exiting
12/18 13:56:28 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
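For reference, the status -1073741510 in the log above is easier to interpret in hexadecimal. The following is a minimal Python sketch (the helper name to_ntstatus is hypothetical and not part of Condor) that reinterprets the signed value as an unsigned 32-bit code. The result, 0xC000013A, matches the Windows STATUS_CONTROL_C_EXIT code, which suggests the job was stopped by a console control event rather than exiting on its own.

```python
def to_ntstatus(exit_status: int) -> str:
    """Reinterpret a signed 32-bit exit status as an unsigned hex code.

    Hypothetical helper for illustration; not part of Condor.
    """
    # Masking with 0xFFFFFFFF maps a negative value onto its
    # two's-complement unsigned 32-bit representation.
    return f"0x{exit_status & 0xFFFFFFFF:08X}"


if __name__ == "__main__":
    # Status reported by the starter after condor_vacate.
    print(to_ntstatus(-1073741510))
    # Status reported after ending the process from the task manager.
    print(to_ntstatus(1))
```

The second case (status=1) stays a plain exit code of 1, which is consistent with the job .log treating it as an ordinary return value.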
In the job .log, this action is described as:
004 (036.000.000) 12/18 15:51:55 Job was evicted.
(0) Job was not checkpointed.
The job will be run anew on another client, as expected.
But when a local administrator uses the task manager to end the
condor_exec process, the job gets a SIGQUIT, shuts down quickly, and is
not re-queued:
12/18 14:04:20 Create_Process succeeded, pid=2628
12/18 14:12:33 Process exited, pid=2628, status=1
12/18 14:12:33 Got SIGQUIT. Performing fast shutdown.
12/18 14:12:33 ShutdownFast all jobs.
12/18 14:12:33 **** condor_starter (condor_STARTER) EXITING WITH STATUS 0
The job .log still shows "normal termination", as if the job had run to
completion, but with return value 1 instead of 0:
005 (037.000.000) 12/18 15:53:37 Job terminated.
(1) Normal termination (return value 1)
Condor apparently knows something is wrong, and sets exit status 1
accordingly, but doesn't reschedule, so I've now "lost" a job. What is
the reasoning behind this behavior, and how can I change it so I don't
lose jobs when administrators send them SIGQUIT?
Thanks,
Rob de Graaf