Re: [Condor-users] criteria for non-DAG job failures?
- Date: Wed, 27 Jun 2012 10:26:38 -0500
- From: Nathan Panike <nwp@xxxxxxxxxxx>
- Subject: Re: [Condor-users] criteria for non-DAG job failures?
On Wed, Jun 27, 2012 at 11:06:21AM -0400, Vlad wrote:
> Greetings,
>
> Condor documentation provides some details for what's considered to be a job failure for DAG submissions (e.g. http://research.cs.wisc.edu/condor/manual/v7.8/2_10DAGMan_Applications.html#SECTION003105000000000000000) and that seems to cover process exit codes.
>
> What about non-DAG (cluster) jobs? I use 'notification = error' and the empirical observation (using a very new v7.8 install) is that I do get emails when jobs crash as a result of SIGBUS, etc. However, if a job returns with a non-zero error code (e.g. non-zero return from main() in C/C++) there are no emails. Is it possible to change this behavior? Could this be a matter of changing the default Condor configuration or using the appropriate submit descriptor incantation?
>
Vlad,

For pool-wide configuration, you can use the following config line:

    SYSTEM_PERIODIC_HOLD = ExitBySignal =?= True || ExitCode =!= 0

You could put a similar line in your submit file for per-job configuration:

    on_exit_hold = ExitBySignal =?= True || ExitCode =!= 0
    notification = Error

Nathan Panike
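
For context, the per-job settings above could sit in a complete submit description file along the following lines. This is only a minimal sketch: the executable name my_program and the output, error, and log file names are hypothetical placeholders, and the vanilla universe is an assumption; only the on_exit_hold and notification lines come from the message above.

```
# Hypothetical submit description file.
# my_program and all file names below are placeholders, not from the original message.
universe     = vanilla
executable   = my_program
output       = my_program.$(Cluster).$(Process).out
error        = my_program.$(Cluster).$(Process).err
log          = my_program.log

# Put the job on hold when it exits if it was killed by a signal
# or returned a non-zero exit code (expression taken from the message above).
on_exit_hold = ExitBySignal =?= True || ExitCode =!= 0
notification = Error

queue
```

Saved as, say, job.sub (again a placeholder name), this would be submitted with condor_submit job.sub. Jobs that end up held can then be listed with condor_q -hold and either released with condor_release or removed with condor_rm.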