
Re: [Condor-users] Java universe and memory (moved from devel to user)



On Mar 7, 2008, at 2:27 PM, Craig Bruce wrote:

the on_exit_remove or on_exit_hold can trap this and place it on hold
for you to deal with.

I can't use either of these, as the job never gets as far as exiting. It just goes back to idle and is resubmitted, only to hit the same error, ad infinitum.

The exit code is 1, as an abnormal termination, so I tried this in
on_exit_hold and periodic_hold, but the first doesn't run and the second
runs before the exit code is defined.

Is there something like on_evict_hold? I couldn't find anything in the
manual.


Condor evaluates on_exit_hold and on_exit_remove only when the job completes and is ready to leave the queue. Since Condor leaves the job in the queue on OutOfMemory, the on_exit expressions are never evaluated.

Here's how you can use periodic_hold:
periodic_hold = NumJobStarts =!= Undefined && NumJobStarts > 2

The first half of the expression is required because NumJobStarts isn't defined in the job ad until it starts running for the first time.

This will also catch jobs that re-execute for other reasons, but it will stop infinite re-execution.
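
For context, here is a rough sketch of where that line could sit in a Java universe submit file. The class name and the output, error, and log file names are placeholders, not taken from the job in question, and the threshold of 2 is simply the value used above, so adjust it to taste:

# Hypothetical Java universe job; Main.class and the file names are placeholders
universe      = java
executable    = Main.class
arguments     = Main
output        = job.out
error         = job.err
log           = job.log
# Put the job on hold once it has started more than twice
periodic_hold = NumJobStarts =!= Undefined && NumJobStarts > 2
queue

A held job stays in the queue, so you can look at it with condor_q and then either condor_release it or condor_rm it.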

Thanks and regards,
Jaime Frey
UW-Madison Condor Team