HTCondor Project List Archives




Re: [Condor-devel] Java universe and memory



On Wed, Mar 5, 2008 at 1:07 PM, Craig Bruce <pcxcb1@xxxxxxxxxxxxxxxx> wrote:
> Hi,
>
>  I'm successfully using 7.0.1 on a linux pool. We run a lot of java jobs that
>  use lots of RAM. It is not unusual to underestimate the amount of RAM we
>  need to pass to the JVM:
>  java_vm_args   = -Xmx900m
>
>  If it isn't enough the JVM will not complete the task and the error file
>  confirms this:
>  java.lang.OutOfMemoryError: Java heap space
>
>  However, condor will evict this job and thus resubmit somewhere else. As the
>  memory value has not been altered the same error will result. Should the
>  task not just complete in this case? Otherwise users think the job is
>  running/waiting to rematch, but really it needs cancelling, modifying and
>  resubmitting.

Check whether the resulting exit code is consistent and occurs only
in this or similar events.
If it does not, and you can alter your application, use something like:
  try
  {
     // the memory-hungry work goes here
  }
  catch (OutOfMemoryError e)   // it is an Error, not an Exception
  {
     // log it however you normally would
     System.exit(42);   // any constant you reserve for this failure
  }

An on_exit_remove or on_exit_hold expression in the submit file can then
trap that exit code, for example to place the job on hold for you to
deal with.
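
A slightly fuller sketch of the same idea, in case it helps. The names
OomExit, OOM_EXIT_CODE and runAndMapExit are made up for illustration, and
42 is an arbitrary value; the only point is that the mapping from
OutOfMemoryError to a known exit code sits in one small method:

```java
// Sketch only: OomExit, OOM_EXIT_CODE and runAndMapExit are illustrative names.
final class OomExit {
    // Assumed value: pick any code the job never returns for another reason.
    static final int OOM_EXIT_CODE = 42;

    // Runs the task and returns the exit code the JVM should finish with.
    static int runAndMapExit(Runnable task) {
        try {
            task.run();
            return 0;
        } catch (OutOfMemoryError e) {
            // Keep this handler small: the heap has just been exhausted.
            System.err.println("Out of memory: " + e.getMessage());
            return OOM_EXIT_CODE;
        }
    }

    public static void main(String[] args) {
        int code = runAndMapExit(() -> {
            // Placeholder for the real, memory-hungry work.
            long sum = 0;
            for (int i = 0; i < 1000; i++) {
                sum += i;
            }
            System.out.println("sum = " + sum);
        });
        // Only force an exit code on failure; a normal return exits with 0.
        if (code != 0) {
            System.exit(code);
        }
    }
}
```

Assuming the exit code does reach the job ad as ExitCode (worth checking
on your version in the java universe), the matching submit file line
could look something like:

  on_exit_hold = (ExitBySignal == False) && (ExitCode == 42)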

Matt