Re: [Condor-users] ERROR: unable to update job ad!
- Date: Tue, 14 Mar 2006 18:04:08 -0600
- From: Erik Paulson <epaulson@xxxxxxxxxxx>
- Subject: Re: [Condor-users] ERROR: unable to update job ad!
On Tue, Mar 14, 2006 at 03:53:57PM -0800, Chris Tracy wrote:
> I was recently tasked with setting up a Condor pool here at SCU on
> our CentOS 4 systems. At first I did it with condor-6.6.10, ran into a
> few issues related to Linux 2.6 support, but made my way around them only
> to ultimately be stymied by the following error in the StarterLog whenever
> I tried to submit the sh_loop example job:
>
> ERROR: unable to update job ad! Aborting OsProc::StartJob
>
> I could find no reference to this bug in any mailing list posting
> or on any web page about Condor. So I decided to go to 6.7.17, as it was
> supposed to have better support for the 2.6 kernel. Indeed it does, and I
> was able to take out all my config workarounds. However, I still get the
> same problem when I try to submit the "sh_loop" example job.
>
> Ultimately everything points to condor_starter dying unexpectedly.
> So I set STARTER_DEBUG = D_ALL in condor_config.local for the one execute
> node in the test cluster and resubmitted the job. This gives the
> following output in StarterLog:
>
<...>
> I'm at a loss as to what to do at this point. If I had the code
> I'd go look to see what was in OsProc::StartJob, but alas, I don't. Has
> anyone ever encountered this issue before?
>
Weird, I've never seen this one before.
What's the output of condor_q -l for the job that failed, and what's the
result of 'condor_config_val JOB_RENICE_INCREMENT' on the machine where the
job is executing?
-Erik