Re: [Condor-users] the infamous question mark problem
- Date: Wed, 17 Mar 2010 23:43:28 -0400
- From: Mag Gam <magawake@xxxxxxxxx>
- Subject: Re: [Condor-users] the infamous question mark problem
The "minor storage problem" was an NFS hiccup.
The most recent jobs show [?????????????????????????????] in the host field.
I am going to restart my collector to see if this fixes the problem.
condor_vacate <jid> gives me:
Can't find address for startd <jobid>
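
For anyone reading this later, here is a minimal sketch of how the affected jobs could be listed. It assumes the host column comes from the job ad's RemoteHost attribute and that JobStatus 2 means "running"; the sample ads and the helper names (parse_classads, running_without_host) are made up for illustration and are not part of Condor. It only parses text in the "Name = value" layout that condor_q -long prints, so it can be tried without a pool.

```python
def parse_classads(text):
    """Split `condor_q -long` style text into one dict per job ad.

    Ads are assumed to be separated by blank lines, one
    `Name = value` pair per line.
    """
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the ad collected so far.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition("=")
        if sep:
            current[name.strip()] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


RUNNING = "2"  # assumption: JobStatus 2 is the "running" state


def running_without_host(ads):
    """Return cluster.proc IDs of running jobs with no RemoteHost."""
    ids = []
    for ad in ads:
        if ad.get("JobStatus") == RUNNING and "RemoteHost" not in ad:
            ids.append(f'{ad.get("ClusterId", "?")}.{ad.get("ProcId", "?")}')
    return ids


# Hypothetical sample ads, not taken from a real pool.
SAMPLE = """\
ClusterId = 101
ProcId = 0
JobStatus = 2
RemoteHost = "slot1@node01.example.org"

ClusterId = 102
ProcId = 0
JobStatus = 2

ClusterId = 103
ProcId = 1
JobStatus = 1
"""

if __name__ == "__main__":
    print(running_without_host(parse_classads(SAMPLE)))
```

The resulting IDs are in cluster.proc form, which as far as I know is what condor_vacate_job expects; condor_vacate takes machine names instead, which would explain the "Can't find address for startd" message above.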
On Wed, Mar 17, 2010 at 11:31 PM, Nick LeRoy <nleroy@xxxxxxxxxxx> wrote:
> On Wednesday 17 March 2010, Mag Gam wrote:
>> Last week we had a minor storage problem in our pool. Since then, we
>> see a lot of '???????' in the running host field when we do condor_q -run
>> -direct schedd
>>
>> Is there a way to fix this? Some jobs show the proper hostname, but
>> many show '???????'. Is there a way to free up our
>> condor pool?
>
> Mag,
>
> I assume that you know this already, but '???????' is what condor_q displays
> for ClassAd attributes that aren't in the ClassAd. In your case, I'd *guess*
> that the jobs got evicted from their machines for some reason (without
> understanding your pool layout, it's difficult to speculate what a "minor
> storage problem" could cause), but are still in the "run" state. This
> makes no sense and AFAIK should never happen, but it nonetheless seems to be
> the case.
>
> I think that you'll have to force the jobs to rematch to a new machine.
> Perhaps 'condor_vacate_job' could be used to accomplish this?
>
> Hope this helps
>
> -Nick
>
> --
> <<< The matrix has you. >>>
> /`-_ Nicholas R. LeRoy The Condor Project
> { }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
> \ / nleroy@xxxxxxxxxxx The University of Wisconsin
> |_*_| 608-265-5761 Department of Computer Sciences
>