Re: [Condor-users] the infamous question mark problem
- Date: Fri, 26 Mar 2010 13:21:45 -0400
- From: Mag Gam <magawake@xxxxxxxxx>
- Subject: Re: [Condor-users] the infamous question mark problem
On Fri, Mar 26, 2010 at 12:44 PM, Nick LeRoy <nleroy@xxxxxxxxxxx> wrote:
> Mag,
>
>> Once over 1000 jobs hit the pool, I start to see the question marks.
>> Is there some setting I can look at to fix this?
>
> We just had a discussion here about this, and we have a number of questions.
>
> 1. What version of Condor are you running? A recent performance enhancement
> could be malfunctioning and causing the problem.
We are running version 7.2.4.
>
> 2. Do you know what the jobs are doing during these "events"? Is there a
> pattern to them? For example, when you run your 'condor_q -run', do you
> sometimes see all jobs good, and on other runs a grouping of '??????' jobs?
These jobs are heterogeneous. Some of them are simple awk scripts;
others use Perl, R, or Octave.
>
> 3. I think that it'd be helpful if you could post the following:
> 3a. job log snippet(s) around the window in which you've seen the problem
> 3b. ShadowLog snippet(s) of the same
>
> Finally, some observations and a window into our thoughts:
>
> 1. When you run 'condor_q -run', it's equivalent to running:
> condor_q -const 'JobStatus==2' -format ...
I will try this the next time the problem occurs. It usually happens when the
other department lets us use their systems for overnight simulations.
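To make the equivalence concrete, here is a minimal Python sketch of what a `condor_q -run` style listing does, assuming each job ad is available as a plain dict keyed by ClassAd attribute name (a hypothetical representation; condor_q itself works on real ClassAds). The `??????` placeholder and the two-column layout are illustrative assumptions, not condor_q's exact output. The point is only that the listing filters on JobStatus==2 and then prints RemoteHost, so a running job without RemoteHost has nothing to show in the host column.

```python
# Minimal sketch: emulate a "condor_q -run" style view over job ads.
# ASSUMPTION: ads are plain dicts keyed by ClassAd attribute name; the
# "??????" placeholder and column layout are illustrative only.

RUNNING = 2  # JobStatus value for a running job


def run_view(ads):
    """Return one line per running job: 'Cluster.Proc  RemoteHost'."""
    lines = []
    for ad in ads:
        if ad.get("JobStatus") != RUNNING:
            continue  # same filter as -const 'JobStatus==2'
        job_id = f"{ad['ClusterId']}.{ad['ProcId']}"
        # A running job with no RemoteHost has nothing to print for the host.
        host = ad.get("RemoteHost", "??????")
        lines.append(f"{job_id:<10} {host}")
    return lines


if __name__ == "__main__":
    ads = [
        {"ClusterId": 101, "ProcId": 0, "JobStatus": 2, "RemoteHost": "slot1@node07"},
        {"ClusterId": 101, "ProcId": 1, "JobStatus": 2},  # status set, host missing
        {"ClusterId": 102, "ProcId": 0, "JobStatus": 1},  # idle: filtered out
    ]
    for line in run_view(ads):
        print(line)
```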
>
> 2. It's possible that there's a race condition in which the job's status
> (JobStatus) has been set to RUNNING (2) without the RemoteHost attribute being
> set. This should never happen, but it obviously is happening. The answers to
> the above questions may help us isolate how this is happening.
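If the race described above is real, it should be visible directly in the job ads. A minimal sketch for spotting it, again assuming ads are plain dicts keyed by attribute name and that several queue snapshots were captured while the problem was occurring (how the snapshots are collected is hypothetical and not shown): it reports, per snapshot, which jobs have JobStatus 2 but no RemoteHost. Grouping by snapshot also helps answer question 2, i.e. whether the bad jobs appear in bursts on some runs and not on others.

```python
# Minimal sketch: flag job ads showing the suspected race, i.e. JobStatus is
# RUNNING (2) but RemoteHost is absent or empty.
# ASSUMPTION: each snapshot is a list of dicts keyed by ClassAd attribute
# name; the snapshot format and collection method are hypothetical.

RUNNING = 2


def inconsistent_jobs(ads):
    """Return sorted (ClusterId, ProcId) pairs that are RUNNING without RemoteHost."""
    return sorted(
        (ad["ClusterId"], ad["ProcId"])
        for ad in ads
        if ad.get("JobStatus") == RUNNING and not ad.get("RemoteHost")
    )


def summarize(snapshots):
    """Map snapshot index -> inconsistent job IDs; clean snapshots are omitted."""
    report = {}
    for i, ads in enumerate(snapshots):
        bad = inconsistent_jobs(ads)
        if bad:
            report[i] = bad
    return report
```

Real ClassAd attribute names are case-insensitive while dict keys are not, so a real collector would need to normalize attribute names before running a check like this.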
>
> Thanks, Mag,
>
> -Nick
>
> --
> <<< Welcome to the real world. >>>
> /`-_ Nicholas R. LeRoy The Condor Project
> { }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
> \ / nleroy@xxxxxxxxxxx The University of Wisconsin
> |_*_| 608-265-5761 Department of Computer Sciences
>