
Re: [Condor-users] the infamous question mark problem



Mag,

> Once over 1000 jobs hit the pool, I start to see the question marks.
> Is there some setting I can look at to fix this?

We just had a discussion here about this, and we have a number of questions.

1. What version of Condor are you running?  A recent performance enhancement 
may be malfunctioning and causing the problem.

2. Do you know what the jobs are doing during these "events"?  Is there a 
pattern to them?  For example, when you run your 'condor_q -run', do you 
sometimes see all jobs good, and on other runs a grouping of '??????' jobs?

3. I think that it'd be helpful if you could post the following:
3a. job log snippet(s) around the window in which you've seen the problem
3b. ShadowLog snippet(s) of the same

Finally, some observations and a window into our thoughts:

1. When you run 'condor_q -run', it's equivalent to running:
  condor_q -const 'JobStatus==2' -format ...

2. It's possible that there's a race condition in which the job's status 
(JobStatus) has been set to RUNNING (2) without the RemoteHost attribute being 
set.  This should never happen, but it evidently does.  The answers to the 
above questions may help us to isolate how this is happening.
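
If it helps to catch the suspected state in the act, here is a rough sketch 
(not an official Condor tool) that scans saved 'condor_q -long' output for jobs 
whose JobStatus is 2 but which have no RemoteHost attribute.  The function 
names and the sample ads are made up for illustration, and it assumes the 
usual "Attr = value" layout with a blank line between jobs.

```python
def parse_ads(text):
    """Split 'condor_q -long' style text into one dict per job ad."""
    ads, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line ends the current job ad.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, value = line.partition(" = ")
        if sep:
            current[name] = value.strip().strip('"')
    if current:
        ads.append(current)
    return ads


def running_without_host(ads):
    """Return 'Cluster.Proc' ids of jobs marked RUNNING (2) with no RemoteHost."""
    suspects = []
    for ad in ads:
        if ad.get("JobStatus") == "2" and "RemoteHost" not in ad:
            suspects.append(f"{ad.get('ClusterId', '?')}.{ad.get('ProcId', '?')}")
    return suspects


# Hypothetical sample ads, invented for this example (not real pool output).
SAMPLE = """\
ClusterId = 101
ProcId = 0
JobStatus = 2
RemoteHost = "vm1@node01.example.org"

ClusterId = 101
ProcId = 1
JobStatus = 2

ClusterId = 102
ProcId = 0
JobStatus = 1
"""

if __name__ == "__main__":
    for job_id in running_without_host(parse_ads(SAMPLE)):
        print(job_id)
```

To try it on real data, one option is to save 'condor_q -long' to a file while 
the question marks are showing and pass the file's contents to parse_ads() in 
place of SAMPLE.  Any job ids it reports would be the ones to look up in the 
job log and ShadowLog.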

Thanks Mag,

-Nick

-- 
           <<< Welcome to the real world. >>>
 /`-_    Nicholas R. LeRoy               The Condor Project
{     }/ http://www.cs.wisc.edu/~nleroy  http://www.cs.wisc.edu/condor
 \    /  nleroy@xxxxxxxxxxx              The University of Wisconsin
 |_*_|   608-265-5761                    Department of Computer Sciences