Re: [Condor-users] the infamous question mark problem
- Date: Fri, 26 Mar 2010 11:44:27 -0500
- From: Nick LeRoy <nleroy@xxxxxxxxxxx>
- Subject: Re: [Condor-users] the infamous question mark problem
Mag,
> Once over 1000 jobs hit the pool, I start to see the question marks.
> Is there some setting I can look at to fix this?
We just had a discussion here about this, and we have a number of questions:
1. What version of Condor are you running? A recent performance enhancement
could possibly be malfunctioning and causing the problems.
2. Do you know what the jobs are doing during these "events"? Is there a
pattern to them? For example, when you run your 'condor_q -run', do you
sometimes see all jobs good, and on other runs a grouping of '??????' jobs?
3. I think that it'd be helpful if you could post the following:
3a. job log snippet(s) around the window in which you've seen the problem
3b. ShadowLog snippet(s) of the same
Finally, some observations and a window into our thoughts:
1. When you run 'condor_q -run', it's equivalent to running:
condor_q -const 'JobStatus==2' -format ...
2. It's possible that there's a race condition in which the job's status
(JobStatus) has been set to RUNNING (2) without the RemoteHost attribute being
set. This should never happen, but it evidently does. The answers to the above
questions may help us to isolate how this is happening.
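
To catch jobs in the suspected state directly, one option is to query for jobs marked RUNNING that have no RemoteHost. The sketch below is only an illustration under assumptions: the helper names (condor_q_args, hostless_running) and the dict layout of the job ads are hypothetical, and the constraint is a separate diagnostic query, not the exact expansion of 'condor_q -run'.

```python
# ClassAd expression: job is marked RUNNING (2) but RemoteHost is undefined.
# "=?=" is the ClassAd "is identical to" operator, so the test is true
# only when the attribute is genuinely missing.
HOSTLESS_RUNNING = "JobStatus == 2 && RemoteHost =?= UNDEFINED"


def condor_q_args(constraint=HOSTLESS_RUNNING):
    """Build an argv list for condor_q that prints 'cluster.proc' per match.

    Hypothetical helper: it only assembles the command, it does not run it.
    """
    return [
        "condor_q",
        "-constraint", constraint,
        "-format", "%d.", "ClusterId",
        "-format", "%d\n", "ProcId",
    ]


def hostless_running(ads):
    """Return the job ads (plain dicts, an assumed layout) that have
    JobStatus 2 but a missing or empty RemoteHost."""
    return [
        ad for ad in ads
        if ad.get("JobStatus") == 2 and not ad.get("RemoteHost")
    ]
```

On a submit host, the list from condor_q_args() could be handed to subprocess.run; running it repeatedly while the pool is busy would show whether the '??????' entries line up with jobs that are RUNNING without a RemoteHost.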
Thanks Mag,
-Nick
--
<<< Welcome to the real world. >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences