Re: [Condor-users] duplicate jobIDs in the condor_history
- Date: Wed, 24 Nov 2010 16:22:40 -0600
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-users] duplicate jobIDs in the condor_history
Ian Chesal wrote:
> So the first question is:
> Did you delete the $(SPOOL) directory for the scheduler or the contents
> of that directory or the job_queue.log files? If so, you reset the
> cluster ID counter and that's why you've got duplicates.
> If you're certain you haven't wiped the job_queue.log file for the
> scheduler, is it possible you have multiple schedulers writing to the
> same history file? If so: that's bad.
Or perhaps you have multiple schedds writing to the same job_queue.log
file? That would also be really bad.
> Each scheduler should have its own
> history file.
I would state a superset of the above: each schedd should have its own
private log and spool subdirectory.
In any event, I think you can reset the next job ID Condor assigns by
shutting down your schedd (condor_off -schedd) and appending the following
to the end of the spool/job_queue.log file:
105
103 0.0 NextClusterNum xxxxx
106
where xxxxx is the next job cluster ID you want to be assigned. Then turn
your schedd back on (condor_on -schedd). Note that I haven't tried this
formula, so buyer beware. And if you haven't fixed the underlying
problem that caused the job IDs to be reused, it may happen again...
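
For anyone who would rather script the append step than edit the file by hand, here is a minimal Python sketch. It is just as untested against a real schedd as the formula above. The helper names (next_cluster_records, append_records) and the value 5000 are made up for illustration, and the demo writes to a scratch file rather than a real spool/job_queue.log. It assumes you have already run condor_off -schedd and will run condor_on -schedd afterwards.

```python
import os
import tempfile


def next_cluster_records(next_cluster: int) -> str:
    """Return the three lines from the post with xxxxx filled in."""
    if next_cluster < 1:
        raise ValueError("cluster id must be a positive integer")
    return f"105\n103 0.0 NextClusterNum {next_cluster}\n106\n"


def append_records(log_path: str, next_cluster: int) -> None:
    """Append the three lines to the end of log_path (hypothetical helper)."""
    with open(log_path, "rb") as f:
        data = f.read()
    with open(log_path, "a", newline="\n") as f:
        # Start on a fresh line if the file does not already end with one.
        if data and not data.endswith(b"\n"):
            f.write("\n")
        f.write(next_cluster_records(next_cluster))


# Demo on a scratch file, NOT a real spool/job_queue.log.
with tempfile.TemporaryDirectory() as tmp:
    scratch = os.path.join(tmp, "job_queue.log")
    with open(scratch, "w", newline="\n") as f:
        f.write("placeholder existing content")  # deliberately no trailing newline
    append_records(scratch, 5000)
    with open(scratch) as f:
        result = f.read()

print(result)
```

Back up the real job_queue.log before pointing anything like this at it, and only touch it while the schedd is shut down.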
Hope the above helps,
Todd