Re: [condor-users] Jobs Stop Migrating
- Date: Wed, 10 Dec 2003 11:27:16 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [condor-users] Jobs Stop Migrating
Joel Hernandez wrote:
> We have two clusters, louie and duey. Users submit their jobs on the
> louie cluster. When all the nodes on louie are busy, the jobs flock
> to the duey cluster. This works fine for three or four hours and then
> stops altogether for several hours even though many runnable jobs
> are still in the queue.
>
> The jobs start flocking again after several hours or immediately after
> a condor_restart is performed on louie. However, after several hours
> all the jobs stop migrating again. Has anyone had this problem?

Very odd. When you say that you do a condor_restart on louie, what
daemons are running on the machine in question? Are you restarting the
schedd, or is it just the collector and negotiator?
In the schedd logs, you should see statements about the "flock level".
Can you please check what this is doing during the time when flocking is
not working?
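
One way to pull those statements out of the log is sketched below. This is only a rough helper, not part of Condor: the function name flock_level_lines is made up for illustration, and the only thing it assumes about the log is that the relevant lines contain the phrase "flock level" (matched case-insensitively). The log path is passed on the command line; running condor_config_val SCHEDD_LOG on the submit machine should print where the schedd log lives.

```python
import re
import sys
from typing import Iterable, List

# Assumption: the schedd log lines of interest contain the phrase
# "flock level". Nothing else about the log format is assumed.
FLOCK_RE = re.compile(r"flock level", re.IGNORECASE)


def flock_level_lines(lines: Iterable[str]) -> List[str]:
    """Return every line that mentions the flock level, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if FLOCK_RE.search(line)]


if __name__ == "__main__":
    # Usage: python flock_lines.py /path/to/SchedLog
    # The script name and path are placeholders; use the path reported by
    # `condor_config_val SCHEDD_LOG` on the submit machine.
    with open(sys.argv[1], errors="replace") as log:
        for hit in flock_level_lines(log):
            print(hit)
```

Comparing the timestamps on the matching lines with the hours when jobs stop moving to duey should show whether the flock level changes at the point where flocking stops.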
Dan Bradley
University of Wisconsin, Condor Project