Mailing List Archives
[Condor-users] Large number of queued jobs slows causes schedd timeout during negotiation
- Date: Wed, 2 Mar 2005 15:50:57 -0500
- From: "Ian Chesal" <ICHESAL@xxxxxxxxxx>
- Subject: [Condor-users] Large number of queued jobs slows causes schedd timeout during negotiation
I have 2500 nice'd jobs queued on this machine. During the negotiation
cycle the schedd connection is timing out:
3/2 15:37:44 Negotiating with nice-user.ichesal@xxxxxxxxxx at <137.57.142.112:57094>
3/2 15:37:44 Calculating schedd limit with the following parameters
3/2 15:37:44 ScheddPrio = 20000000.000000
3/2 15:37:44 ScheddPrioFactor = 10000000.000000
3/2 15:37:44 scheddShare = 0.000000
3/2 15:37:44 scheddAbsShare = 0.000000
3/2 15:37:44 ScheddUsage = 0
3/2 15:37:44 scheddLimit = 500000
3/2 15:37:44 MaxscheddLimit = 500000
3/2 15:37:44 Socket to <137.57.142.112:57094> not in cache, creating one
3/2 15:37:44 NEGOTIATOR_TIMEOUT_MULTIPLIER is undefined, using default value of 0
3/2 15:37:44 SocketCache: Found unused slot 14
3/2 15:37:44 Sending SEND_JOB_INFO/eom
3/2 15:37:44 Getting reply from schedd ...
3/2 15:38:14 condor_read(): timeout reading buffer.
3/2 15:38:14 Failed to get reply from schedd
3/2 15:38:14 Error: Ignoring schedd for this cycle
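For reference, the gap between the request and the timeout can be read straight off the timestamps in the log above. A small sketch in Python, assuming the "M/D HH:MM:SS" stamp format shown in the log and taking the year from the message date (the helper name log_time is just for illustration):

```python
from datetime import datetime

# Two lines copied verbatim from the negotiator log above.
request_line = "3/2 15:37:44 Getting reply from schedd ..."
timeout_line = "3/2 15:38:14 condor_read(): timeout reading buffer."


def log_time(line, year=2005):
    """Parse the leading 'M/D HH:MM:SS' stamp of a log line.

    The log carries no year, so one is supplied (assumed from the
    date of this message).
    """
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{year}/{date_part} {time_part}",
                             "%Y/%m/%d %H:%M:%S")


# Seconds the negotiator waited before giving up on the schedd.
elapsed = (log_time(timeout_line) - log_time(request_line)).total_seconds()
print(elapsed)
```

So the negotiator waited 30 seconds for the schedd's reply before ignoring it for this cycle.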
I have "NEGOTIATE_ALL_JOBS_IN_CLUSTER = False" set on this schedd machine. I
thought that might help with response time. 2500 idle jobs seems like a
pretty paltry number to cause negotiation problems. Is there some way I
can improve the schedd's response time so this timeout doesn't occur?
Maybe by adjusting NEGOTIATOR_TIMEOUT_MULTIPLIER?
Thanks.
- Ian C.