Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] SharedPortServer: server was busy

Date: Mon, 15 Feb 2016 14:40:38 -0600
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] SharedPortServer: server was busy

On 2/15/2016 8:45 AM, Vladimir Brik wrote:

Hello.

SharedPortLog file on our central manager has a lot of entries like:

SharedPortServer: server was busy, failed to connect to collector as
requested by <172.16.223.61:40500>: Resource temporarily unavailable
(err=11)

Sometimes, I see hundreds of such messages generated per second every
few minutes.

Is the problem that the collector doesn't respond quickly enough, or
that shared_port can't handle the volume of connections, or something else?

It is the first case you mention - the problem is that the shared_port tried to forward the connection to the collector, but the collector's listen queue is full because the collector is not responsive enough.

Are there any configuration tweaks I could try to alleviate this?

What version of HTCondor are you running (always a good idea to let us know...)?

A while back we did fix a bug where the collector would periodically pause when it was configured to use shared_port. I think this was ultimately fixed in v8.4.4+ in stable series or v8.5.2+ in developer. If this is the problem, then simply upgrading should fix it, or (if you cannot upgrade for some reason) turning off shared port via USE_SHARED_PORT=False. This would be my first guess, esp if your collector seemed to be doing just fine before you started using it in conjunction with shared_port.
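
As an illustration of the workaround mentioned above, here is a minimal sketch of the configuration change. It assumes the line is placed in the central manager's local configuration file, whose location varies by installation and is not stated in this thread.

```
# Workaround for when upgrading to v8.4.4+ / v8.5.2+ is not possible:
# stop routing connections to the collector through condor_shared_port.
USE_SHARED_PORT = False
```

The HTCondor daemons on that machine most likely need a restart, rather than only a reconfig, before this takes effect.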

But another possibility is your collector is simply overloaded. Some possible problems with pithy solutions -

Q: Do you use strong authentication (SSL, GSI, etc) to your collector, esp if you have execute nodes spread out over wide-area connections (i.e. high latency networks)?
A: Consider horizontally scaling the collector as described here:

  https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToConfigCollectors

Q: Do you have a lot (thousands) of slots behind private networks and thus need to use CCB?
A: Consider running additional instances of the condor_collector just to handle CCB requests, separate from your central manager collector.

Q: Do you have a lot of users or monitoring scripts constantly running condor_status?
A: Consider increasing the COLLECTOR_QUERY_WORKERS setting in your central manager condor_config to gain increased collector query performance at the cost of greater memory usage.
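
For the last suggestion, a minimal sketch of what the setting could look like in the central manager's condor_config. The value 16 is purely illustrative and not a recommendation from this thread; a sensible number depends on the available memory and the query load.

```
# Allow the collector to service more queries (e.g. from condor_status)
# concurrently. Higher values cost more memory on the central manager.
# The value below is an illustrative assumption, not a tuned setting.
COLLECTOR_QUERY_WORKERS = 16
```

Running condor_reconfig on the central manager afterwards should pick up the new value, assuming this setting is re-read on reconfig.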


Hope the above helps,
Todd

