
[HTCondor-users] ERROR: Failed to connect to local queue manager



Hi all,

Sometimes when a user submits a large number of jobs, they see this:

UserLog will be at /spare/tmp/condor.d_f8bvcx.log
Submitting job(s)
ERROR: Failed to connect to local queue manager
CEDAR:6001:Failed to connect to <10.40.31.17:9618?addrs=10.40.31.17-9618&noUDP&sock=6191_b7bb_3>

where the IP is that of the schedd host.

The SharedPortLog has lots of messages like this:

03/17/17 17:14:18 SharedPortServer: server was busy, failed to connect 6367_e09c_3140282 as requested by SCHEDD <10.40.31.17:9618?addrs=10.40.31.17-9618&noUDP&sock=6191_b7bb_3> on <10.40.31.17:6516>: primary (fa9978631a59659f8fbed31539c90cfdba79f8f118596e1b13053010c10e1cec/6367_e09c_3140282): Connection refused (111); alt (/var/lock/condor/daemon_sock/6367_e09c_3140282): Connection refused (111)

How do I figure out what's going on here?
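
To get a feel for when these pile up, here is a minimal sketch that tallies the "server was busy" messages per minute. It assumes only the timestamp and message layout shown in the SharedPortLog line above; the function name busy_per_minute is made up for illustration and is not an HTCondor tool.

```python
import re
from collections import Counter

# Matches the timestamp and the "server was busy" marker as they appear in
# the SharedPortLog excerpt above. Group 1 is the minute (MM/DD/YY HH:MM),
# group 2 is the socket name that could not be reached.
BUSY_RE = re.compile(
    r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} SharedPortServer: "
    r"server was busy, failed to connect (\S+)"
)


def busy_per_minute(lines):
    """Count 'server was busy' messages per minute in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = BUSY_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Hypothetical usage (the log path depends on the local LOG setting):
#   with open("/path/to/SharedPortLog", errors="replace") as fh:
#       for minute, n in sorted(busy_per_minute(fh).items()):
#           print(minute, n)
```

Comparing the busy minutes against the times of the large submissions should at least show whether the refusals line up with the submit bursts.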

Thanks,
Jon 

-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Alessandra Forti
Sent: Sunday, March 19, 2017 4:53 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] startd doesn't start

Hi Todd,

thank you. It now also works fine with 8.6.1. It has to be set to condor from the start to get all the right permissions.
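
For anyone who wants to catch this before starting the daemons, here is a minimal sketch of the rule Todd describes in his message below: a CONDOR_IDS value has the form uid.gid and must not use 0. The helper name check_condor_ids is hypothetical, and this is only an illustration of the rule, not the check HTCondor itself performs.

```python
def check_condor_ids(value):
    """Parse a CONDOR_IDS value of the form 'uid.gid' and reject root ids.

    Returns (uid, gid) as integers, or raises ValueError if the value is
    malformed or if either id is 0.
    """
    uid_text, sep, gid_text = value.strip().partition(".")
    if not sep:
        raise ValueError("CONDOR_IDS should look like 'uid.gid': %r" % value)
    # int() raises ValueError itself if either part is not a number.
    uid, gid = int(uid_text), int(gid_text)
    if uid == 0 or gid == 0:
        # Mirrors the advice in the thread: use any uid/gid other than 0.
        raise ValueError("CONDOR_IDS must not use uid 0 or gid 0: %r" % value)
    return uid, gid
```

Calling check_condor_ids("0.0") raises ValueError, which is the configuration that stopped the startd here.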

cheers
alessandra

On 18/03/2017 22:40, Todd Tannenbaum wrote:
> Hi Alessandra,
>
> I successfully reproduced your problem and understand what is happening.
>
> When the HTCondor service is started as root, the HTCondor daemons 
> have the ability to run as root, but they use it very sparingly.
> HTCondor wants to run 99% of the time with an effective uid of an 
> account that is less privileged than root. This is a good thing :).
> By default, this less privileged account is the "condor" account, but 
> you can override this by setting CONDOR_IDS to specify the uid/gid to 
> use if the "condor" account does not exist etc.
>
> The issue is that you have CONDOR_IDS=0.0 set; that is an insecure 
> configuration, as it tells HTCondor to use a uid of 0 (and a gid of 0) 
> whenever it wants to run as user "condor". This effectively defeats 
> the whole idea, as HTCondor will then always be running with full root 
> powers, even at times when it does not need them.
>
> The reason setting CONDOR_IDS=0.0 no longer works in v8.6.0 is this 
> patch which appeared in v8.5.2:
>   https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=5467
> Effectively this patch is causing your startd to abort at startup when 
> it attempts to spawn off processes to run benchmarks; the startd wants 
> to run the benchmark processes as the less privileged "condor" account 
> (as no root access is required), but CONDOR_IDS=0.0 tells the startd 
> to use the root account in place of the condor account, and that is no 
> longer permitted.
>
> I think it would help for us to add a patch that has HTCondor 
> immediately abort on startup with a clear/helpful error message if 
> CONDOR_IDS=0.0.
>
> But the fix for you is to get rid of CONDOR_IDS=0.0 and just let 
> HTCondor use the "condor" account, or if you don't have a condor 
> account, pick CONDOR_IDS with any uid/gid other than 0.  For instance, 
> the HTCondor Manual suggests using the uid/gid of the "daemon" account 
> if for some reason you do not allow a condor account; take a peek in 
> the index of the Manual at the CONDOR_IDS entry for other ideas.
>
> Hope this helps
> Todd
>
>
> On 3/18/2017 3:43 PM, Alessandra Forti wrote:
>> Looking at it, the 8.4.11 version logs errors too, but somehow it 
>> still works:
>> 03/18/17 20:41:54 ERROR: Attempt to initialize user_priv with root 
>> privileges rejected
>> 03/18/17 20:41:54 set_user_egid() called when UserIds not inited!
>> 03/18/17 20:41:54 set_user_euid() called when UserIds not inited!
>> 03/18/17 20:41:54 Create_Process(/usr/libexec/condor/condor_kflops):
>> child failed because PRIV_USER_FINAL process was still root before
>> exec()
>>
>>
>> On 18/03/2017 20:29, Alessandra Forti wrote:
>>> Hi Greg,
>>>
>>> thanks for your reply. We don't have LDAP/NIS; the users are local to 
>>> the grid cluster. Puppet creates a condor user as well. CONDOR_IDS 
>>> is set to 0.0 in both the 8.4.11 and 8.6.1 installations. I did 
>>> enable D_FULLDEBUG, but I cannot find any information about which 
>>> user is used other than
>>>
>>> 03/18/17 20:23:47 Running as root.  Enabling specialized core dump 
>>> routines
>>> 03/18/17 20:23:47 Daemon Log is logging: D_FULLDEBUG D_ALWAYS 
>>> D_ERROR D_COMMAND
>>>
>>> and then eventually the error reported.
>>>
>>> cheers
>>> alessandra
>>>
>>> On 18/03/2017 16:11, Greg Thain wrote:
>>>> On 03/17/2017 04:28 AM, Alessandra Forti wrote:
>>>>>
>>>>> In the StartLog files I have this error
>>>>>
>>>>> 03/17/17 08:20:35 ERROR: Attempt to initialize user_priv with root 
>>>>> privileges rejected
>>>>> 03/17/17 08:20:35 ERROR "Programmer Error: attempted switch to 
>>>>> user privilege, but user ids are not initialized" at line 1500 in 
>>>>> file
>>>>
>>>> When started as root, the startd spends most of its runtime as a 
>>>> non-root user, for security reasons.  In those places where it 
>>>> needs root, it will setuid back to root temporarily, but then 
>>>> switch back to a non-root uid.
>>>>
>>>> This error says that the startd is trying to switch effective user 
>>>> id to some non-root user, but the numeric id (or gid) of that 
>>>> non-root user is zero, which is clearly an error, and rather than 
>>>> run with improperly elevated privileges, the startd aborts.
>>>>
>>>> So, it would be useful to know which user it is trying to run as. 
>>>> A bit more of the log above this would show that, especially if you 
>>>> run the startd with D_FULLDEBUG.  Also, the value of the config 
>>>> setting CONDOR_IDS may be involved.  This error can be caused if 
>>>> you are getting your passwd file entries from NIS or LDAP, and 
>>>> somehow the startd didn't load that library or configuration.
>>>>
>>>> -greg
>>>>
>>>>
>>>> _______________________________________________
>>>> HTCondor-users mailing list
>>>> To unsubscribe, send a message to
>>>> htcondor-users-request@xxxxxxxxxxx with a
>>>> subject: Unsubscribe
>>>> You can also unsubscribe by visiting 
>>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>>
>>>> The archives can be found at:
>>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>>
>>> --
>>> Respect is a rational process. \\//
>>> Fatti non foste a viver come bruti, ma per seguir virtute e
>>> canoscenza(Dante)
>>> For Ur-Fascism, disagreement is treason. (U. Eco) But but but her 
>>> emails... (Anonymous)
>>>
>>>
>>
>>
>
>
