Mailing List Archives
Re: [HTCondor-users] startd doesn't start
Hi Alessandra,
I've started rolling out 8.6.1 on our SL7 worker nodes (they were previously on 8.6.0), and haven't encountered any problems like the ones you've mentioned.
Regards,
Andrew.
________________________________
From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Alessandra Forti [Alessandra.Forti@xxxxxxx]
Sent: Saturday, March 18, 2017 9:53 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] startd doesn't start
I tried installing an 8.6.1 node on the testbed that works, and the startd doesn't start; if I downgrade to 8.4.11, it does. So the problem seems to be specific to 8.6.1.
Is anyone on this list using 8.6.1? For now I'm going to stick with 8.4.11, but if there is a way to solve this, please let me know.
cheers
alessandra
On 17/03/2017 09:46, Alessandra Forti wrote:
I've attached a diff of the condor_config_val -dump output in case it helps.
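If you want to reproduce such a diff while ignoring line order and comment lines, here is a minimal Python sketch. It assumes each node's `condor_config_val -dump` output has been saved to a text file and that every setting appears on a line of the form `NAME = value`, with `#` marking comments; `parse_dump`, `diff_settings` and `diff_dump_files` are hypothetical helper names, not part of HTCondor.

```python
def parse_dump(text: str) -> dict:
    """Return a {NAME: value} mapping from the text of one config dump."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first '=' only, so values containing '=' survive.
        name, _, value = line.partition("=")
        # Macro names are case-insensitive in HTCondor, so normalise them.
        settings[name.strip().upper()] = value.strip()
    return settings


def diff_settings(a: dict, b: dict) -> dict:
    """Map each name whose value differs, or exists on one side only, to (a, b)."""
    return {
        name: (a.get(name), b.get(name))
        for name in sorted(a.keys() | b.keys())
        if a.get(name) != b.get(name)
    }


def diff_dump_files(path_a: str, path_b: str) -> None:
    """Print every setting that differs between two saved dump files."""
    with open(path_a) as fa, open(path_b) as fb:
        changes = diff_settings(parse_dump(fa.read()), parse_dump(fb.read()))
    for name, (old, new) in changes.items():
        print(f"{name}: {old!r} -> {new!r}")
```

For example, `diff_dump_files("dump-8.4.11.txt", "dump-8.6.1.txt")` (hypothetical file names) prints one line per macro whose value differs between the two nodes or that is defined on only one of them.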
On 17/03/2017 09:28, Alessandra Forti wrote:
Hi,
I'm in a bit of a pickle and can't work out what I'm doing wrong. I have two small testbeds that should have the same configuration, yet one works and the other doesn't. Both are configured with Puppet.
The one that doesn't work runs condor-8.6.1; the one that works runs condor-8.4.11.
Both are started by root. On both testbeds the UID domain is set to the same value on the head node and on the pool node (in fact, startd doesn't start on the head node either), and both have the same pool_password. There are some differences, though. For example, in 8.6.1 condor_shared_p starts automatically, while in 8.4.11 it doesn't. We don't [...] The pool_password files are created differently, which is why I stuck with the one that worked on at least one testbed. I can see startd start for a few seconds and then die or, according to the logs, get killed.
In the StartLog file I get this error:
03/17/17 08:20:35 ERROR: Attempt to initialize user_priv with root privileges rejected
03/17/17 08:20:35 ERROR "Programmer Error: attempted switch to user privilege, but user ids are not initialized" at line 1500 in file
In the MasterLog I have an endless series of these messages:
03/17/17 03:20:33 restarting /usr/sbin/condor_startd in 3600 seconds
03/17/17 04:20:33 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 2717119
03/17/17 04:20:34 DefaultReaper unexpectedly called on pid 2717119, status 1024.
03/17/17 04:20:34 The STARTD (pid 2717119) exited with status 4
03/17/17 04:20:34 restarting /usr/sbin/condor_startd in 3600 seconds
03/17/17 05:20:34 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 2723991
03/17/17 05:20:35 DefaultReaper unexpectedly called on pid 2723991, status 1024.
03/17/17 05:20:35 The STARTD (pid 2723991) exited with status 4
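The `status 1024` in the DefaultReaper lines looks like a raw POSIX wait status: 1024 is 4 << 8, which matches the `exited with status 4` line that follows it. Below is a minimal Python sketch that decodes such values and summarises a MasterLog excerpt. It assumes the glibc wait-status bit layout and the exact message wording shown in these excerpts (other HTCondor versions may phrase them differently); `decode_wait_status` and `summarise_master_log` are hypothetical names.

```python
import re
from datetime import datetime

# Timestamp prefix used in these log excerpts, e.g. "03/17/17 04:20:33 ..."
LINE = re.compile(r"^(\d\d/\d\d/\d\d \d\d:\d\d:\d\d) (.*)$")
STARTED = re.compile(r'Started DaemonCore process "([^"]+)", pid and pgroup = (\d+)')
REAPED = re.compile(r"DefaultReaper unexpectedly called on pid (\d+), status (\d+)")


def decode_wait_status(status: int) -> str:
    """Interpret a raw wait status using the glibc bit layout."""
    if (status & 0x7F) == 0:  # WIFEXITED: no terminating signal recorded
        return f"exited with code {(status >> 8) & 0xFF}"
    if (status & 0xFF) == 0x7F:  # WIFSTOPPED: 0x7F marker in the low byte
        return f"stopped by signal {(status >> 8) & 0xFF}"
    return f"killed by signal {status & 0x7F}"  # WIFSIGNALED


def summarise_master_log(log_text: str) -> list:
    """Summarise daemon starts, deaths and the gap between consecutive starts."""
    report = []
    last_start = None
    for raw in log_text.splitlines():
        match = LINE.match(raw.strip())
        if not match:
            continue  # not a timestamped log line
        when = datetime.strptime(match.group(1), "%m/%d/%y %H:%M:%S")
        message = match.group(2)
        started = STARTED.search(message)
        reaped = REAPED.search(message)
        if started:
            if last_start is not None:
                gap = int((when - last_start).total_seconds())
                report.append(f"start {gap} s after previous start")
            last_start = when
            report.append(f"pid {started.group(2)} started: {started.group(1)}")
        elif reaped:
            pid, status = reaped.group(1), int(reaped.group(2))
            report.append(f"pid {pid} {decode_wait_status(status)}")
    return report
```

Run on the excerpt above, it reports each startd exiting with code 4 about a second after it starts, and consecutive starts 3601 s apart: the 3600-second restart delay plus the second the daemon was alive.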
The only references to these errors I can find are quite old or not applicable.
thanks for any help
cheers
alessandra
--
Respect is a rational process. \\//
Fatti non foste a viver come bruti, ma per seguir virtute e canoscenza (Dante)
For Ur-Fascism, disagreement is treason. (U. Eco)
But but but her emails... (Anonymous)
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/