We test installing HTCondor on Ubuntu and have not seen that problem.
The installation script creates the condor account with a home directory of /var/lib/condor.
Is it possible that the condor account already exists with a different home directory? That would be one possible explanation for the failure.
Let me know. I can add some defensive coding for installations where condor's default home directory has been changed.
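One way to check the account is sketched below. It is only an illustration, not part of the installation script: the function names home_of_entry and check_home are made up for this example, and it assumes the standard passwd layout in which the home directory is the sixth colon-separated field.

```shell
#!/bin/sh
# Hypothetical helpers, not part of HTCondor or get_condor.

# Print the home directory (6th colon-separated field) of a passwd entry.
home_of_entry() {
    printf '%s\n' "$1" | cut -d: -f6
}

# Compare the home directory in a passwd entry with the expected one.
# Arguments: passwd entry line, expected home directory.
check_home() {
    actual=$(home_of_entry "$1")
    if [ "$actual" = "$2" ]; then
        echo "ok: home is $actual"
    else
        echo "mismatch: home is $actual, expected $2"
        return 1
    fi
}

# On a real machine (commented out so the sketch has no side effects):
# entry=$(getent passwd condor) && check_home "$entry" /var/lib/condor
```

If getent prints nothing, the condor account does not exist yet, which would rule out this explanation.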
Hello,
I am in the process of trying to update an Ubuntu 20 8.8.x flock to 23.0.
I am not sure whether this is a new or a well-known issue, but there appears to be a problem with the HTCondor install for 23.0, and possibly for 10.x and 10.0 too.
I haven’t tried 9.0, 9.x or 23.x. My 8.8.x installs are old, but I do not remember experiencing this issue when doing fresh 8.8.x installs.
As part of the testing, I tried directly upgrading from 8.8.x, but I also tried fresh installs using both the get_condor script and a manual install.
In both of the latter cases, the previous install was stripped using the command line from the get_condor script: apt-get -y remove --purge htcondor && apt-get -y autoremove --purge && rm -fr /etc/condor (basically what get_condor suggests to do to remove older installs).
On Ubuntu 20, this issue seems to appear every time, whether installing with get_condor (minicondor or other roles) or manually with apt-get install htcondor or apt-get install minicondor.
When installing on a condor-free system, /etc/condor is recreated, but the parent of the execute directory, /var/lib/condor, does not seem to be recreated by the installation process.
This issue does not occur when updating from 8.8.x, of course, as the /var/lib/condor directory already exists.
This path is defined in the default /etc/condor/condor_config that is deployed by the installation:
LOCAL_DIR = /var
[…]
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
This issue makes startd fail in a loop on execute machines (see logs below) when starting condor, probably because the condor user cannot create a directory in /var/lib.
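To confirm which directory is actually missing, a small helper along these lines may be useful. It is just a sketch: first_missing is a made-up name, and the commented usage assumes condor_config_val is on the PATH and prints the expanded EXECUTE value.

```shell
#!/bin/sh
# Hypothetical helper: print the topmost missing component of a path,
# or nothing at all if the whole path already exists.
first_missing() {
    path=$1
    missing=
    # Walk up towards the root until an existing ancestor is found.
    while [ ! -e "$path" ]; do
        missing=$path
        path=$(dirname "$path")
    done
    if [ -n "$missing" ]; then
        echo "$missing"
    fi
    return 0
}

# On an affected machine (commented out; needs HTCondor installed):
# first_missing "$(condor_config_val EXECUTE)"
```

If only the execute sub-directory were missing, the helper would print the full execute path; if it prints /var/lib/condor instead, the parent directory is the one that was not created.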
The fix is for root to create the missing directory /var/lib/condor; startd will then recreate the execute sub-directory, owned by the condor user, the next time condor is restarted.
$ cd /var/lib
$ mkdir condor
$ chmod 755 condor
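The same workaround can be written so that it is safe to run repeatedly, for example from a provisioning script. This is only a sketch of what I did by hand: ensure_dir is a made-up name, and the 755 mode is simply the one used above.

```shell
#!/bin/sh
# Hypothetical helper: create a directory with the given mode if it is
# missing, and leave an existing directory untouched.
# Arguments: directory path, octal mode.
ensure_dir() {
    dir=$1
    mode=$2
    if [ -d "$dir" ]; then
        echo "exists: $dir"
    else
        mkdir -p "$dir" && chmod "$mode" "$dir" && echo "created: $dir"
    fi
}

# As root on an affected machine (commented out so nothing is changed here):
# ensure_dir /var/lib/condor 755
```

Because the function does nothing when the directory is already there, it should not disturb machines upgraded from 8.8.x.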
In /var/log/condor/MasterLog:
10/05/23 10:09:11 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 69386
10/05/23 10:09:11 Daemons::StartAllDaemons all daemons were started
10/05/23 10:09:14 The STARTD (pid 69386) exited with status 4
10/05/23 10:09:14 Sending obituary for "/usr/sbin/condor_startd"
10/05/23 10:09:14 restarting /usr/sbin/condor_startd in 10 seconds
[…] loops from here
In /var/log/condor/StartLog:
10/05/23 10:10:42 ******************************************************
10/05/23 10:10:42 ** condor_startd (CONDOR_STARTD) STARTING UP
10/05/23 10:10:42 ** /usr/sbin/condor_startd
10/05/23 10:10:42 ** SubsystemInfo: name=STARTD type=STARTD(6) class=DAEMON(1)
10/05/23 10:10:42 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON
10/05/23 10:10:42 ** $CondorVersion: 23.0.0 2023-09-29 BuildID: 678686 PackageID: 23.0.0-1 $
10/05/23 10:10:42 ** $CondorPlatform: X86_64-Ubuntu_20.04 $
10/05/23 10:10:42 ** PID = 70350
10/05/23 10:10:42 ** Log last touched 10/5 10:10:17
10/05/23 10:10:42 ******************************************************
10/05/23 10:10:42 Using config source: /etc/condor/condor_config
10/05/23 10:10:42 Using local config sources:
10/05/23 10:10:42 /etc/condor/config.d/00-htcondor-9.0.config
10/05/23 10:10:42 /etc/condor/condor_config.local
10/05/23 10:10:42 config Macros = 93, Sorted = 93, StringBytes = 2628, TablesBytes = 3404
10/05/23 10:10:42 CLASSAD_CACHING is ENABLED
10/05/23 10:10:42 Daemon Log is logging: D_ALWAYS D_ERROR D_STATUS
10/05/23 10:10:42 SharedPortEndpoint: waiting for connections to named socket startd_69340_0d0f
10/05/23 10:10:42 DaemonCore: command socket at <192.168.8.246:9618?addrs=192.168.8.246-9618&alias=suvofpcand20.corp.spc.int&noUDP&sock=startd_69340_0d0f>
10/05/23 10:10:42 DaemonCore: private command socket at <192.168.8.246:9618?addrs=192.168.8.246-9618&alias=suvofpcand20.corp.spc.int&noUDP&sock=startd_69340_0d0f>
10/05/23 10:10:45 VM universe will be tested to check if it is available
10/05/23 10:10:45 History file rotation is enabled.
10/05/23 10:10:45 Maximum history file size is: 20971520 bytes
10/05/23 10:10:45 Number of rotated history files is: 2
10/05/23 10:10:45 Startd will not enforce disk limits via logical volume management.
10/05/23 10:10:45 Failed to stat /var/lib/condor/execute: (errno 2) No such file or directory
10/05/23 10:10:45 ERROR "Error accessing execute directory /var/lib/condor/execute specified in the configuration setting SLOT1_EXECUTE: (errno=2) No such file or directory"
at line 78 in file /var/lib/condor/execute/slot1/dir_3182108/userdir/build-I2xw6a/condor-23.0.0/src/condor_startd.V6/slot_builder.cpp
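For anyone checking several execute machines, the missing path can be pulled out of the StartLog with a filter like the one below. It is a sketch that assumes the "Failed to stat ...: (errno 2)" wording shown in the log above; missing_paths is a made-up name.

```shell
#!/bin/sh
# Hypothetical filter: read startd log lines on stdin and print each distinct
# path that could not be stat'ed because it does not exist (errno 2).
missing_paths() {
    sed -n 's/.*Failed to stat \(.*\): (errno 2).*/\1/p' | sort -u
}

# On an execute machine (commented out; needs a real log file):
# missing_paths < /var/log/condor/StartLog
```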