Hi Steffen,
I guess RESERVED_DISK = 131072 might be the culprit. I just checked in the documentation and that value is in MB, i.e., the reservation of ~131GB for non-Condor stuff would exceed the 124G available on your /var (unfortunately, prefixes/sizes are sometimes a bit inconsistent).
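To make the mismatch concrete, here is a back-of-the-envelope check (a sketch; the 124G figure is the "Avail" number from the df output below):

```python
# RESERVED_DISK is interpreted in megabytes.
reserved_mb = 131072
reserved_gb = reserved_mb / 1024   # 128 GB held back for non-Condor use
avail_gb = 124                     # "Avail" on /var from df -h

# The startd subtracts the reservation from the free space; once the
# result reaches zero or below, the slot advertises Disk = 0.
disk_for_jobs_gb = max(avail_gb - reserved_gb, 0)
print(disk_for_jobs_gb)  # 0 -> matches the "Disk = 0" seen in condor_status
```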
Another thing I noticed - is the execute directory on a dedicated volume? Otherwise,
STARTD_RECOMPUTE_DISK_FREE = false might be a problem in cases where /var gets filled by other processes (like logs) and the available disk space shrinks for jobs as well.
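For example, something along these lines in the local config (a sketch only - the 2 GB reservation is an arbitrary placeholder, adjust to what your OS actually needs):

```
# RESERVED_DISK is in megabytes: reserve 2 GB for the OS instead of 128 GB
RESERVED_DISK = 2048
# Re-evaluate free disk periodically, since /var is shared with logs etc.
STARTD_RECOMPUTE_DISK_FREE = true
```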
Cheers,
Thomas

On 21/04/2023 10.30, Steffen Grunewald wrote:
Good morning,
after setting up HTCondor 10.0.3 on our local cluster, I'm running into
issues related to disk space and requirements.
root@h0402:~# condor_config_val -dump -expand EXECUTE
# Configuration from machine: h0402.hypatia.local
# Parameters with names that match EXECUTE:
ENCRYPT_EXECUTE_DIRECTORY = false
ENCRYPT_EXECUTE_DIRECTORY_FILENAMES = false
EXECUTE = /var/lib/condor/execute
GANGLIAD_PER_EXECUTE_NODE_METRICS = true
LOCAL_UNIV_EXECUTE = /var/lib/condor/spool/local_univ_execute
# Contributing configuration file(s):
# /etc/condor/condor_config
# /etc/condor/condor_config_local
root@h0402:~# df -h /var/lib/condor/execute
Filesystem Size Used Avail Use% Mounted on
/dev/sda5 125G 455M 124G 1% /var
root@h0402:~# condor_config_val -dump -expand DISK
# Configuration from machine: h0402.hypatia.local
# Parameters with names that match DISK:
CONSUMPTION_DISK = quantize(target.RequestDisk,{1024})
CREATE_LOCKS_ON_LOCAL_DISK = true
FILE_TRANSFER_DISK_LOAD_THROTTLE = 2.0
FILE_TRANSFER_DISK_LOAD_THROTTLE_LONG_HORIZON = 5m
FILE_TRANSFER_DISK_LOAD_THROTTLE_SHORT_HORIZON = 1m
FILE_TRANSFER_DISK_LOAD_THROTTLE_WAIT_BETWEEN_INCREMENTS = 60
JOB_DEFAULT_REQUESTDISK = 131072
LOCAL_DISK_LOCK_DIR =
MODIFY_REQUEST_EXPR_REQUESTDISK = quantize(RequestDisk,{1024})
RESERVED_DISK = 131072
SCHEDD_ROUND_ATTR_DiskUsage = 25%
STARTD_RECOMPUTE_DISK_FREE = false
# Contributing configuration file(s):
# /etc/condor/condor_config
# /etc/condor/condor_config_local
root@h0402:~# condor_status -l `hostname`| grep ^Disk
Disk = 0
Since $(JOB_DEFAULT_REQUESTDISK) > $(DISK), there's no way to run vanilla
universe jobs.
The manual, under DISK and RESERVED_DISK, suggests that the startd would
determine the amount of available space (of which there's plenty), but
for me it obviously doesn't. Is there a way to find out why?
Thanks, Steffen