[HTCondor-users] STARTD_ENFORCE_DISK_LIMITS and .job.ad / .machine.ad?
- Date: Wed, 25 Jun 2025 23:32:25 +0000
- From: Zach McGrew <mcgrewz@xxxxxxx>
- Subject: [HTCondor-users] STARTD_ENFORCE_DISK_LIMITS and .job.ad / .machine.ad?
Hi All,
I noticed that with STARTD_ENFORCE_DISK_LIMITS enabled, the .job.ad and .machine.ad files don't get created in the usual place. I couldn't find this mentioned in the documentation or code, so I figured I'd ask here.
In the StarterLog for the slot I see:
Failed to open "/var/lib/condor/execute/dir_303915/.job.ad" for to write job ad: Permission denied (errno 13)
Failed to open "/var/lib/condor/execute/dir_303915/.machine.ad" for to write machine ad: Permission denied (errno 13)
$_CONDOR_JOB_AD and $_CONDOR_MACHINE_AD are still set in the environment:
$ printenv | grep -E 'CONDOR_.*_AD'
_CONDOR_MACHINE_AD=/var/lib/condor/execute/dir_303915/.machine.ad
_CONDOR_JOB_AD=/var/lib/condor/execute/dir_303915/.job.ad
They just point to files that don't exist:
$ cat $_CONDOR_JOB_AD
cat: /var/lib/condor/execute/dir_303915/.job.ad: No such file or directory
$ cat $_CONDOR_MACHINE_AD
cat: /var/lib/condor/execute/dir_303915/.machine.ad: No such file or directory
$ ls -al $_CONDOR_SCRATCH_DIR
total 9
drwx------ 5 mcgrewz mcgrewz 1024 Jun 25 15:40 .
drwxr-xr-x 3 condor condor 4096 Jun 25 15:40 ..
-rwx------ 1 mcgrewz mcgrewz 48 Jun 25 15:40 .chirp.config
drwxr-xr-x 2 mcgrewz mcgrewz 1024 Jun 25 15:40 .condor_ssh_to_job_1
drwx------ 2 mcgrewz mcgrewz 1024 Jun 25 15:40 tmp
drwx------ 3 mcgrewz mcgrewz 1024 Jun 25 15:40 var
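The same check can be scripted from inside a job. This is only a minimal sketch: the helper names describe_path and check_ad_files are made up for illustration, and it relies on nothing beyond the environment variables shown above.

```python
import os
import stat

# Environment variables that HTCondor sets for the job (as shown in the post).
AD_VARS = ("_CONDOR_JOB_AD", "_CONDOR_MACHINE_AD")


def describe_path(path):
    """Return 'mode uid:gid' for path, or a short note if it cannot be stat'ed."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return "missing"
    except PermissionError:
        return "unreadable (permission denied)"
    return f"{stat.filemode(st.st_mode)} {st.st_uid}:{st.st_gid}"


def check_ad_files(environ=os.environ):
    """Map each ad variable to (path, status); status is 'unset' if the variable is absent."""
    report = {}
    for var in AD_VARS:
        path = environ.get(var)
        report[var] = (path, describe_path(path) if path else "unset")
    return report


if __name__ == "__main__":
    for var, (path, status) in check_ad_files().items():
        print(f"{var}: {path} -> {status}")
    scratch = os.environ.get("_CONDOR_SCRATCH_DIR")
    if scratch:
        print(f"_CONDOR_SCRATCH_DIR: {scratch} -> {describe_path(scratch)}")
```

Run as the job's executable (or from condor_ssh_to_job), it would report "missing" for both ad files in the situation described here.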
When I disable STARTD_ENFORCE_DISK_LIMITS (and restart condor), the two files get created with user:group condor:condor and permissions 644. This might be relevant because the other files in the directory, including .chirp.config, are all owned by my user:group, the one that submitted the job.
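If the ad files are written by a process running as the condor user without root privileges (a guess based on the condor:condor ownership, not something confirmed from the code), a 700 directory owned by the job user would explain the errno 13. Here is a simplified sketch of the POSIX permission-class check; the numeric IDs are hypothetical, and ACLs, capabilities, and root's bypass are ignored.

```python
import stat


def can_create_in_dir(dir_mode, dir_uid, dir_gid, uid, gids):
    """Simplified POSIX check: may a non-root process create a file in a directory?

    Creating an entry needs write and search (execute) permission on the
    directory. Exactly one permission class applies: owner, else group, else other.
    """
    if uid == dir_uid:
        write_bit, search_bit = stat.S_IWUSR, stat.S_IXUSR
    elif dir_gid in gids:
        write_bit, search_bit = stat.S_IWGRP, stat.S_IXGRP
    else:
        write_bit, search_bit = stat.S_IWOTH, stat.S_IXOTH
    return bool(dir_mode & write_bit) and bool(dir_mode & search_bit)


# Hypothetical numeric IDs, for illustration only.
JOB_UID, JOB_GID = 1000, 1000
CONDOR_UID, CONDOR_GID = 999, 999

# dir_### is drwx------ and owned by the job user, as in the ls output.
print(can_create_in_dir(0o700, JOB_UID, JOB_GID, JOB_UID, {JOB_GID}))        # job user
print(can_create_in_dir(0o700, JOB_UID, JOB_GID, CONDOR_UID, {CONDOR_GID}))  # condor user
```

Under these assumptions the job user can create files in the directory while the condor user cannot, which matches the "Permission denied" lines in the StarterLog.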
Extra info that might be relevant:
Tested with HTCondor 24.0.7, 24.0.8, and 24.8.1. I also tried 24.8.1 with the cool new STARTER_NESTED_SCRATCH that got mentioned at Throughput Computing Week. The results are similar, but it looks like it's trying to write the ad files to scratch/ and not htcondor/, where the .update.ad moved. The htcondor/ directory is condor:condor, so I'm guessing it wouldn't fail if it tried to write there instead?
I'm using LVM_BACKING_FILE to create the loopback file. This feature rocks.
LVM_HIDE_MOUNT is unset. On 24.0.8 condor_config_val says the default is false; on 24.8.1 it became auto, though this didn't seem to matter.
JOB_EXECDIR_PERMISSIONS is unset; according to condor_config_val this defaults to user, which makes the dir_### permissions 700.
The EP is running Debian 12 (Bookworm) with the latest updates, using HTCondor packages from the research.cs.wisc.edu repository.
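For comparing EPs or versions, the knobs mentioned above can be collected in one pass with condor_config_val. This is a rough sketch: the helper names are made up, and treating a non-zero exit status as "not defined" is an assumption about condor_config_val's behavior.

```python
import shutil
import subprocess

# Knobs discussed in this post.
KNOBS = [
    "STARTD_ENFORCE_DISK_LIMITS",
    "STARTER_NESTED_SCRATCH",
    "LVM_BACKING_FILE",
    "LVM_HIDE_MOUNT",
    "JOB_EXECDIR_PERMISSIONS",
]


def query_knob(name, run=subprocess.run):
    """Return the value printed by `condor_config_val NAME`, or None.

    Assumption: a non-zero exit status means the knob is not defined.
    """
    result = run(["condor_config_val", name], capture_output=True, text=True)
    if result.returncode != 0:
        return None
    return result.stdout.strip()


def collect_knobs(names=KNOBS, run=subprocess.run):
    """Query every knob in names and return a {name: value-or-None} dict."""
    return {name: query_knob(name, run) for name in names}


if __name__ == "__main__":
    if shutil.which("condor_config_val") is None:
        print("condor_config_val not found on PATH")
    else:
        for name, value in collect_knobs().items():
            shown = value if value is not None else "(not defined)"
            print(f"{name} = {shown}")
```

The run parameter exists only so the lookup can be exercised without HTCondor installed; on a real EP the defaults are enough.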
Thanks,
-Zach