
[HTCondor-users] HTCondor vs diskless exec nodes



Hi all,

Setup

---------------------------------------------

Rocky Linux 8.4, HTCondor 9.0.x

Exec nodes are netboot diskless, provisioned using xCAT.

Once exec nodes are live, the virtual disk size is 50% of unreserved memory (e.g., with 96GB of RAM, the rootfs is 48GB).

Right now, the used disk space when idle is 3.2GB and the used memory is 650MB.

Exec nodes are configured with dynamic, partitionable slots, with everything (CPU, memory, disk) set to auto.

The exec node spool is local to the node (rootfs).

Result of condor_status:

Name           OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime
slot1@compute1 LINUX      X86_64 Unclaimed Idle      0.000 96610  0+18:49:39
slot1@compute2 LINUX      X86_64 Unclaimed Idle      0.000 96610  0+18:19:41

Situation

---------------------------------------------

We sometimes run into exec nodes being out of disk space, for various reasons:

  • Something went wrong with a job and it kept creating more data in the spool dir
  • One or more jobs crashed over time and the spool wasn't purged of those jobs
  • A user didn't request enough memory in the .sub file
  • Various system bugs creating oversized log files
  • Sh*t happens

Problem

---------------------------------------------

Even when the virtual disk is full, the exec node keeps accepting jobs, and those jobs keep crashing.
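As a rough illustration of the kind of startd policy I mean, something like this might at least stop new matches when free disk runs low (a sketch only — the 4GB threshold is an arbitrary example of mine, and Disk is advertised in KiB):

```
# Sketch: refuse new jobs when the slot's free Disk (reported in KiB) is low.
START = $(START) && (Disk > 4 * 1024 * 1024)

# And/or hold back some space from what the startd advertises (value in MB):
RESERVED_DISK = 4096
```

But that only addresses disk accounting, not the memory the tmpfs rootfs is eating.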

 

 

Question

---------------------------------------------

Other than the obvious fixing of all the causes behind the problem, I'm wondering if there's a way to configure condor on the exec nodes to better handle the memory of diskless setups dynamically.

Right now, condor seems to handle the available memory as if the virtual disk doesn’t even exist, probably because the system/OS does the same.

While idle, the advertised memory should already be about 92GB rather than 96GB (96GB minus the 3.2GB already used on the rootfs and the 650MB of used memory).
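If a static margin were acceptable, I suppose something like condor's RESERVED_MEMORY knob could hold back a fixed amount from the detected RAM — though being static, it wouldn't track actual tmpfs usage (the 4096 below is just a placeholder of mine):

```
# Sketch: subtract a fixed amount (in MB) from the detected physical memory
# before it is advertised. Static, so it does not follow tmpfs usage.
RESERVED_MEMORY = 4096
```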

 

Does anyone have ideas or suggestions on how to better handle diskless exec nodes?

 

I was thinking of maybe creating a STARTD_CRON job that periodically adjusts the total memory the exec node advertises, so that matchmaking can evaluate it better. But I'm not sure how to do that the right way, or whether it's even a good idea with dynamic partitionable slots and jobs running…
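For the record, the rough shape of the cron script I had in mind — the attribute name, path, and wiring are all made up by me, so treat this strictly as a sketch:

```shell
#!/bin/sh
# Sketch: publish how much of the tmpfs rootfs is currently in use, as a
# ClassAd attribute ("TmpfsRootUsedMb" is my own name, not a standard one).
used_kb=$(df -Pk / | awk 'NR==2 {print $3}')
echo "TmpfsRootUsedMb = $((used_kb / 1024))"
```

wired up with something like:

STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) TMPFSROOT
STARTD_CRON_TMPFSROOT_EXECUTABLE = /usr/local/libexec/tmpfs_root_used.sh
STARTD_CRON_TMPFSROOT_PERIOD = 5m

A policy expression could then subtract that attribute from the advertised memory — that's the part I'm unsure how to do safely while slots are claimed.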

Or would it be possible to do this within the OS itself…?

 

Thanks!

 

Martin