
[HTCondor-users] HTCondor vs diskless exec nodes



Hi all,

Setup

---------------------------------------------

Rocky Linux 8.4, HTCondor 9.0.x

Exec nodes are netboot diskless, provisioned using xCAT.

Once exec nodes are live, the virtual disk size is 50% of unreserved memory (e.g., with 96GB of RAM, the rootfs is 48GB).

Right now, the used disk space when idle is 3.2GB and the used memory is 650MB.

Exec nodes are configured with dynamic, partitionable slots, with everything (CPU, memory, disk) set to auto.

The exec node spool is local to the node (rootfs).

Result of condor_status:

Name           OpSys      Arch   State     Activity LoadAv Mem    ActvtyTime
slot1@compute1 LINUX      X86_64 Unclaimed Idle      0.000 96610  0+18:49:39
slot1@compute2 LINUX      X86_64 Unclaimed Idle      0.000 96610  0+18:19:41

Situation

---------------------------------------------

We sometimes run into exec nodes being out of disk space, for various reasons:

  • Something went wrong with a job and it kept creating more data in the spool dir
  • One or more jobs crashed over time and the spool wasn't purged of those jobs
  • A user didn't request enough memory in the .sub file
  • Various system bugs creating oversized log files
  • Sh*t happens

Problem

---------------------------------------------

Even when the virtual disk is full, the exec node keeps accepting jobs, and those jobs keep crashing.
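As a rough illustration of the kind of startd policy I mean, something like this might at least stop new matches when free disk runs low (a sketch only — the 4GB threshold is an arbitrary example of mine, and Disk is advertised in KiB):

```
# Sketch: refuse new jobs when the slot's free Disk (reported in KiB) is low.
START = $(START) && (Disk > 4 * 1024 * 1024)

# And/or hold back some space from what the startd advertises (value in MB):
RESERVED_DISK = 4096
```

But that only addresses disk accounting, not the memory the tmpfs rootfs is eating.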

 

 

Question

---------------------------------------------

Other than the obvious fixing of all the causes behind the problem, I'm wondering if there's a way to configure condor on the exec nodes to better handle the memory of diskless setups dynamically.

Right now, condor seems to handle the available memory as if the virtual disk doesn’t even exist, probably because the system/OS does the same.

While idle, the advertised memory should already be about 92GB rather than 96GB (96GB minus the 3.2GB already used on the rootfs and the 650MB of used memory).
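If a static margin were acceptable, I suppose something like condor's RESERVED_MEMORY knob could hold back a fixed amount from the detected RAM — though being static, it wouldn't track actual tmpfs usage (the 4096 below is just a placeholder of mine):

```
# Sketch: subtract a fixed amount (in MB) from the detected physical memory
# before it is advertised. Static, so it does not follow tmpfs usage.
RESERVED_MEMORY = 4096
```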

 

Does anyone have ideas or suggestions on how to better handle diskless exec nodes?

 

I was thinking of maybe creating a STARTD_CRON job that periodically adjusts the total memory the exec node advertises, so that matchmaking can evaluate it better. But I'm not sure how to do that the right way, or whether it's even a good idea with dynamic partitionable slots and jobs running…
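For the record, the rough shape of the cron script I had in mind — the attribute name, path, and wiring are all made up by me, so treat this strictly as a sketch:

```shell
#!/bin/sh
# Sketch: publish how much of the tmpfs rootfs is currently in use, as a
# ClassAd attribute ("TmpfsRootUsedMb" is my own name, not a standard one).
used_kb=$(df -Pk / | awk 'NR==2 {print $3}')
echo "TmpfsRootUsedMb = $((used_kb / 1024))"
```

wired up with something like:

STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) TMPFSROOT
STARTD_CRON_TMPFSROOT_EXECUTABLE = /usr/local/libexec/tmpfs_root_used.sh
STARTD_CRON_TMPFSROOT_PERIOD = 5m

A policy expression could then subtract that attribute from the advertised memory — that's the part I'm unsure how to do safely while slots are claimed.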

Or would it be possible to do this within the OS itself…?

 

Thanks!

 

Martin