
[HTCondor-users] Jobs Stuck Due to RequestDisk = undefined




Dear all,

First of all, the version and platform in use:
$CondorVersion: 24.4.0 2025-02-02 BuildID: 784192 PackageID: 24.4.0-1 GitSHA: 6f17b75e $
$CondorPlatform: x86_64_AlmaLinux9 $

I am seeing several jobs fail to match available slots even though the slots report sufficient disk space. The problem appears to be related to how RequestDisk is evaluated against dynamically allocated slots.
Issue Summary:
  • Jobs are specifying RequestDisk = DiskUsage, but some still appear to have RequestDisk as undefined or fail to match slots correctly.
  • The available dynamic slots report ample disk space (TotalDisk ≈ 800GB per slot).
  • However, certain jobs have high DiskUsage values (e.g., DiskUsage = 11GB), and they are failing to find a suitable match.
  • Running condor_ce_q -better-analyze shows that the disk requirements are preventing allocation.
=======================================
The Requirements expression for job 9867.000 is

    (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) &&
    (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)

    [0]    : TARGET.Arch == "X86_64"
    [1]    : TARGET.OpSys == "LINUX"
    [2]    : [0] && [1]
    [3]    : TARGET.Disk >= RequestDisk
    [4]    : [2] && [3]
    [5]    : TARGET.Memory >= RequestMemory
    [6]    : [4] && [5]
    [7]    : TARGET.HasFileTransfer
    [8]    : [6] && [7]

Job 9867.000 defines the following attributes:

    DiskUsage = 40
    ImageSize = 40
    RequestDisk = undefined (kb)
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024) (mb)

The Requirements expression for job 9867.000 reduces to these conditions:

        Slots
Step   Matched  Condition
----- --------- ---------
[0]           0  TARGET.Arch == "X86_64"
[1]           0  TARGET.OpSys == "LINUX"
[3]           0  TARGET.Disk >= RequestDisk
[5]           0  TARGET.Memory >= RequestMemory
[7]           0  TARGET.HasFileTransfer
=======================================
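To make explicit why an undefined RequestDisk would be enough to block matching, here is a minimal sketch of ClassAd-style three-valued logic in Python. It is a toy model, not HTCondor code: the `UNDEFINED` sentinel, the helper functions, and the slot values (800 GB of disk, 4096 MB of memory) are all hypothetical and only illustrate the rule that a comparison against undefined yields undefined, and that a slot matches only when Requirements evaluates to true.

```python
# Toy model of ClassAd three-valued logic (simplified, not an HTCondor API).
UNDEFINED = object()  # hypothetical stand-in for the ClassAd "undefined" value


def ge(a, b):
    """a >= b; the result is UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a >= b


def and3(a, b):
    """Simplified AND: False wins, otherwise UNDEFINED propagates."""
    if a is False or b is False:
        return False
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return True


def job_requirements(slot, request_disk, request_memory):
    """Mirror the structure of the Requirements expression of job 9867.000."""
    result = slot["Arch"] == "X86_64"
    result = and3(result, slot["OpSys"] == "LINUX")
    result = and3(result, ge(slot["Disk"], request_disk))
    result = and3(result, ge(slot["Memory"], request_memory))
    result = and3(result, slot["HasFileTransfer"])
    return result


def matches(requirements_value):
    """A slot is a match only if Requirements is exactly True."""
    return requirements_value is True


# Hypothetical slot: values chosen for illustration only.
slot = {
    "Arch": "X86_64",
    "OpSys": "LINUX",
    "Disk": 800_000_000,  # KiB
    "Memory": 4096,       # MB
    "HasFileTransfer": True,
}

print(matches(job_requirements(slot, 40, 1)))         # RequestDisk defined
print(matches(job_requirements(slot, UNDEFINED, 1)))  # RequestDisk undefined
```

In this model the slot matches when RequestDisk has a value and does not match when it is undefined, no matter how much disk the slot offers.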
[root@ce04 ~]# condor_ce_q -l 9867 |grep Disk|more
DiskUsage = 40
DiskUsage_RAW = 39
RequestDisk = DiskUsage
Requirements = (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.HasFileTransfer)
=======================================
[root@ce04 ~]# condor_ce_config_val -dump | grep DiskUsage
JOB_DEFAULT_REQUESTDISK = DiskUsage
SCHEDD_ROUND_ATTR_DiskUsage = 25%
SYSTEM_STARTD_JOB_ATTRS = ImageSize, ExecutableSize, JobUniverse, NiceUser, CPUsUsage, ResidentSetSize, ProportionalSetSizeKb, MemoryUsage, DiskUsage, ScratchDirFileCount
=======================================
[root@ce04 ~]#  condor_status -long | grep DiskUsage|more
DiskUsage = 236135
DiskUsage = 275046
DiskUsage = 249114
DiskUsage = 480483
DiskUsage = 377061
DiskUsage = 272785
DiskUsage = 306690
DiskUsage = 321216
DiskUsage = 294344
DiskUsage = 156775
DiskUsage = 256135
DiskUsage = 300316
DiskUsage = 306545
DiskUsage = 221228
DiskUsage = 311412

=======================================
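For scale, the DiskUsage values above can be converted to GiB with a short Python sketch. It assumes the values are in KiB, which is the unit HTCondor uses for DiskUsage; the variable and function names are mine and purely illustrative.

```python
# DiskUsage values copied from the condor_status output above (assumed KiB).
slot_disk_usage_kib = [
    236135, 275046, 249114, 480483, 377061,
    272785, 306690, 321216, 294344, 156775,
    256135, 300316, 306545, 221228, 311412,
]

KIB_PER_GIB = 1024 * 1024


def kib_to_gib(kib):
    """Convert a size in KiB to GiB."""
    return kib / KIB_PER_GIB


# Print each value alongside its size in GiB.
for kib in slot_disk_usage_kib:
    print(f"{kib:>8} KiB = {kib_to_gib(kib):.2f} GiB")

largest_kib = max(slot_disk_usage_kib)
print(f"largest: {largest_kib} KiB = {kib_to_gib(largest_kib):.2f} GiB")
```

Under that assumption, every value in this listing is below 1 GiB.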
I appreciate any guidance you can provide!
Best regards,
Eraldo