
[HTCondor-users] How is multicore supposed to work in HTCondor? How to get started



Hi,

So we have frequent requests for multicore or even whole-node jobs.  Here's an example:

The Requirements expression for job 27504.000 is

    (TARGET.Machine == "wn-sate-078.nikhef.nl") && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) &&
    (TARGET.Cpus >= RequestCpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))

Job 27504.000 defines the following attributes:

    DiskUsage = 1
    FileSystemDomain = "stoomboot.nikhef.nl"
    RequestCpus = 32
    RequestDisk = DiskUsage (kb)
    RequestMemory = 4000 (mb)

The Requirements expression for job 27504.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]          33  TARGET.Machine == "wn-sate-078.nikhef.nl"
[7]          35  TARGET.Memory >= RequestMemory
[8]           1  [0] && [7]
[9]          17  TARGET.Cpus >= RequestCpus
[10]          0  [8] && [9]
[11]       1130  TARGET.FileSystemDomain == MY.FileSystemDomain

No successful match recorded.
Last failed match: Tue Jun 18 12:09:24 2024

Reason for last match failure: no match found

27504.000:  Run analysis summary ignoring user priority.  Of 30 machines,
     29 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job

Background: this node has a special capability and is also a normal pool node. The bottom line says "1 are able to run your job", but in practice that's not true: HTCondor keeps scheduling single-core jobs onto that machine, so a 32-core slot can never be collected.  Where do I look for documentation on how to get multicore or whole-node jobs scheduled with HTCondor?
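
The starvation described above can be illustrated with a toy model in Python. This is only a sketch and not HTCondor code: the function name, the per-step finish probability, and the assumption that freed cores are refilled immediately from a backlog of single-core jobs are all made up for illustration. It shows why 32 free cores practically never appear at the same moment while single-core jobs keep backfilling, and why they do appear once the node stops taking new work.

```python
import random


def steps_until_whole_node_free(total_cores=32, finish_prob=0.1,
                                refill=True, max_steps=1000, seed=0):
    """Toy model of one worker node running only single-core jobs.

    Each step, every running job finishes with probability finish_prob.
    If refill is True, freed cores are immediately handed to new
    single-core jobs (hypothetical stand-in for a busy pool).
    Returns the first step at which all cores are free at once,
    or None if that never happens within max_steps.
    """
    rng = random.Random(seed)
    busy = total_cores  # start with the node fully occupied
    for step in range(1, max_steps + 1):
        finished = sum(rng.random() < finish_prob for _ in range(busy))
        busy -= finished
        if busy == 0:
            return step  # a whole-node (32-core) job could start now
        if refill:
            busy = total_cores  # single-core jobs grab every freed core
    return None


if __name__ == "__main__":
    print("with backfill:   ", steps_until_whole_node_free(refill=True))
    print("without backfill:", steps_until_whole_node_free(refill=False))
```

In this model a 32-core job can only start if the node is prevented from accepting new single-core jobs for a while. Whether and how HTCondor can be configured to do that is exactly what I am asking about.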

Thanks a lot,

JT