
[HTCondor-users] idle job doesn't want to run



I’m trying to figure out how to dig into why a job isn’t running:

condor_q 4180196

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?... @ 12/14/23 16:45:56
OWNER      BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
gweatherby ID: 4180196  12/14 11:57      _      _      1      1 4180196.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1156 jobs; 0 completed, 0 removed, 1149 idle, 7 running, 0 held, 0 suspended

The VM I’m trying to pin the job to shows as able to run:

condor_q 4180196 -bet

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...
The Requirements expression for job 4180196.000 is

    ((Machine == "gerarddaily20.nmrbox.org")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||
      (TARGET.HasFileTransfer))

Job 4180196.000 defines the following attributes:

    DiskUsage = 1
    FileSystemDomain = "nmrbox.org"
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

The Requirements expression for job 4180196.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           1  Machine == "gerarddaily20.nmrbox.org"
[9]          59  TARGET.FileSystemDomain == MY.FileSystemDomain

No successful match recorded.
Last failed match: Thu Dec 14 16:46:26 2023

Reason for last match failure: no match found

4180196.000:  Run analysis summary ignoring user priority.  Of 52 machines,
     51 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job
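
For readers following the analysis output, here is a rough Python sketch of how the job's Requirements expression could be read against a single slot. It is not HTCondor code: the slot values for Disk and Memory and the second machine name are made up for illustration, missing attributes are simply treated as a non-match (real ClassAd evaluation has an undefined value with its own rules), and the helper names are hypothetical.

```python
def same(a, b):
    """Case-insensitive string equality, mirroring ClassAd `==` on strings."""
    return isinstance(a, str) and isinstance(b, str) and a.casefold() == b.casefold()


def request_memory(job):
    """ifthenelse(MemoryUsage =!= undefined, MemoryUsage, (ImageSize + 1023) / 1024)"""
    if job.get("MemoryUsage") is not None:
        return job["MemoryUsage"]
    # Integer division, since both operands are integers in the job ad.
    return (job["ImageSize"] + 1023) // 1024


def job_requirements(job, slot):
    """Simplified reading of the Requirements expression shown above."""
    request_disk = job["DiskUsage"]  # RequestDisk = DiskUsage
    return (
        same(slot.get("Machine"), "gerarddaily20.nmrbox.org")
        and same(slot.get("Arch"), "X86_64")
        and same(slot.get("OpSys"), "LINUX")
        and slot.get("Disk", 0) >= request_disk
        and slot.get("Memory", 0) >= request_memory(job)
        and (
            same(slot.get("FileSystemDomain"), job.get("FileSystemDomain"))
            or bool(slot.get("HasFileTransfer", False))
        )
    )


# Job attributes as listed in the analysis output.
job = {"DiskUsage": 1, "FileSystemDomain": "nmrbox.org", "ImageSize": 1}

# Hypothetical slot ad: Disk and Memory are invented values.
pinned_slot = {
    "Machine": "gerarddaily20.nmrbox.org",
    "Arch": "X86_64",
    "OpSys": "LINUX",
    "Disk": 1000000,
    "Memory": 4096,
    "FileSystemDomain": "nmrbox.org",
}
# Same slot on a different (made-up) machine name.
other_slot = dict(pinned_slot, Machine="other.example.org")

print(request_memory(job))                # 1
print(job_requirements(job, pinned_slot)) # True
print(job_requirements(job, other_slot))  # False
```

Under these assumptions only the Machine clause separates the pinned slot from the others, which is consistent with step [0] matching a single slot in the table above.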

And the reverse analysis says it matches:
-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...
4180196.0: Analyzing matches for 1 job
                                Slot  Slot's Req    Job's Req     Both
Name                            Type  Matches Job Matches Slot    Match %
------------------------        ---- ------------ ------------ ----------
...
slot1@xxxxxxxxxxxxxxxxxxxxxxxx  Part            1            1     100.00
...