
Re: [HTCondor-users] idle job doesn't want to run



So, you asked the forward question. Does the schedd think that it would match?

Now ask the reverse. Does the startd think that it would match?

condor_q 4180196 -better -machine gerarddaily20.nmrbox.org -reverse
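To make the forward/reverse distinction concrete, here is a minimal Python sketch of two-sided matching. It is a toy model, not the real ClassAd evaluator: the slot's values for Disk and Memory, the HasFileTransfer flag, and the AllowedOwners policy are all assumptions made up for illustration. Only the job-side expression is taken from the analysis output quoted below.

```python
# Toy model of two-sided matchmaking; NOT the real ClassAd evaluator.

def job_requirements(job, slot):
    """Forward question: does the slot satisfy the job's Requirements?"""
    return (
        slot["Machine"] == "gerarddaily20.nmrbox.org"
        and slot["Arch"] == "X86_64"
        and slot["OpSys"] == "LINUX"
        and slot["Disk"] >= job["RequestDisk"]
        and slot["Memory"] >= job["RequestMemory"]
        and (
            slot["FileSystemDomain"] == job["FileSystemDomain"]
            or slot.get("HasFileTransfer", False)
        )
    )


def slot_requirements(slot, job):
    """Reverse question: does the job satisfy the slot's own policy?

    Hypothetical policy for illustration: the slot accepts listed owners only.
    """
    return job["Owner"] in slot["AllowedOwners"]


def analyze(job, slot):
    """Return (forward, reverse, both); a match needs both directions."""
    forward = job_requirements(job, slot)
    reverse = slot_requirements(slot, job)
    return forward, reverse, forward and reverse


job = {
    "Owner": "gweatherby",
    "RequestDisk": 1,
    "RequestMemory": 1,
    "FileSystemDomain": "nmrbox.org",
}

# Slot values below are assumed, not read from the real machine ad.
slot = {
    "Machine": "gerarddaily20.nmrbox.org",
    "Arch": "X86_64",
    "OpSys": "LINUX",
    "Disk": 1_000_000,
    "Memory": 4096,
    "FileSystemDomain": "nmrbox.org",
    "HasFileTransfer": True,
    "AllowedOwners": {"gweatherby"},
}

print(analyze(job, slot))
```

If the slot's policy rejected the job (for example, an AllowedOwners set without the submitter), the forward check would still pass while the reverse check fails, which is the case the -reverse analysis is meant to expose.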

...Tim

On 12/14/23 16:01, Weatherby,Gerard wrote:

I'm trying to figure out how to dig into why a job isn't running:

condor_q 4180196

 

 

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?... @ 12/14/23 16:45:56

OWNER      BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS

gweatherby ID: 4180196  12/14 11:57      _      _      1      1 4180196.0

 

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended

Total for all users: 1156 jobs; 0 completed, 0 removed, 1149 idle, 7 running, 0 held, 0 suspended

The VM I'm trying to pin the job to shows as able to run:

condor_q 4180196 -bet

 

 

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...

The Requirements expression for job 4180196.000 is

 

    ((Machine == "gerarddaily20.nmrbox.org")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||

      (TARGET.HasFileTransfer))

 

Job 4180196.000 defines the following attributes:

 

    DiskUsage = 1

    FileSystemDomain = "nmrbox.org"

    ImageSize = 1

    RequestDisk = DiskUsage

    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

 

The Requirements expression for job 4180196.000 reduces to these conditions:

 

         Slots

Step    Matched  Condition

-----  --------  ---------

[0]           1  Machine == "gerarddaily20.nmrbox.org"

[9]          59  TARGET.FileSystemDomain == MY.FileSystemDomain

 

No successful match recorded.

Last failed match: Thu Dec 14 16:46:26 2023

 

Reason for last match failure: no match found

 

4180196.000:  Run analysis summary ignoring user priority.  Of 52 machines,

     51 are rejected by your job's requirements

      0 reject your job because of their own requirements

      0 match and are already running your jobs

      0 match but are serving other users

      1 are able to run your job

And the reverse analysis says it matches:
-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...

4180196.0: Analyzing matches for 1 job

                                Slot  Slot's Req    Job's Req     Both   

Name                            Type  Matches Job Matches Slot    Match %

------------------------        ---- ------------ ------------ ----------

...

slot1@xxxxxxxxxxxxxxxxxxxxxxxx  Part            1            1     100.00 

...
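As a side note on the job attributes in the analysis output above, the default RequestMemory expression can be worked through by hand. The sketch below mirrors `ifthenelse(MemoryUsage =!= undefined, MemoryUsage, (ImageSize + 1023) / 1024)` in Python. It assumes ImageSize is in KiB, RequestMemory is in MiB, and the division is integer division, which is my understanding of ClassAd arithmetic on integers. The function name and parameter names are hypothetical.

```python
def default_request_memory(image_size_kib, memory_usage_mib=None):
    """Mirror the default RequestMemory expression shown in the job ad.

    memory_usage_mib=None stands in for an undefined MemoryUsage attribute.
    """
    if memory_usage_mib is not None:
        return memory_usage_mib
    # Round ImageSize (KiB) up to whole MiB using integer division.
    return (image_size_kib + 1023) // 1024


# The job above has ImageSize = 1 and no MemoryUsage yet.
print(default_request_memory(1))
```

Under these assumptions the job requests a very small amount of memory and disk, so resource requests are unlikely to be what keeps it idle.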


-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736