
Re: [HTCondor-users] idle job doesn't want to run



condor_q 4180196 -better -machine gerarddaily20.nmrbox.org -reverse

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...

-- Slot: slot1@xxxxxxxxxxxxxxxxxxxxxxxx : Analyzing matches for 1 Jobs in 1 autoclusters

The Requirements expression for this slot is

    START &&
        (WithinResourceLimits)

  START is
    (Target.Production isnt true)

  WithinResourceLimits is
    (MY.Cpus > 0 &&
      TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 &&
      TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 &&
      TARGET.RequestDisk <= MY.Disk)

This slot defines the following attributes:

    Cpus = 16
    Disk = 28442908
    Memory = 32098

Job 4180196.0 has the following attributes:

    TARGET.Production = false
    TARGET.RequestCpus = 1
    TARGET.RequestDisk = 1
    TARGET.RequestMemory = 1

The Requirements expression for this slot reduces to these conditions:

       Clusters
Step    Matched  Condition
-----  --------  ---------
[0]           1  Target.Production isnt true
[1]           1  WithinResourceLimits

slot1@xxxxxxxxxxxxxxxxxxxxxxxx: Run analysis summary of 1 jobs.
    1 (100.00 %) match both slot and job requirements.
    1 match the requirements of this slot.
    1 have job requirements that match this slot.
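The reverse analysis above is the slot's Requirements (START && WithinResourceLimits) evaluated against the job's attributes. As a sketch, the same check can be reproduced by hand in Python, assuming the attribute values printed above; the function names are hypothetical and this is a simplified model, not HTCondor's ClassAd evaluator:

```python
"""Hand evaluation of the slot's Requirements from the reverse analysis.

Simplified model (assumption): ads are plain dicts, and only the
attributes printed by condor_q -better -reverse are included.
"""

# Values copied from the "This slot defines" / "Job ... has" sections.
slot = {"Cpus": 16, "Disk": 28442908, "Memory": 32098}
job = {"Production": False, "RequestCpus": 1, "RequestDisk": 1, "RequestMemory": 1}


def start(my, target):
    # START is (Target.Production isnt true). "isnt" is the ClassAd =!=
    # operator, which never yields undefined, so a missing Production
    # attribute (None here) also counts as "isnt true".
    return target.get("Production") is not True


def within_resource_limits(my, target):
    # Every resource must exist on the slot and cover the job's request.
    return (my["Cpus"] > 0 and target["RequestCpus"] <= my["Cpus"]
            and my["Memory"] > 0 and target["RequestMemory"] <= my["Memory"]
            and my["Disk"] > 0 and target["RequestDisk"] <= my["Disk"])


def slot_requirements(my, target):
    # The slot's Requirements: START && (WithinResourceLimits)
    return start(my, target) and within_resource_limits(my, target)


print("slot accepts job:", slot_requirements(slot, job))
```

Changing the job's Production to true, or raising RequestMemory above the slot's 32098, makes the same function return False.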

 

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Tim Theisen via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Date: Friday, December 15, 2023 at 12:16 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Tim Theisen <tim@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] idle job doesn't want to run


So, you asked the forward question. Does the schedd think that it would match?

Now ask the reverse. Does the startd think that it would match?

condor_q 4180196 -better -machine gerarddaily20.nmrbox.org -reverse
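A match needs both sides to agree: the job's Requirements must accept the slot (the forward question, as the schedd sees it), and the slot's Requirements must accept the job (the reverse question, as the startd sees it). A minimal Python sketch of that two-sided check, assuming each Requirements expression can be modeled as a predicate over plain dicts; the ads below are hypothetical and the model ignores Rank, user priorities, and slot state:

```python
"""Two-sided matchmaking, sketched with hypothetical ads.

Assumption: each Requirements expression is a Python predicate taking
(MY, TARGET) dicts; real HTCondor evaluates ClassAd expressions.
"""


def forward(job, slot):
    # Forward question: does the job's Requirements accept this slot?
    return job["Requirements"](job, slot)


def reverse(job, slot):
    # Reverse question: does the slot's Requirements accept this job?
    return slot["Requirements"](slot, job)


def matches(job, slot):
    # A match needs both directions to evaluate to true.
    return forward(job, slot) and reverse(job, slot)


# Hypothetical ads: the job pins itself to one machine, and the slot
# refuses jobs whose Production attribute is true.
job = {"Production": True,
       "Requirements": lambda my, target: target["Machine"] == "gerarddaily20.nmrbox.org"}
slot = {"Machine": "gerarddaily20.nmrbox.org",
        "Requirements": lambda my, target: target.get("Production") is not True}

print(forward(job, slot), reverse(job, slot), matches(job, slot))
```

Here the forward check passes while the reverse check fails, which is exactly the case the forward analysis alone cannot reveal.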

...Tim

On 12/14/23 16:01, Weatherby,Gerard wrote:

I'm trying to figure out how to dig into why a job isn't running:

condor_q 4180196

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?... @ 12/14/23 16:45:56
OWNER      BATCH_NAME     SUBMITTED   DONE   RUN    IDLE  TOTAL JOB_IDS
gweatherby ID: 4180196  12/14 11:57      _      _      1      1 4180196.0

Total for query: 1 jobs; 0 completed, 0 removed, 1 idle, 0 running, 0 held, 0 suspended
Total for all users: 1156 jobs; 0 completed, 0 removed, 1149 idle, 7 running, 0 held, 0 suspended

The VM I'm trying to pin the job to shows as able to run:

condor_q 4180196 -bet

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...
The Requirements expression for job 4180196.000 is

    ((Machine == "gerarddaily20.nmrbox.org")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) ||
      (TARGET.HasFileTransfer))

Job 4180196.000 defines the following attributes:

    DiskUsage = 1
    FileSystemDomain = "nmrbox.org"
    ImageSize = 1
    RequestDisk = DiskUsage
    RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)

The Requirements expression for job 4180196.000 reduces to these conditions:

         Slots
Step    Matched  Condition
-----  --------  ---------
[0]           1  Machine == "gerarddaily20.nmrbox.org"
[9]          59  TARGET.FileSystemDomain == MY.FileSystemDomain

No successful match recorded.
Last failed match: Thu Dec 14 16:46:26 2023

Reason for last match failure: no match found

4180196.000:  Run analysis summary ignoring user priority.  Of 52 machines,
     51 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match and are already running your jobs
      0 match but are serving other users
      1 are able to run your job
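The job's resource requests come from defaults in its ad. As a sketch, evaluating RequestDisk and RequestMemory by hand gives 1 and 1, the same values the reverse analysis reports for TARGET.RequestDisk and TARGET.RequestMemory. This simplified model assumes ClassAd undefined can be represented as Python None and integer "/" as "//". Only the Machine, Disk, and Memory clauses are checked, because the slot's Arch, OpSys, and file-transfer attributes are not shown in this thread; the slot's Machine value is assumed from the -machine argument:

```python
"""Hand evaluation of the job's RequestDisk / RequestMemory defaults.

Assumptions: ClassAd undefined is modeled as None, and the integer "/"
in the ClassAd expression is modeled with "//".
"""

UNDEFINED = None  # stand-in for the ClassAd undefined value

job = {"DiskUsage": 1, "ImageSize": 1, "MemoryUsage": UNDEFINED}


def request_disk(ad):
    # RequestDisk = DiskUsage
    return ad["DiskUsage"]


def request_memory(ad):
    # RequestMemory = ifthenelse(MemoryUsage =!= undefined, MemoryUsage,
    #                            (ImageSize + 1023) / 1024)
    # ImageSize is in KiB and RequestMemory in MiB, so the fallback
    # rounds the image size up to whole MiB.
    if ad.get("MemoryUsage") is not UNDEFINED:
        return ad["MemoryUsage"]
    return (ad["ImageSize"] + 1023) // 1024


# Slot values from the reverse analysis; Machine is assumed from -machine.
slot = {"Machine": "gerarddaily20.nmrbox.org", "Disk": 28442908, "Memory": 32098}

resource_clauses_ok = (slot["Machine"] == "gerarddaily20.nmrbox.org"
                       and slot["Disk"] >= request_disk(job)
                       and slot["Memory"] >= request_memory(job))

print(request_disk(job), request_memory(job), resource_clauses_ok)
```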

And the reverse analysis says it matches:

-- Schedd: accesspoint.nmrbox.org : <155.37.253.48:9618?...

4180196.0: Analyzing matches for 1 job
                                Slot  Slot's Req    Job's Req     Both
Name                            Type  Matches Job Matches Slot    Match %
------------------------        ---- ------------ ------------ ----------
...
slot1@xxxxxxxxxxxxxxxxxxxxxxxx  Part            1            1     100.00
...



-- 
Tim Theisen (he, him, his)
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736