Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] "Job has not yet been considered by the matchmaker" after condor_qedit

Date: Thu, 31 May 2018 11:49:43 -0500
From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] "Job has not yet been considered by the matchmaker" after condor_qedit

On 5/30/2018 2:04 PM, Vaurynovich, Siarhei wrote:

Hello,
*Please, let me know if there is a way to force HTCondor matchmaker to consider a job cluster for scheduling.*

The command "condor_reschedule", issued on the submit host (i.e. where the schedd is running), will do that. However, by default, this should happen automatically every few minutes.

My jobs often sit unscheduled in the queue for many hours (indefinitely) if I use condor_qedit to adjust job requirements.
To make sure jobs have enough RAM to run, I sometimes restrict the allowed SlotID range in requirements. There is probably a better way to do it, i.e. somehow declaring RAM as a shared resource with a certain number of units available, but for now this is my quick hack. Setting ImageSize does not work since my jobs are almost always bigger than per-slot RAM, so if I give a realistic job size, my jobs would never start. Creating specialized slots is also a bad idea since my jobs vary strongly in size.

The above sounds like pretty strange usage. As you suspect, there are better ways to do this. Assuming you are using a current version of HTCondor (i.e. HTCondor v8.6 or above), instead of configuring your nodes to partition resources like memory into statically sized slots, you could configure your nodes to use dynamic (partitionable) slots. See the HTCondor Manual section "Dynamic Provisioning: Partitionable and Dynamic Slots" at URL http://tinyurl.com/y83a9ufo. Once you have set up your execute nodes to use a partitionable slot as described, your condor_submit file can look like:


  executable = foo
  # This job only needs one CPU core in the execute slot
  request_cpus = 1
  # This job needs 3.5 GB of RAM in the execute slot
  request_memory = 3500
  queue

and the execute node (startd) will carve off a new slot with 3.5 GB of memory for this job. No messing around with ImageSize required.
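
For reference, a minimal sketch of the execute-node configuration this assumes, following the manual section cited above. Giving the single slot 100% of the machine is an illustrative assumption rather than a recommendation for this pool, so adjust the shares to your hardware:

  # Advertise the whole machine as one slot of type 1
  NUM_SLOTS = 1
  NUM_SLOTS_TYPE_1 = 1
  # Assumption: this slot owns all detected CPUs, memory, and disk
  SLOT_TYPE_1 = 100%
  # Let the startd carve dynamic slots out of it per job request
  SLOT_TYPE_1_PARTITIONABLE = TRUE

Changes to slot types generally take effect only after the startd is restarted, so plan the change for a time when the node can be drained.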

The problem is that after such an adjustment, my jobs often stop being scheduled: they sit in the queue indefinitely, and ‘condor_q -better-analyze clusterID’ gives “Job has not yet been considered by the matchmaker.” while claiming that there are slots “available to run your job”. If I do not use condor_qedit, jobs run fine. If I kill the same jobs and then submit them again with new requirements, they also run fine.

This sounds pretty strange. Can you easily reproduce it? Does it happen every time or only sometimes? What version of HTCondor are you using, on what platform?


regards,
Todd

References:
- [HTCondor-users] "Job has not yet been considered by the matchmaker" after condor_qedit
  - From: Vaurynovich, Siarhei
