
Re: [HTCondor-users] Negotiation with partitionable slots



Hi Michael,

Thanks for this detailed and very helpful answer.

I'm still not 100% sure what to do.
To summarize, in my current config I would just have to change to:
CONSUMPTION_POLICY = true
CLAIM_PARTITIONABLE_LEFTOVERS = false
Both are "false" for now. At first I understood that I should set both of them to true, but that seems to be a problem (cf. https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=4950). So should I just set the consumption policy one to true?
And keeping my formula to steer jobs that fit first to the "small machines" class, then to the "medium machines" class and finally to the "big machines" class would do the job, right?
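In other words, I imagine something roughly like the sketch below (not tested; the steering expression is only a stand-in for my real formula, and the MachineClass attribute is a made-up name for illustration):

CONSUMPTION_POLICY = True
CLAIM_PARTITIONABLE_LEFTOVERS = False

# stand-in for my existing steering formula: prefer the smallest class that fits
NEGOTIATOR_PRE_JOB_RANK = 3 * (MachineClass =?= "small") + \
                          2 * (MachineClass =?= "medium") + \
                          1 * (MachineClass =?= "big")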

There is just one other thing you said that I don't really understand.
In the last part (once this configuration is on, that is), you said "but if you had four machines instead of three, HostC and HostD would each be running only one job in this example".
Why, in this example, are jobs 9 and 10 spread across HostC and HostD? From what I understood (and it was confirmed by the jobs assigned to HostA and HostB), HostC would be filled before HostD and would have enough resources to take both of these jobs, no?

Cheers,
Mathieu

On 29/04/17 19:00, htcondor-users-request@xxxxxxxxxxx wrote:
Date: Sat, 29 Apr 2017 00:20:45 +0000
From: Michael Pelletier <Michael.V.Pelletier@xxxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Negotiation with partitionable slots

Hey Mathieu,

I went through the same sort of learning curve a few years ago with partitionable slots and the early introduction of consumption policies. (I think there might be a couple of CP bug fixes out there with my name in the "customer" field.)

When consumption policies are off, you're usually going to be using "claim_partitionable_leftovers" instead. So let's say you have a set of three machines with 4GB of memory each, and ten 1GB jobs being matched. At the outset, the negotiator sees a collection of machine ads as follows, and claims them for assignment of matching jobs:

HostA - 4GB Partitionable
HostB - 4GB Partitionable
HostC - 4GB Partitionable
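
(For reference, a machine advertising a single whole-machine partitionable slot like that is typically set up with the standard recipe below; just a sketch, not necessarily your exact config:)

# one partitionable slot covering the whole machine
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = 100%
SLOT_TYPE_1_PARTITIONABLE = True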

Job1 matches to HostA, Job2 matches to HostB, and Job3 matches to HostC. Done. You then have:

HostA - 3GB Part
        1GB Dyn - Job1
HostB - 3GB Part
        1GB Dyn - Job2
HostC - 3GB Part
        1GB Dyn - Job3

When "claim_partitionable_leftovers" is on, the leftover 3GB partitionable slots go to the scheduler with a claim ID, allowing the schedd itself to assign additional slots to remaining jobs without having to consult the negotiator. This allows the machines to load themselves up very quickly with all the work they're capable of supporting. I have some 64-core machines and I discovered that a user tried to distinguish his iterations using a millisecond-scale timestamp after half a dozen jobs fired up in the same millsecond and stepped all over each other.

With leftover claiming on, you may wind up with something like this:

HostA - 0GB Part
        1GB Dyn - Job1
        1GB Dyn - Job4
        1GB Dyn - Job5
        1GB Dyn - Job6
HostB - 0GB Part
        1GB Dyn - Job2
        1GB Dyn - Job7
        1GB Dyn - Job8
        1GB Dyn - Job9
HostC - 2GB Part
        1GB Dyn - Job3
        1GB Dyn - Job10

And so on down the line. However, as the manual warns, this can introduce some problems when it comes to concurrency limits. I never noticed anything go particularly awry in this regard, but I suppose it might have happened and I just didn't notice.

With a consumption policy, instead of having the scheduler split up the leftovers without consulting the negotiator, the negotiator handles it. So instead of having the first slot of a long list of machines get the first round of jobs, you can do a depth-first fill of your machines. This can be advantageous if you have NFS-mounted input data, since the multiple processes can leverage the disk I/O buffer cache to minimize the amount of network traffic hammering aging NetApp servers. This is one of the reasons I jumped on CP as soon as it was available, and not quite fully debugged.
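
(As a sketch of the knobs involved: turning it on is one line, and the per-resource consumption expressions below are shown only for illustration - they're roughly the manual's documented defaults:)

CONSUMPTION_POLICY = True
# how much of each resource one match consumes from the partitionable slot
CONSUMPTION_CPUS = quantize(target.RequestCpus, {1})
CONSUMPTION_MEMORY = quantize(target.RequestMemory, {128})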

The tradeoff is an increased load on the negotiator, but given today's hardware, it's not breaking a sweat for my pools, even with over a thousand cores in the mix and certain users submitting tens of thousands of jobs per cluster. I think the manual suggests that going over 5,000 might make it unhappy, but then you just load up on SSDs, fast memory, and a 4.2GHz CPU for your negotiator host and pump up the volume.

Having the negotiator handle partitioning also ensures that concurrency limits will be correctly managed, because you won't have the three machines above all claiming jobs at the same time in a concurrency-limit race.
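
(For reference, concurrency limits themselves are declared like this - a sketch with a made-up "LICENSE" limit name:)

# central manager config: at most 10 jobs may run under the LICENSE limit
LICENSE_LIMIT = 10

and in the submit description of the jobs that use it:

concurrency_limits = LICENSE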

So, simplistically speaking, you get something like this:

HostA - 0GB Part
        1GB Dyn - Job1
        1GB Dyn - Job2
        1GB Dyn - Job3
        1GB Dyn - Job4
HostB - 0GB Part
        1GB Dyn - Job5
        1GB Dyn - Job6
        1GB Dyn - Job7
        1GB Dyn - Job8
HostC - 2GB Part
        1GB Dyn - Job9
        1GB Dyn - Job10

This may not be the greatest example, since the total number of active cores and machines wound up the same, but if you had four machines instead of three, HostC and HostD would each be running only one job in this example.
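
(If you want to watch the fill pattern for yourself, one way - just a sketch - is to list the partitionable and dynamic slots with their memory while jobs are running:

condor_status -constraint 'PartitionableSlot || DynamicSlot' -autoformat Name Memory State

and see how the dynamic slots stack up host by host.)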


I hope this helps clarify what's going on.

	-Michael Pelletier.



-----Original Message-----
From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf
Of Mathieu Bahin
Sent: Friday, April 28, 2017 8:38 AM
To: htcondor-users@xxxxxxxxxxx
Subject: Re: [HTCondor-users] Negotiation with partitionable slots

I can read in the manual that "This differs from scheduler matchmaking in
that multiple jobs can match with the partitionable slot during a single
negotiation cycle." (cf
http://research.cs.wisc.edu/htcondor//manual/v8.2.7/3_5Policy_Configuration.html#SECTION004510900000000000000).

But I don't really know how. Is it the regular behaviour with
partitionable slots? Because that's not what I observe in my tests...
I'm not sure I perfectly understand this section of the manual. Is there
something to do with "CONSUMPTION_POLICY" to solve my issue?
Currently its value is False for us.
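
(For what it's worth, I'm checking the current values with something like:

condor_config_val CONSUMPTION_POLICY CLAIM_PARTITIONABLE_LEFTOVERS

on the relevant hosts.)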

Cheers,
Mathieu



-- 
---------------------------------------------------------------------------------------
| Mathieu Bahin
| IE CNRS
|
| Institut de Biologie de l'Ecole Normale Supérieure (IBENS)
| Biocomp team
| 46 rue d'Ulm
| 75230 PARIS CEDEX 05
| 01.44.32.23.56
---------------------------------------------------------------------------------------