Thanks, but I already knew those tips. The problem is the need for contiguous
memory for certain things, which means the theoretical limits often can't be
reached (as I said, ~200 is about our current limit). Our pool is now plenty
big enough (~600 slots and counting) that this does cause issues, especially
if two users end up on the same schedd by accident.
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Michael O'Donnell
Sent: 07 October 2010 17:45
To: Condor-Users Mail List
Subject: Re: [Condor-users] Windows schedd job limits (Was: RE: Hooks in
the Scheduler Universe?)
I am new to both administering and using Condor, so maybe I do not fully
understand how Condor handles job submission. What I have read is that on any
Windows submit machine, the maximum number of jobs that can be submitted is
dictated by the heap size. It should not make any difference whether it is a
VM or not, but we do not have any VM submit machines. Although the Condor
manual does not discuss it, it may also be necessary to change the heap size
for non-interactive window stations, but that only affects jobs running on the
execute machine. I have not had the latter problem yet.
The most jobs I have run concurrently is about 90. That is a small number and
does not require changing the default heap size on Windows. Our pool is small,
so for the time being we will not exceed that.
This is what I have in my notes (not necessarily correct):
To increase the maximum number of jobs that can run from a schedd, the global
config MAX_JOBS_RUNNING should be set, and on Windows the heap size on the
submit machine must also be increased.
Collector and negotiator requirements: ~10 KB RAM per batch slot for each
service.
Schedd requirements: ~10 KB RAM per job in the queue; additional memory is
required when jobs use large ClassAd attributes.
condor_shadow requirements: ~500 KB RAM for each running job; this can roughly
double when jobs are submitted from a 32-bit system.
Limitation: a 32-bit machine cannot run more than 10,000 simultaneous jobs due
to kernel memory constraints.
The Windows default heap size is 512 KB. Sixty condor_shadow daemons use about
256 KB, so roughly 120 shadows will consume the default. To run 300 jobs, set
the heap size to 1280 KB.
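As a sanity check on those numbers, here is a rough back-of-the-envelope
calculation (a minimal Python sketch; the per-shadow figure is simply derived
from the 60-shadows/256 KB note above, not a measured value):

    # Rough desktop-heap arithmetic for Windows submit machines, based purely
    # on the figures in the notes above (not measured values).
    DEFAULT_HEAP_KB = 512            # Windows default heap size from the note
    KB_PER_SHADOW = 256.0 / 60.0     # ~4.3 KB per condor_shadow, per the note

    def heap_needed_kb(running_jobs):
        """Approximate heap (KB) needed for this many condor_shadow daemons."""
        return int(round(running_jobs * KB_PER_SHADOW))

    for jobs in (120, 300):
        needed = heap_needed_kb(jobs)
        print("%d jobs -> ~%d KB%s" % (
            jobs, needed,
            " (within the default)" if needed <= DEFAULT_HEAP_KB else ""))
    # Remember that MAX_JOBS_RUNNING must also be raised in the Condor config
    # for the extra jobs to be allowed to run at all.

That reproduces the 1280 KB figure for 300 jobs.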
I hope this helps,
Mike
From: Matt Hope <Matt.Hope@xxxxxxxxxxxxxxx>
To: Condor-Users Mail List <condor-users@xxxxxxxxxxx>
Date: 10/07/2010 09:38 AM
Subject: Re: [Condor-users] Windows schedd job limits (Was: RE: Hooks in the Scheduler Universe?)
Sent by: condor-users-bounces@xxxxxxxxxxx
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Ian Chesal
Sent: 07 October 2010 14:48
To: Condor-Users Mail List
Subject: Re: [Condor-users] Windows schedd job limits (Was: RE: Hooks in
the Scheduler Universe?)
> On bare metal, with some exceptionally nice hardware, I've gotten it up to
> 150 running jobs/schedd with 4 schedds on the box. And even that feels
> metastable at times. The lightest of breezes pushes it off balance.
> This is Win2k3 Server 64-bit. AFAIK there are no registry tweaks to the
> image. What tweaks are you making?
Apparently we don't apply the desktop heap tweaks on the server VMs; perhaps
that only applies to XP or 32-bit machines.
What was the most you managed to achieve on one schedd and queue?
>> Job hooks
>> In the hive mind's opinion, should I not even consider testing job hooks
>> (as a replacement for the schedd/negotiator) on Windows right now?
> Well, Todd closed that ticket. I swear it's never worked in 7.4.x for me,
> but I would have to retest now and confirm this. It can't hurt to try it.
> But you lose so much with hooks for pulling jobs.
> You have to do your own job selection, or if you let the startd reject
> jobs, you have to have a way to pull and then put back jobs that have been
> rejected, which is inefficient and difficult to architect such that it works
> well when you've got a *lot* of slots trying to find their next job to run.
> I'll admit this is exactly what I had working at Altera, but it was a good
> year plus of work to get it functioning.
Yeah I figured that was tricky. Though I would have the advantage that
machines would never need to reject (except as a failure mode).
If I did it I'd likely do it as:
A fixed set of queues (Q => max queues), where a job could be in multiple
queues at once, albeit with different ranking. These queues define a strict
ordering of relative ranking for any individual slot.
Each queue is actually broken down into per-user (U => max users) queues
within it. This is entirely transparent to the below except where noted.
Every slot would map to a (stable) ordering over the possible queues.
In a steady state, a freshly available slot would take the job at the top of
the first queue that had a job for it.
On start-up, or in sudden surges of multiple machines becoming available, I
would do the same as in steady state but order the process by the relative
speed of the machines (we are happy for this to be a simple assigned strict
ordering).
Evaluation of any slot would then be O(QU), where Q is the number of distinct
queues. But in almost all cases a non-empty queue means the job goes to that
slot and the process stops. A job being taken by a slot would eliminate it
from the other queues. At the point a queue is selected I can do some simple
user-level balancing (likely just attempting to ensure that the user with the
fewest active jobs goes next; I can always add in a per-user half-life if
needed).
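To make that concrete, here is a minimal Python sketch of the assignment step
(names and structures are purely illustrative; the real thing would sit in
front of the backing database):

    # Illustrative sketch of the slot -> job assignment described above.
    import heapq
    from collections import defaultdict

    class RankQueue:
        """One ranking queue, broken into per-user heaps of (rank, job_id)."""
        def __init__(self):
            self.per_user = defaultdict(list)   # user -> heap of (rank, job_id)

        def push(self, user, rank, job_id):
            heapq.heappush(self.per_user[user], (rank, job_id))

        def pop_best(self, active_jobs):
            """Take a job, preferring the user with the fewest active jobs."""
            candidates = [(active_jobs[user], heap[0][0], user)
                          for user, heap in self.per_user.items() if heap]
            if not candidates:
                return None
            _, _, user = min(candidates)
            rank, job_id = heapq.heappop(self.per_user[user])
            return user, job_id

    def assign(slot_queue_order, queues, active_jobs):
        """Steady state: a freshly free slot takes the top job of the first
        non-empty queue in its (stable) ordering -- O(Q*U) in the worst case,
        but usually the first non-empty queue wins and the process stops."""
        for qname in slot_queue_order:
            picked = queues[qname].pop_best(active_jobs)
            if picked is not None:
                user, job_id = picked
                active_jobs[user] += 1
                # (taking the job would also remove it from any other queues
                #  it appears in -- omitted here)
                return job_id
        return None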
I would have a central server constantly maintaining the top N of each
queue/user pair (where N is some number a bit bigger than the maximum slots
ever available), so I would only need to do serious sorting effort on a queue
when its top N is drained. Any event on a job (including insertion of a new
one) could only cause a scan of the top N (O(QUN) in the worst case) to see if
it belongs there; if not, it is dropped into the 'to place' bucket.
On needing to repopulate the top N, I can assume that the ordering of the
bottom is still good; I just need to merge in the 'to place' ones, which can
be done on a separate thread if need be, and in most cases the ordering will
be pretty simple.
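Something along these lines for the top-N bookkeeping (again a sketch only;
N would be a bit larger than the number of slots we ever have):

    # Sketch of keeping only the top N of one queue/user pair in memory, with
    # everything else deferred to a 'to place' bucket that is merged lazily.
    import bisect

    class TopN:
        def __init__(self, n):
            self.n = n
            self.top = []          # sorted list of (rank, job_id), best first
            self.to_place = []     # overflow bucket

        def on_job_event(self, rank, job_id):
            """Insertion/update: only ever a scan of the top N; anything that
            does not make the cut is dropped into the to-place bucket."""
            if len(self.top) < self.n or (rank, job_id) < self.top[-1]:
                bisect.insort(self.top, (rank, job_id))
                if len(self.top) > self.n:
                    self.to_place.append(self.top.pop())
            else:
                self.to_place.append((rank, job_id))

        def repopulate(self):
            """Called when the top N drains: merge the to-place bucket back
            in. This is the part that could run on a separate thread."""
            merged = sorted(self.top + self.to_place)
            self.top, self.to_place = merged[:self.n], merged[self.n:]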
I doubt I would need terribly complex lock-free structures. All lock
operations would last only as long as was required to handle
(insert/update/acquire) a single job/slot, unless the 'to place' buckets start
to overfill, in which case a brief period of read-only access to clear them
out may be required. If need be, by taking a per-job lock for assignment
(locking order: queue -> job), I can deal with a job being taken by one slot
while it is involved in an operation elsewhere, by discarding and recursing on
encountering an orphan. I doubt I would bother with this to start with;
accepting that incoming events have to buffer until after the current
assignment pass is not that big a deal.
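The per-job claim with orphan discard could be as simple as (sketch only; Job
here is a hypothetical in-memory wrapper, not anything Condor provides):

    # Sketch of the per-job claim with orphan discard, lock order queue -> job.
    import threading

    class Job:
        def __init__(self, job_id):
            self.job_id = job_id
            self.lock = threading.Lock()
            self.claimed_by = None          # slot name once assigned

    def try_claim(queue_lock, queue, slot_name):
        """Pop entries off the queue until one is genuinely unclaimed; an
        entry already claimed by another slot is an orphan and is discarded."""
        with queue_lock:
            while queue:
                job = queue.pop(0)
                with job.lock:
                    if job.claimed_by is None:
                        job.claimed_by = slot_name
                        return job
                # orphan: taken by another slot while sitting in this queue
        return None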
The in-memory structures need not be replicated to disk. On a restart they can
be reread and reconstructed: there will be a natural ordering in most cases
that can be used when querying the backing database for state, placing most
queues into 'almost sorted' if not actually perfectly sorted order, which
simplifies things and should give relatively rapid recovery, not to mention a
very simple failover mode.
Submission becomes as simple as inserting into the database followed by a
'hint ping' to the server process to tell it to take a look for fresh data.
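Concretely, submission and restart recovery would be little more than this
(purely illustrative; sqlite, the table name and the 'hint' socket are made up
for the sketch):

    # Illustrative only: submission as a transactional insert plus a fire-and-
    # forget 'hint ping', and recovery as a single ordered query.
    import socket
    import sqlite3

    def submit_job(db_path, owner, rank, classad_blob,
                   server=("scheduler-host", 9999)):
        con = sqlite3.connect(db_path)
        with con:   # transactional insert
            con.execute("INSERT INTO jobs(owner, rank, classad, state) "
                        "VALUES (?, ?, ?, 'idle')",
                        (owner, rank, classad_blob))
        con.close()
        try:        # hint ping: best effort, the server rereads fresh rows
            with socket.create_connection(server, timeout=1) as s:
                s.sendall(b"hint\n")
        except OSError:
            pass    # the server's periodic scan picks the job up anyway

    def rebuild_queues(db_path):
        """Restart path: reread state in rank order so the in-memory queues
        come back 'almost sorted' without any separate on-disk replication."""
        con = sqlite3.connect(db_path)
        rows = con.execute("SELECT owner, rank, rowid FROM jobs "
                           "WHERE state = 'idle' ORDER BY rank").fetchall()
        con.close()
        return rows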
Pre-emption becomes the only tricky part. But since that is not likely to be
an issue in the steady state, we need only run a separate thread to sniff for
possible pre-emption targets (only needed for slots running jobs not from
their best queue), which triggers pre-emption (and remembers that it has done
so) to allow the slot to make its request for a new job (which will then be
fulfilled with its optimal job).
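The pre-emption sniffer is then just a slow background loop over the slots
(sketch; the slot attributes are the hypothetical ones from the earlier
sketches):

    # Sketch of the background pre-emption sniffer: only slots running a job
    # from something other than their best queue are ever candidates.
    import threading
    import time

    def start_preemption_sniffer(slots, already_preempted, interval=30):
        def loop():
            while True:
                for slot in slots:
                    if (slot.running_queue is not None
                            and slot.running_queue != slot.queue_order[0]
                            and slot.name not in already_preempted):
                        already_preempted.add(slot.name)  # remember we did it
                        slot.request_preemption()         # slot then re-asks
                time.sleep(interval)
        t = threading.Thread(target=loop, daemon=True)
        t.start()
        return t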
This all only works within this sort of constrained and, crucially, nearly
transitively ordered system. But since that is how our system has actually
been for the past 5 years, I see no reason to think it will change. Exploiting
the steady-state characteristics should allow an extremely fast set-up with
very little delay and absolutely no need to query the server for anything but
the actual 'get me a new job' phase and to trigger a recheck of the database.
All submission/modification/querying of jobs goes directly to the database.
I would need to add security to the database insert/modification so that only
user X can insert/update rows owned by X. Obviously super-users could delete
anyone's.
Many of the things Condor does to deal with streaming and the lack of a shared
filesystem could be utterly ignored for my purposes, so submission would be
much simpler (and transactional to boot).
In all probability I wouldn't be doing this, someone else would. But I can get
a rough idea of the complexity involved from the above.
>> Multiple per-user daemons per box
>> I doubt this would actually improve things
> It does assuming you've got enough CPU so the shadows don't starve on
> startup.
> That's one area where I notice Windows falling down quite frequently: if
> you've got a high rate of shadows spawning it seems to have a lot of trouble
> getting the shadow<->startd comm setup and the job started.
> Lately I've been running with a 2:1 ratio of processor/cores to scheduler
> daemons.
That's worth knowing, thanks.
>> Also not clear if anyone uses this heavily on windows
> All the time.
Right ho, still an idea then
>> Remote submit to Linux-based schedds
>> Submission is ultimately a bit of a hack, and forces the client side to do
>> a lot more state checking
> I have mixed feelings about remote submission. It gets particularly tricky
> when you're mixing OSes.
indeed
> I've had better luck with password-free ssh to submit jobs to centralized
> Linux machines.
>
> And the SOAP interface to Condor is another approach I've had better
> experience with than remote submission from Windows to Linux. Not to say it
> can't work, just that I've found it tricky.
I hadn't considered using SOAP; it seemed like it didn't have the performance
of raw condor_submit access when I last tried it, but that was long ago. Also,
I'm not sure how well (if at all) it supports run_as_owner, the lack of which
would be a deal breaker.
> Another option is to run a light-weight "submitter" daemon on the scheduler
> that you have a custom submit command talk to over REST or SOAP and it, in
> turn, is running as root so it has the ability to su to any user and submit
> as them, on that machine.
> Might be easier than ssh.
We already run such a daemon per user (it basically reads from and maintains a
database and does the submit/queue maintenance). The problem is that it is all
in C# (with considerable backing libraries I don't fancy trying to port to
Mono). But that may well be the best option: migrate the daemon to Mono, use
it to submit jobs, and give everyone their own virtual submit server (a bit
cheaper per VM at least).
Thanks for the ideas.
Matt