HTCondor Project List Archives




Re: [Condor-devel] Replacing condor_status



As threatened, I have filed this as #3285.

Lans - I think your ideas are great, but they probably belong in a separate ticket.  What I describe below is an admin-centric way to see the high-level view of the pool.  I think you're looking for a user-centric way to quantify throughput.

TJ - You mentioned a great idea on IRC for having pre-canned reports (condor_status -report foo); this would control the output format of condor_status, and the default could be set by the sysadmin.  This way, "condor_status" could be customizable to each admin's taste.  I think that's something we should file and follow-up on, but separate from what I describe below.

Thanks all,

Brian

On Oct 15, 2012, at 7:52 AM, Douglas Thain <dthain@xxxxxx> wrote:

> +1 on diagnosing negotiation times; it is by far the most perplexing
> problem for our users at ND.
> 
> On Thu, Oct 11, 2012 at 1:33 PM, Lans Carstensen
> <lans.carstensen@xxxxxxxxxxxxxx> wrote:
>> Greetings!  On Thu, Oct 11, 2012 at 9:07 AM, Alain Roy
>> <alain.roy@xxxxxxxxx> wrote:
>>>> Instead, I propose we have a new tool - "condor_pool_summary" - which
>>>> addresses the needs of sysadmins
>>> 
>>> 
>>> I agree! When I was doing work for the OSG summer school, I was frustrated
>>> by the difficulty in getting information about our pool. And I have similar
>>> frustrations with condor_q.
>>> 
>>> I'll add to the list of things I'd like to see: I'd like some information
>>> about negotiation as well. For instance, we had a long cycle (about 10
>>> minutes), where 50% of it was spent timing out against a bunch of
>>> unresponsive schedds. The only insight I had into that was the
>>> NegotiatorLog, which isn't user friendly. I'm not quite sure how to best
>>> present useful information about negotiation, but I'm sure that propagating
>>> this information through your proposed tool (or perhaps another similar
>>> tool?) would help people understand what's going on in their pools better.
>> 
>> So, I agree with everything stated as well, but if you're introducing
>> a new tool I think you should consider starting with a better
>> submitter-facing tool first and then give administrators and
>> submitters a better shared diagnostic view within that.
>> 
>> What we've found that matters most to the users of our system is "how
>> quickly are my jobs getting picked up and completed?" - i.e.
>> throughput.  Our workflows also tend to be deadline-oriented - our
>> users want to submit a set of work, see the work enter the system, and
>> then either see a few jobs start so they can extrapolate a throughput
>> and estimate completion, or at least be assured of some level of
>> fairness and be given an operational metric that gives them confidence
>> in that fairness ("I'm using 95% of my expected allocation of resources").
>> 
>> In a many-schedd environment, there isn't any way today to get either
>> of those views without extensive development of additional services
>> and client-facing tools.  We had a number of those custom tools
>> before we adopted Condor, and we've built many more as part of
>> using and scaling Condor.  We'd be happy to discuss what we've needed
>> to do around all that, and have done so 1-on-1 with many folks
>> already.  Previously it implied a particular operational practice that
>> we weren't sure was generalizable, but it certainly feels relevant to
>> this discussion.
>> 
>> This is why during CondorWeek I said I'd like to see some of the new
>> negotiator and schedd ad stats also introduced into the submitter ad.
>> Additional information (like priority, fairshare allocations, and
>> concurrency limits...) should also be introduced to the ad so that
>> there's enough there to infer expected throughput rates.  You could
>> then have a better client-facing tool to report to users and operators
>> how effective their throughput is.  And then, if their throughput is
>> lower than expected people will ask "is the overall pool utilization
>> and throughput where it should be?" - and there should be a single
>> source of truth between submitters and administrators on that as well.
>> 
>> So by all means, let's get a commonplace utilization script in - but
>> could we also get something to help create shared understanding
>> between submitters and administrators as well?
>> 
>> -- Lans Carstensen
>> _______________________________________________
>> Condor-devel mailing list
>> Condor-devel@xxxxxxxxxxx
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-devel
