Re: [Condor-devel] Replacing condor_status
- Date: Mon, 15 Oct 2012 08:52:18 -0400
- From: Douglas Thain <dthain@xxxxxx>
- Subject: Re: [Condor-devel] Replacing condor_status
+1

On diagnosing negotiation times, it is by far the most perplexing
problem for our users at ND.

On Thu, Oct 11, 2012 at 1:33 PM, Lans Carstensen
<lans.carstensen@xxxxxxxxxxxxxx> wrote:
> Greetings!
>
> On Thu, Oct 11, 2012 at 9:07 AM, Alain Roy
> <alain.roy@xxxxxxxxx> wrote:
>>> Instead, I propose we have a new tool - "condor_pool_summary" - which
>>> addresses the needs of sysadmins
>>
>>
>> I agree! When I was doing work for the OSG summer school, I was frustrated
>> by the difficulty in getting information about our pool. And I have similar
>> frustrations with condor_q.
>>
>> I'll add to the list of things I'd like to see: I'd like some information
>> about negotiation as well. For instance, we had a long cycle (about 10
>> minutes), where 50% of it was spent timing out against a bunch of
>> unresponsive schedds. The only insight I had into that was the
>> NegotiatorLog, which isn't user friendly. I'm not quite sure how to best
>> present useful information about negotiation, but I'm sure that propagating
>> this information through your proposed tool (or perhaps another similar
>> tool?) would help people understand what's going on in their pools better.
>
> So, I agree with everything stated as well, but if you're introducing
> a new tool I think you should consider starting with a better
> submitter-facing tool first and then give administrators and
> submitters a better shared diagnostic view within that.
>
> What we've found that matters most to the users of our system is "how
> quickly are my jobs getting picked up and completed?" - i.e.
> throughput. Our workflows also tend to be deadline-oriented - our
> users want to submit a set of work, see the work enter the system, and
> then either see a few jobs pick up and extrapolate a throughput so
> they can estimate completion or be assured at least of some level of
> fairness and be given an operational metric to give them confidence in
> that fairness (e.g. "I'm using 95% of my expected allocation of
> resources").
>
> In a many-schedd environment, there isn't any way today to get either
> of those views without extensive development of additional services
> and client-facing tools. We had a number of those custom tools
> previous to using Condor, and we've built up many more as part of
> using and scaling condor. We'd be happy to discuss what we've needed
> to do around all that, and have done so 1-on-1 with many folks
> already. Previously it implied a particular operational practice that
> we weren't sure generalized, but it certainly feels relevant to
> this discussion.
>
> This is why during CondorWeek I said I'd like to see some of the new
> negotiator and schedd ad stats also introduced into the submitter ad.
> Additional information (like priority, fairshare allocations, and
> concurrency limits...) should also be introduced to the ad so that
> there's enough there to infer expected throughput rates. You could
> then have a better client-facing tool to report to users and operators
> how effective their throughput is. And then, if their throughput is
> lower than expected people will ask "is the overall pool utilization
> and throughput where it should be?" - and there should be a single
> source of truth between submitters and administrators on that as well.
>
> So by all means, let's get a commonplace utilization script in - but
> could we also get something to help create shared understanding
> between submitters and administrators as well?
>
> -- Lans Carstensen
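
The two submitter-facing views Lans describes (an extrapolated completion
estimate and a "percent of expected allocation" metric) could be sketched
roughly as follows. This is only an illustration in Python: the
SubmitterSnapshot fields are hypothetical names chosen for the example, not
attributes of any existing Condor ad, and a real tool would have to gather
them across every schedd.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SubmitterSnapshot:
    """Hypothetical per-submitter counters (assumed, not a real ad schema)."""
    completed_jobs: int      # jobs finished during the observation window
    idle_jobs: int           # jobs still waiting to be matched
    running_jobs: int        # jobs currently executing
    elapsed_seconds: float   # length of the observation window
    slots_in_use: int        # slots this submitter holds right now
    expected_slots: float    # slots the submitter's fair share would give


def throughput_per_hour(s: SubmitterSnapshot) -> float:
    """Observed completion rate, in jobs per hour."""
    if s.elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return s.completed_jobs * 3600.0 / s.elapsed_seconds


def estimated_hours_remaining(s: SubmitterSnapshot) -> Optional[float]:
    """Extrapolate time to drain the queue at the observed rate.

    Returns None when nothing has completed yet, since no rate is known.
    """
    remaining = s.idle_jobs + s.running_jobs
    if remaining == 0:
        return 0.0
    rate = throughput_per_hour(s)
    if rate == 0:
        return None
    return remaining / rate


def allocation_fraction(s: SubmitterSnapshot) -> float:
    """Share of the expected allocation actually in use (0.95 means 95%)."""
    if s.expected_slots <= 0:
        raise ValueError("expected_slots must be positive")
    return s.slots_in_use / s.expected_slots


snap = SubmitterSnapshot(
    completed_jobs=50, idle_jobs=150, running_jobs=50,
    elapsed_seconds=1800, slots_in_use=95, expected_slots=100,
)
print(f"throughput: {throughput_per_hour(snap):.1f} jobs/hour")
print(f"estimated hours remaining: {estimated_hours_remaining(snap)}")
print(f"allocation in use: {allocation_fraction(snap):.0%}")
```

The linear extrapolation assumes the completion rate stays constant, which
will not hold when priorities or pool size shift during the run, so treat
the estimate as a rough guide rather than a promise.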