HTCondor Project List Archives




Re: [Condor-devel] Replacing condor_status



Greetings!  On Thu, Oct 11, 2012 at 9:07 AM, Alain Roy
<alain.roy@xxxxxxxxx> wrote:
>>  Instead, I propose we have a new tool - "condor_pool_summary" - which
>> addresses the needs of sysadmins
>
>
> I agree! When I was doing work for the OSG summer school, I was frustrated
> by the difficulty in getting information about our pool. And I have similar
> frustrations with condor_q.
>
> I'll add to the list of things I'd like to see: I'd like some information
> about negotiation as well. For instance, we had a long cycle (about 10
> minutes), where 50% of it was spent timing out against a bunch of
> unresponsive schedds. The only insight I had into that was the
> NegotiatorLog, which isn't user friendly. I'm not quite sure how to best
> present useful information about negotiation, but I'm sure that propagating
> this information through your proposed tool (or perhaps another similar
> tool?) would help people understand what's going on in their pools better.

So, I agree with everything stated as well, but if you're introducing
a new tool I think you should consider starting with a better
submitter-facing tool, and then giving administrators and submitters
a better shared diagnostic view within that.

What we've found matters most to the users of our system is "how
quickly are my jobs getting picked up and completed?" - i.e.
throughput.  Our workflows also tend to be deadline-oriented.  Our
users want to submit a set of work and see it enter the system.  Then
they want either to see a few jobs get picked up, so they can
extrapolate a throughput and estimate completion, or at least to be
assured of some level of fairness, with an operational metric that
gives them confidence in that fairness (e.g. "I'm using 95% of my
expected allocation of resources").
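
A minimal sketch of the two numbers described above, a throughput
extrapolation and a share-of-allocation figure.  The function names,
the inputs, and the notion of a fixed "expected slots" value are
assumptions made for illustration; nothing here is an existing Condor
interface.

```python
from datetime import timedelta


def estimate_remaining(completed, elapsed, remaining):
    """Extrapolate time to finish from the throughput observed so far."""
    if completed <= 0 or elapsed.total_seconds() <= 0:
        return None  # no jobs finished yet, so nothing to extrapolate from
    seconds_per_job = elapsed.total_seconds() / completed
    return timedelta(seconds=remaining * seconds_per_job)


def allocation_used(running_slots, expected_slots):
    """Fraction of the expected allocation currently in use."""
    if expected_slots <= 0:
        raise ValueError("expected_slots must be positive")
    return running_slots / expected_slots


if __name__ == "__main__":
    # Hypothetical numbers: 50 jobs done in 10 minutes, 450 still queued.
    eta = estimate_remaining(50, timedelta(minutes=10), 450)
    print("estimated time remaining:", eta)
    # Hypothetical numbers: 190 slots running against an expected 200.
    print("allocation in use: {:.0%}".format(allocation_used(190, 200)))
```

The extrapolation is deliberately naive: it assumes the observed rate
holds for the rest of the run, which is roughly what a user does by eye
after seeing the first few jobs get picked up.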

In a many-schedd environment, there isn't any way today to get either
of those views without extensive development of additional services
and client-facing tools.  We had a number of those custom tools
prior to using Condor, and we've built up many more as part of
using and scaling Condor.  We'd be happy to discuss what we've needed
to do around all that, and have done so 1-on-1 with many folks
already.  Previously it implied a particular operational practice that
we weren't sure generalized, but it certainly feels relevant to
this discussion.
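
To illustrate the kind of glue a many-schedd site ends up writing,
here is a minimal sketch that rolls per-schedd submitter records up
into one view per submitter.  It works on plain dictionaries; the keys
(Name, ScheddName, RunningJobs, IdleJobs) are assumed to mirror
submitter ad attributes, and the sample records are hypothetical.  How
the records are collected is left out on purpose.

```python
from collections import defaultdict


def summarize_submitters(ads):
    """Sum running and idle job counts per submitter across all schedds."""
    totals = defaultdict(lambda: {"running": 0, "idle": 0, "schedds": 0})
    for ad in ads:
        entry = totals[ad["Name"]]
        entry["running"] += ad.get("RunningJobs", 0)
        entry["idle"] += ad.get("IdleJobs", 0)
        entry["schedds"] += 1  # one record per (submitter, schedd) pair
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical records, one per submitter per schedd.
    sample = [
        {"Name": "alice@example", "ScheddName": "s1", "RunningJobs": 40, "IdleJobs": 10},
        {"Name": "alice@example", "ScheddName": "s2", "RunningJobs": 60, "IdleJobs": 5},
        {"Name": "bob@example", "ScheddName": "s1", "RunningJobs": 20, "IdleJobs": 0},
    ]
    for name, counts in sorted(summarize_submitters(sample).items()):
        print(name, counts)
```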

This is why during CondorWeek I said I'd like to see some of the new
negotiator and schedd ad stats also introduced into the submitter ad.
Additional information (like priority, fairshare allocations, and
concurrency limits...) should also be introduced to the ad so that
there's enough there to infer expected throughput rates.  You could
then have a better client-facing tool to report to users and operators
how effective their throughput is.  And then, if their throughput is
lower than expected, people will ask "is the overall pool utilization
and throughput where it should be?" - and there should be a single
source of truth between submitters and administrators on that as well.

So by all means, let's get a commonplace utilization script in - but
could we also get something to help create shared understanding
between submitters and administrators?

-- Lans Carstensen