As threatened, I have filed this as #3285.

Lans - I think your ideas are great, but perhaps they belong in a separate ticket. What I describe below is an admin-centric way to see the high-level view of the pool. I think you're looking for a user-centric way to quantify throughput.

TJ - You mentioned a great idea on IRC for having pre-canned reports (condor_status -report foo); this would control the output format of condor_status, and the default could be set by the sysadmin. This way, "condor_status" could be customized to each admin's taste. I think that's something we should file and follow up on, but separately from what I describe below.

Thanks all,
Brian

On Oct 15, 2012, at 7:52 AM, Douglas Thain <dthain@xxxxxx> wrote:

> +1 on diagnosing negotiation times; it is by far the most perplexing
> problem for our users at ND.
>
> On Thu, Oct 11, 2012 at 1:33 PM, Lans Carstensen
> <lans.carstensen@xxxxxxxxxxxxxx> wrote:
>> Greetings!
>>
>> On Thu, Oct 11, 2012 at 9:07 AM, Alain Roy
>> <alain.roy@xxxxxxxxx> wrote:
>>>> Instead, I propose we have a new tool - "condor_pool_summary" - which
>>>> addresses the needs of sysadmins
>>>
>>>
>>> I agree! When I was doing work for the OSG summer school, I was frustrated
>>> by the difficulty in getting information about our pool. And I have similar
>>> frustrations with condor_q.
>>>
>>> I'll add to the list of things I'd like to see: I'd like some information
>>> about negotiation as well. For instance, we had a long cycle (about 10
>>> minutes) where 50% of it was spent timing out against a bunch of
>>> unresponsive schedds. The only insight I had into that was the
>>> NegotiatorLog, which isn't user friendly. I'm not quite sure how best to
>>> present useful information about negotiation, but I'm sure that propagating
>>> this information through your proposed tool (or perhaps another similar
>>> tool?) would help people understand what's going on in their pools better.
>>
>> So, I agree with everything stated as well, but if you're introducing
>> a new tool, I think you should consider starting with a better
>> submitter-facing tool first and then give administrators and
>> submitters a better shared diagnostic view within that.
>>
>> What we've found matters most to the users of our system is "how
>> quickly are my jobs getting picked up and completed?", i.e.,
>> throughput. Our workflows also tend to be deadline-oriented: our
>> users want to submit a set of work, see the work enter the system, and
>> then either watch a few jobs start and extrapolate a throughput so
>> they can estimate completion, or at least be assured of some level of
>> fairness and be given an operational metric that gives them confidence
>> in that fairness (e.g., "I'm using 95% of my expected allocation of
>> resources").
>>
>> In a many-schedd environment, there isn't any way today to get either
>> of those views without extensive development of additional services
>> and client-facing tools. We had a number of those custom tools
>> before we started using Condor, and we've built many more as part of
>> using and scaling Condor. We'd be happy to discuss what we've needed
>> to do around all that, and have done so 1-on-1 with many folks
>> already. Previously it implied a particular operational practice that
>> we weren't sure was generalizable, but it certainly feels relevant to
>> this discussion.
>>
>> This is why during CondorWeek I said I'd like to see some of the new
>> negotiator and schedd ad stats also introduced into the submitter ad.
>> Additional information (like priority, fairshare allocations, and
>> concurrency limits) should also be introduced to the ad so that
>> there's enough there to infer expected throughput rates. You could
>> then have a better client-facing tool to report to users and operators
>> how effective their throughput is.
>> And then, if their throughput is lower than expected, people will ask,
>> "Is the overall pool utilization and throughput where it should be?"
>> There should be a single source of truth between submitters and
>> administrators on that as well.
>>
>> So by all means, let's get a commonplace utilization script in, but
>> could we also get something to help create shared understanding
>> between submitters and administrators?
>>
>> --
>> Lans Carstensen
>> _______________________________________________
>> Condor-devel mailing list
>> Condor-devel@xxxxxxxxxxx
>> https://lists.cs.wisc.edu/mailman/listinfo/condor-devel