Steffen Grunewald wrote:
For a homogeneous pool, and "simple" job clusters (identical specs for all jobs) NEGOTIATE_ALL_JOBS_IN_CLUSTER is suggested to be set to False. On the other hand, there may be situations where the first job of a single cluster continues to fail (for whatever reason: memory overcommit comes to mind) thus blocking all others.
Hi Steffen - What version of Condor are you working with? Starting with Condor v7.0.x, the default built-in auto-clustering mechanism should prevent the situation you describe above, and do so in a much more efficient and scalable manner than setting NEGOTIATE_ALL_JOBS_IN_CLUSTER to True (which is the kiss of performance death if you have thousands of jobs).
Is it possible to - e.g. once per given time period (4 hours?) - "flush" the queue by temporarily setting the macro to True?
Maybe something else is going on? With Condor v7.0.x and above and the default auto-clustering, I assert you should never have to resort to NEGOTIATE_ALL_JOBS_IN_CLUSTER = True. Are you overriding auto-clustering by explicitly setting SIGNIFICANT_ATTRIBUTES or some such in the condor_config on your submit hosts?
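To illustrate why auto-clustering avoids the blocking problem, here is a toy sketch in Python. It is not Condor's actual negotiator code, only a simplified model of the idea: jobs are grouped by the values of their significant attributes, and once one job of a group is rejected, the rest of that group is skipped while other groups are still considered. The function names, the `RequestMemory` attribute, and the 16000 MB limit are all assumptions made up for the example.

```python
def autocluster_key(job, significant_attrs):
    """Jobs with identical values for all significant attributes share a key."""
    return tuple(job.get(attr) for attr in significant_attrs)


def negotiate(jobs, significant_attrs, can_match):
    """Toy negotiation pass (hypothetical helper, not a Condor API).

    Returns (matched ProcIds, number of match attempts actually made).
    """
    rejected_keys = set()
    matched = []
    attempts = 0
    for job in jobs:
        key = autocluster_key(job, significant_attrs)
        if key in rejected_keys:
            # An equivalent job already failed to match: skip without trying.
            continue
        attempts += 1
        if can_match(job):
            matched.append(job["ProcId"])
        else:
            rejected_keys.add(key)
    return matched, attempts


def fits(job):
    # Assumed pool limit for the example: no machine offers more than 16000 MB.
    return job["RequestMemory"] <= 16000


# One job cluster whose first job asks for too much memory.
mixed = [{"ClusterId": 1, "ProcId": 0, "RequestMemory": 64000}] + [
    {"ClusterId": 1, "ProcId": i, "RequestMemory": 1024} for i in range(1, 5)
]
mixed_matched, mixed_attempts = negotiate(mixed, ["RequestMemory"], fits)

# A thousand identical jobs that can never match cost a single attempt.
hopeless = [{"ClusterId": 2, "ProcId": i, "RequestMemory": 64000} for i in range(1000)]
hopeless_matched, hopeless_attempts = negotiate(hopeless, ["RequestMemory"], fits)

print(mixed_matched, mixed_attempts)
print(hopeless_matched, hopeless_attempts)
```

In this model the oversized first job does not block the four smaller ones, because they fall into a different group. If the list of significant attributes is overridden so that it no longer includes the attribute that distinguishes the jobs, they all collapse into one group and the first rejection hides the rest, which is the kind of misconfiguration asked about above.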
best,
Todd

--
Todd Tannenbaum
University of Wisconsin-Madison
Condor Project Research
Department of Computer Sciences
tannenba@xxxxxxxxxxx
1210 W. Dayton St. Rm #4257