On Thu, 30 Mar 2006, Ian Chesal wrote:
I want to overcome an undesirable behavior of Condor, but I have failed to find the right configuration entries. The problem is as follows: when a job finishes on a machine and there is another job in the queue from the same user that matches the same machine, Condor runs that user's next job without negotiation. So it does not take user priorities into account. There are no entries in NegotiatorLog about starting the second job, and in SchedLog:

Starting add_shadow_birthdate(1408.0)
Started shadow for job 1408.0 on "<192.168.201.2:32773>", (shadow pid=6681)

The desirable behavior after job completion is to loop over all users and give the free resource to whichever one has the lower effective priority. Please, can you help me solve this problem?

This is part of the "high throughput" portion of Condor. A claim on a startd remains in place until it: a) runs out of jobs to process from the cluster; or b) gets preempted by another claim. As long as there's no one with a lower user priority value in the system, it's much more efficient to keep cycling through the jobs from the current cluster than to re-negotiate, because you don't have to tear down and set up the shadow again.
OK--now I am really confused. Other e-mails on other threads have said that the schedd will keep the claim on the startd as long as it has any jobs for that user (if PREEMPTION_REQUIREMENTS = FALSE and RANK=0). Now you are saying it is just if the jobs are in the same cluster. Which is it?

Steve
It sounds like you've disabled user priority preemption on your central negotiator. Is that the case? What is PREEMPTION_REQUIREMENTS set to on your central negotiator? Preemption is only considered if this expression evaluates to true. If you want preemption to be based on user priorities, make this expression compare the remote (running job) user priority with the current (user being negotiated) user priority. See the default condor_config file for an example.

You can "auto-preempt" jobs that have been running for longer than X minutes if you really want to have a startd re-negotiated after every job completes. We actually do this here at Altera and it works fairly well. As long as your jobs are long (say 20 minutes or greater) the impact on throughput is pretty minimal. I would also add a cautionary note that we run ONLY vanilla universe jobs that do not checkpoint, so this scheme works great. For checkpoint-able jobs or a mixed bag of jobs I don't think you'll want to go this route.

To trigger "auto-preemption" you need to use the MAX_JOB_RETIREMENT_TIME setting and the PREEMPT setting in your startd configuration file. You set MAX_JOB_RETIREMENT_TIME to be something much longer than any job you expect to run in your system:

MAX_JOB_RETIREMENT_TIME = 9676800

And you set PREEMPT to be true after the job has been running for, say, 5 minutes:

PREEMPT = ( $(ActivationTimer) > 300 )

This automatically sets the job to be "preempting" after it's run for 5 minutes. But the retirement timer lets the job finish normally. The difference is that because the job was "preempted" and there's no waiting schedd claim on the startd causing the preemption, the startd is renegotiated like you want.
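
To make the timing of the auto-preemption trick concrete, here is a small Python sketch of how the two settings interact. It is a toy model built only from the description above, not Condor's actual startd state machine: the function name simulate_job and the ClaimOutcome fields are made up for illustration, and the rule that a job ending before PREEMPT turns true leaves the existing claim in place is my assumption from the claim-reuse behavior described earlier in the thread.

```python
from dataclasses import dataclass

# Values quoted in the message above.
PREEMPT_AFTER_SECONDS = 300        # PREEMPT = ( $(ActivationTimer) > 300 )
MAX_JOB_RETIREMENT_TIME = 9676800  # seconds; 9676800 / 86400 = 112 days


@dataclass
class ClaimOutcome:
    marked_preempting: bool  # did PREEMPT become true while the job ran?
    job_end: int             # seconds after activation at which the job stops
    finished_normally: bool  # False if the retirement limit cut the job off
    renegotiated: bool       # is the machine handed back for negotiation?


def simulate_job(runtime_seconds: int) -> ClaimOutcome:
    """Toy model (an assumption, not Condor's real state machine)."""
    marked = runtime_seconds > PREEMPT_AFTER_SECONDS
    if not marked:
        # The job ended before PREEMPT turned true, so in this model the
        # schedd keeps its claim and can start its next job directly.
        return ClaimOutcome(False, runtime_seconds, True, False)
    # Once marked, the job may keep running until the retirement limit.
    job_end = min(runtime_seconds, MAX_JOB_RETIREMENT_TIME)
    finished = runtime_seconds <= MAX_JOB_RETIREMENT_TIME
    return ClaimOutcome(True, job_end, finished, True)


if __name__ == "__main__":
    for runtime in (120, 1200):
        print(runtime, simulate_job(runtime))
```

In this model a 20-minute job is marked as preempting at the 5-minute point, still runs to completion, and the machine then goes back to the negotiator, while a 2-minute job never trips PREEMPT at all.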
If you go with this approach I also suggest setting the schedd preemption timeout to be small and possibly disallowing negotiator preemptions:

PREEMPTION_REQUIREMENTS = FALSE
REQUEST_CLAIM_TIMEOUT = 120

You can get more information on all these configuration settings from the Condor documentation online.

- Ian

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users