Date: | Tue, 15 Feb 2005 10:46:14 -0600 |
---|---|
From: | "Doak Bane" <doak.bane@xxxxxxxxxx> |
Subject: | Re: [Condor-users] Forcing a job classad from config file? |
Thanks for your prompt responses. A bit more background...

I am (obviously) running version 6.7.3 - on RedHat WS 3 Update 3. My current environment is purely for testing - soon to be deployed in production. My site has been running Condor 6.4.7 for about 2 years with very _minimal_ changes to, or understanding of, the default packaged configuration. My job is to change that - to define and implement a set of workable policies. (I am also experimenting with some of the policies outlined in the Bologna Batch System paper.)

The target execution environment is a dedicated cluster of dual processor diskless/headless machines - maybe to be expanded to desktop machines in the future. My test playground is a small environment that mimics that.

Other comments embedded.

Thanks Dan,
Doak Bane

From: "Dan Bradley" <dan@xxxxxxxxxxxx>
> So from your post, I assume that you want MaxJobRetirementTime to be non-zero for either standard universe or nice-user jobs. In all other cases it should already be working. Is this correct?

The problem I'm trying to solve in the current production environment is that User-A would submit thousands of vanilla jobs (one or more clusters). Runtime for each job is typically under 1 hour. User-B submits a few jobs and never gets access to any machines. The greater insult is that User-A then submits more jobs and they run before the User-B jobs. I don't necessarily want preemption to kill/checkpoint/restart the User-A jobs, just to insert a wedge so User-B can get access to some resources within a reasonable period of time.

I stumbled on MaxJobRetirementTime from reading this mailing list - not finding it in the version 6.6.7 manual, began exploring 6.7.3. It does EXACTLY what I need - simple, clean, and straight-forward when used with a simple PREEMPTION_REQUIREMENTS expression based on priority, and a shorter (than 1 day) PRIORITY_HALFLIFE.

Since my testing has been with standard universe, and wanting MaxJobRetirementTime job classad to be non-zero, my first thought is that the machine classad value should be "copied" to the job classad in standard universe as well - but then you've gotta remember, I'm not a Condor expert. There may be many good reasons to not do that.

> I have verified that using SUBMIT_EXPRS to set the default MaxJobRetirementTime in the job ClassAd does not work for standard universe and nice-user jobs, because this is getting overwritten to 0. Another problem is that you can't independently set the machine and job attributes, since they both have the same name. I'll think about this and try to provide a solution.

From my perspective, I think having that option would be nice.

> One workaround that may or may not be useful to you until a fix becomes available is to use condor_submit -a MaxJobRetirementTime=X.
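
For readers following along, the policy described above (a retirement window combined with priority-based preemption and a shorter priority half-life) might look roughly like this in a Condor configuration file. This is only a sketch: the numeric values and the 1.2 factor are illustrative assumptions, not settings taken from this thread, and the exact behavior should be checked against the manual for the version in use.

```
# Illustrative values only; none of these numbers come from the thread.

# Startd: give a running job up to 1 hour (in seconds) to finish on its own
# before a preempting claim takes over the machine.
MAXJOBRETIREMENTTIME = 3600

# Negotiator: allow preemption only when the running user's priority value is
# noticeably worse (numerically larger) than the waiting user's.
PREEMPTION_REQUIREMENTS = RemoteUserPrio > SubmittorPrio * 1.2

# Negotiator: let accumulated usage decay faster than the 1-day default
# (86400 seconds), so a heavy user's priority penalty fades sooner.
PRIORITY_HALFLIFE = 3600
```

With settings along these lines, short jobs that fit inside the retirement window would run to completion rather than being killed, while a waiting user with better priority would get the machine once the current job exits.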