
Re: [Condor-users] Multiple job queues



Thanks Ian and Bradley,
   I understand the approach you have suggested.
That is really great.

Still, I have some doubts.
    I don't want to use the MaxRunHours parameter in the submit file, because
that gives the control to users. I want this control to be with Condor.

So the other option suits me: specifying SYSTEM_PERIODIC_REMOVE in the
configuration file on the submit machine.
     But as I understand it, if I set it to 1 hour, then any user whose jobs
take more than 1 hour will always have them killed. That is why I want to
have different queues, if possible, so that control lies fully with Condor.

In my case, a job sometimes remains in Running status when it is not
actually running; I discussed this in some of my earlier posts.
To Condor the job is in Running status, so the resources used by these
jobs stay blocked, and hence the idle jobs remain idle.

        So is there any way to resolve this issue? I don't know why a job
continues in the running state for several days when it should actually
complete in 8-9 hours.
        
         One more thing I would like to know: is there any way to specify
a time parameter in the submit file, maybe MaxRunHours, after which the
particular job gets resubmitted automatically? I hope there is something for
this in Condor that I am missing, or some helper scripts that check the job
status against MaxRunHours and then resubmit the job. That would be very
helpful to me.



Thanks and with regards,

On Fri, Jan 27, 2012 at 10:56 PM, Dan Bradley <dan@xxxxxxxxxxxx> wrote:
Hi Raman,

I'll expand a little on what Ian said.

The Condor way of doing this sort of thing is to add a ClassAd attribute to the job.  For example, in the submit file, the user could put the following:

+MaxRunHours = 8

If you want to automatically remove jobs that run for longer than the specified max runtime, you can put the following in the Condor configuration on the submit machine:

SYSTEM_PERIODIC_REMOVE = JobStatus == 2 && (CurrentTime - EnteredCurrentStatus > 3600*MaxRunHours)

The above expression assumes that MaxRunHours is always defined.  A slightly more complicated expression could supply a default value of 1 hour if MaxRunHours is undefined by the user:

DEFAULT_MAX_RUN_HOURS = 1

SYSTEM_PERIODIC_REMOVE = JobStatus == 2 && (CurrentTime - EnteredCurrentStatus > 3600*ifThenElse(isUndefined(MaxRunHours),$(DEFAULT_MAX_RUN_HOURS),MaxRunHours))

There are more things you might want to configure based on MaxRunHours (e.g. preemption policy), but the above should implement the basic policy of a strict upper bound on job runtime.
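
To make the logic of the expression with the default value easier to follow, here is a minimal Python sketch that mirrors it. It is only an illustration, not something Condor runs: the function name should_remove and the dict-based job are hypothetical, and the only attribute names taken from the thread are JobStatus, EnteredCurrentStatus and MaxRunHours. It assumes both timestamps are in seconds.

```python
import time

RUNNING = 2  # JobStatus value tested by the expression (running job)


def should_remove(job, now=None, default_max_run_hours=1):
    """Hypothetical mirror of the SYSTEM_PERIODIC_REMOVE expression.

    `job` is a plain dict standing in for the job ClassAd.
    """
    if now is None:
        now = int(time.time())
    # JobStatus == 2: only running jobs are candidates for removal.
    if job.get("JobStatus") != RUNNING:
        return False
    # ifThenElse(isUndefined(MaxRunHours), default, MaxRunHours)
    max_hours = job.get("MaxRunHours", default_max_run_hours)
    # CurrentTime - EnteredCurrentStatus > 3600 * hours
    return now - job["EnteredCurrentStatus"] > 3600 * max_hours


if __name__ == "__main__":
    now = 1_000_000
    two_hours_ago = now - 2 * 3600
    # No MaxRunHours: the 1-hour default applies, so this job is over the limit.
    print(should_remove({"JobStatus": 2, "EnteredCurrentStatus": two_hours_ago}, now))
    # MaxRunHours = 8: two hours of runtime is still within the limit.
    print(should_remove({"JobStatus": 2, "EnteredCurrentStatus": two_hours_ago,
                         "MaxRunHours": 8}, now))
```

Note that the clock is measured from EnteredCurrentStatus, so the limit applies to the time since the job last entered the running state.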

--Dan



On 1/27/12 7:11 AM, Ian Chesal wrote:
On Friday, 27 January, 2012 at 3:33 AM, Raman Sehgal wrote:
Hello all,

I was wondering if it is possible to have multiple job queues on the same machine,
so that users can submit jobs to a queue depending on their requirements.

You can certainly have multiple condor_schedd daemons on a single host. See: http://blog.cyclecomputing.com/2010/06/multiple-condor-schedulers-on-a-single-host.html 

For example
Some users have long jobs, say ones that run for 8 hours,
while other users have short jobs
that run for only 1 hour.
   So, if possible, can I have two job queues, namely "one hour" and "10 hour",
so that short-job users submit to the "one hour" queue and
long-job users submit to the "10 hour" queue?

If the job execution time exceeds the time allocated to the job queue, then the job should
either be resubmitted or killed.

This sort of thing is generally unnecessary with Condor. You can set a per-submission rule that terminates the jobs in the submission if they violate some policy you want to put in place.

If you really wanted to do it per-schedd instance take a look at the SYSTEM_PERIODIC_REMOVE setting: http://research.cs.wisc.edu/condor/manual/v7.6/3_3Configuration.html#18767

You'd set it to remove >1 hour jobs on one of the schedd instances to achieve what you're after. So (the times are in seconds):

SYSTEM_PERIODIC_REMOVE = JobStatus == 2 && (CurrentTime - EnteredCurrentStatus > 3600)

Would remove jobs running for longer than one hour.
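
Since CurrentTime and EnteredCurrentStatus are both expressed in seconds, a limit given in hours has to be multiplied by 3600 before it goes into the expression. As a rough sketch of how the two queues from the original question ("one hour" and "10 hour") could each get their own setting, the Python snippet below prints one SYSTEM_PERIODIC_REMOVE line per schedd instance. The helper name periodic_remove_expr is hypothetical, and how each line is attached to a particular schedd's configuration is left out here.

```python
def periodic_remove_expr(max_hours):
    """Build the removal expression for a limit given in hours (hypothetical helper)."""
    seconds = int(max_hours * 3600)  # the ClassAd time attributes are in seconds
    return "JobStatus == 2 && (CurrentTime - EnteredCurrentStatus > %d)" % seconds


if __name__ == "__main__":
    # Queue names and limits taken from the original question in this thread.
    for name, hours in [("one hour", 1), ("10 hour", 10)]:
        print('# schedd instance serving the "%s" queue' % name)
        print("SYSTEM_PERIODIC_REMOVE = " + periodic_remove_expr(hours))
```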

Regards,
- Ian 


---
Ian Chesal

Cycle Computing, LLC
Leader in Open Compute Solutions for Clouds, Servers, and Desktops
Enterprise Condor Support and Management Tools




_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/





--
Raman Sehgal
Scientific Officer D
Nuclear Physics Division
Bhabha Atomic Research Centre