Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Re: [Condor-users] Running long jobs

Date: Mon, 5 Dec 2005 23:50:57 +0100 (CET)
From: Daniel R Figueiredo <daniel.figueiredo@xxxxxxx>
Subject: Re: [Condor-users] Running long jobs

Hi Eric and Ralph,

Thanks for your respective messages. I now understand better the idea of using two VMs per processor and how this could indeed lead to a solution. However, I still don't understand why a simpler solution, such as the one suggested by Ralph, would not work. To be clear, I don't know why Condor decides to evict the long jobs (after, say, around 15 hours). It could be keyboard activity, as suggested. However, it could also be due to user priorities (this is probably more likely). Recall that this job is running in a heavily loaded Condor cluster (several users, dispatch queue with a large backlog), which could make the long job receive low priority (over time) compared to newly submitted jobs from users with few jobs. Can this case also be handled with an approach similar to the one suggested by Ralph? If not, is this why we need the VM approach?

Sorry for the long exchange of messages in resolving this issue, but I would like to understand what is going on here.


Thanks,
Daniel



On Sun, 4 Dec 2005, Finch, Ralph wrote:

I don't think Daniel needs two VMs; he simply wants his one job to
suspend for some reason, then resume when the "reason" no longer
applies.

Looking at his original post, Daniel said:

"The problem is that after the job has been running for some hours (say
10 hours) Condor decides to evict the job from the machine."

Why it gets evicted is not said, so we don't know the criteria for
suspending a job.  I'll assume keyboard activity. Then "the minimal set
of configuration fields that must be changed in order to achieve
[suspension instead of eviction]" is:

WANT_SUSPEND            = TRUE
PREEMPT                 = FALSE
PREEMPTION_REQUIREMENTS = FALSE
KILL                    = FALSE

ContinueIdleTime        = 5 * $(MINUTE)
SUSPEND                 = $(KeyboardBusy)
CONTINUE                = (KeyboardIdle > $(ContinueIdleTime))

Ralph Finch, P.E.
Dept. of Water Resources
Bay-Delta Office, Room 215-13
Sacramento, CA  95814
916-653-7552
rfinch@xxxxxxxxxxxx

-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx
[mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Erik Paulson
Sent: Saturday, December 03, 2005 11:39 AM
To: Condor-Users Mail List
Subject: Re: [Condor-users] Running long jobs

On Sat, Dec 03, 2005 at 07:01:43PM +0100, Daniel R Figueiredo wrote:


On Wed, 30 Nov 2005, Erik Paulson wrote:

Thanks for your message. It's now clear that I'll need support from the
Condor administrator. However, I looked through the report "Condor and
The Bologna Batch System" as you suggested, but it was not clear how to
configure Condor to run long jobs with preemption implemented via
suspension (as opposed to preemption via termination). In particular, I
would like to know what is the minimal set of configuration fields that
must be changed in order to achieve this. Recall that I would like for
long jobs to be preempted via suspension (as opposed to terminated
through a signal) and later resume from where they stopped (as opposed
to restarting from the beginning). Any ideas on how to do this? I could
then suggest something concrete to our local Condor administrator.


You need to create 2 VMs. There is no way to have one VM suspend a job,
start another one, and resume the first one later - if a job has state
on a machine, it must have a VM watching over it, and a VM can only
watch over one job at a time.

You can emulate your desired behaviour with 2 VMs - the second VM can be
configured to suspend the job whenever it sees the state of the first VM
switch to "Claimed". The BBS document should give you all of the details
you need.

-Erik
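
For illustration only, the two-VM arrangement Erik describes might be
sketched roughly as follows. This is not taken from the BBS document or
from anyone in this thread. The knob and attribute names (NUM_CPUS,
NUM_VIRTUAL_MACHINES, STARTD_VM_EXPRS, VirtualMachineID, vm1_State) are
recalled from the Condor 6.x manual and should be checked against the
installed version. The choice of VM2 as the VM that holds the long job is
an assumption.

```
# Hypothetical sketch; verify every name against your Condor manual.

# Advertise two VMs on a machine with one physical CPU.
NUM_CPUS             = 2
NUM_VIRTUAL_MACHINES = 2

# Publish each VM's State in the other VM's ClassAd (vm1_State, vm2_State).
STARTD_VM_EXPRS      = State

# Suspend rather than evict.
WANT_SUSPEND         = TRUE
PREEMPT              = FALSE
KILL                 = FALSE

# Assumption: VM2 runs the long job. Suspend it while VM1 is claimed and
# let it continue once VM1 is no longer claimed. VM1 is never suspended.
SUSPEND  = (VirtualMachineID == 2) && (vm1_State == "Claimed")
CONTINUE = (VirtualMachineID == 1) || (vm1_State != "Claimed")
```

The sketch leaves out the START expressions that would steer long jobs to
VM2 and other jobs to VM1; those depend on how the pool identifies long
jobs.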
_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users


