
Re: [Condor-users] Jobs License Management




On Oct 15, 2008, at 9:01 AM, Ian Chesal wrote:

Stuart Anderson wrote:
In the context of the new Concurrency Limit, will it be possible for a running job to drop a resource constraint when it is done with it, or is it implicitly assumed that all jobs require their specified resources for their entire lifetime?

The motivation for this is managing I/O resources, where a typical work flow is to launch a large number of jobs that each read in a large amount of data from a shared filesystem (or set of filesystems) and then crunch on the data for a long time before outputting a relatively small amount of results. It would be interesting to be able to hand out tokens for filer access but then be able to return them after the I/O-intensive phase of each individual job is done.

Thanks.

--
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson

Right now the limits exist for the lifetime of the job. It is conceivable that jobs could modify their own ad, via chirp, to update the limits they use. However, this is currently not part of the implementation.
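
To make that concrete, here is a purely hypothetical sketch of what such an in-job update might look like. condor_chirp set_job_attr is the existing mechanism for a running job to modify its own ad; having a mid-job change to ConcurrencyLimits actually release the limit is exactly the part that does not exist today. The do_heavy_io and do_number_crunching commands just stand in for the job's real phases:

    #!/bin/sh
    # Hypothetical sketch -- releasing a concurrency limit mid-job is NOT
    # supported; only the set_job_attr command itself exists today.
    do_heavy_io                                        # stage in data while holding the token
    condor_chirp set_job_attr ConcurrencyLimits '""'   # would return the token, if honored
    do_number_crunching                                # long compute phase, token-free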

We deal with this now in our own pre-Condor resource scheduler, and truly the best answer we have come up with to the problem is: divide up the jobs. It is more work on the part of the job developer, but ultimately it lets you keep the simplest resource request and partitioning scheme. Predictability wins out time and time again for us over complexity.

We'll often see developers writing flows that use limited, expensive Tool A, then B, then C, and submitting a single job that requires all three. That job then blocks for an eternity, starved, trying to acquire all three at once, while jobs that only need one of the three fly by it. The answer is always: write a job that submits a job. Your entry job uses Tool A, finishes, and submits a job that uses Tool B, etc. DAGs make this even easier.
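
As a sketch (all file and limit names here are made up), the chained-tool pattern expressed as a DAG:

    # tools.dag -- hypothetical node and submit file names
    JOB StageA tool_a.sub
    JOB StageB tool_b.sub
    JOB StageC tool_c.sub
    PARENT StageA CHILD StageB
    PARENT StageB CHILD StageC

Each submit file then requests only the one limit its stage needs, e.g. tool_a.sub carries concurrency_limits = tool_a_license and nothing else, so no stage ever starves waiting to hold all three licenses at once.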

Stuart, in your case a DAG would work very well: the first node of the DAG is the file-transfer-intensive portion of the job, and it needs the resource; the second node that follows is the number-crunching portion, and it doesn't need any resources.
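
Sketched out, with hypothetical names, that two-node DAG could be:

    # stagein.dag
    JOB Transfer transfer.sub
    JOB Crunch crunch.sub
    PARENT Transfer CHILD Crunch

with only the first submit file claiming the filer token:

    # transfer.sub -- the I/O stage holds the limit
    executable = stage_in.sh
    concurrency_limits = filer_tokens
    queue

crunch.sub simply omits concurrency_limits, so the token is free again the moment the transfer node exits.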

That is working well for us for jobs with static resource requirements; i.e., we make heavy use of the DAGMan CATEGORY and MAXJOBS keywords. However, the next level of control I am looking for is for jobs that have transient resource requirements. Put another way, I would rather not have to break up individual processes that do I/O and then number crunching into multiple processes, e.g., using shared memory as an exchange method.
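
(For reference, that static throttling is just a couple of lines in the DAG file, with node names made up here:

    CATEGORY TransferA io_heavy
    CATEGORY TransferB io_heavy
    MAXJOBS io_heavy 10

which caps how many members of the category run at once, but only at whole-job granularity.)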

Thanks.

--
Stuart Anderson  anderson@xxxxxxxxxxxxxxxx
http://www.ligo.caltech.edu/~anderson


