
Re: [Condor-users] Can a job send a trigger to let other jobs start?



The problem with trying to do it this way is that it does not play well with pre-empted jobs restarting.

I personally would break this down into two independent jobs: one that stages the data (to a network share) and another that does the processing.

Then rate-limit the staging jobs using concurrency limits (see http://www.cs.wisc.edu/condor/manual/v7.2/3_12Setting_Up.html#SECTION0041211000000000000000), but put no limit on the follow-up jobs.
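
As a rough sketch of the split described above (not a tested setup), the following Python script generates a DAGMan input file in which every staging node is the parent of one processing node, plus a submit description for the staging jobs that carries a concurrency limit. The names stage.sub, process.sub, stage_data.sh, the idx macro and the limit name "datastage" are all made up for illustration. It also assumes the pool configuration defines a matching limit, for example DATASTAGE_LIMIT = 20, which is not shown here.

```python
def build_dag(n_jobs, stage_sub="stage.sub", process_sub="process.sub"):
    """Return DAGMan input text: one staging node per processing node.

    File names and the 'idx' macro are hypothetical placeholders.
    """
    lines = []
    for i in range(n_jobs):
        # Staging node: copies the input data to the network share.
        lines.append(f"JOB stage{i} {stage_sub}")
        lines.append(f'VARS stage{i} idx="{i}"')
        # Processing node: runs only after its staging node succeeds.
        lines.append(f"JOB process{i} {process_sub}")
        lines.append(f'VARS process{i} idx="{i}"')
        lines.append(f"PARENT stage{i} CHILD process{i}")
    return "\n".join(lines) + "\n"


def build_stage_submit(limit_name="datastage"):
    """Return a submit description for the staging jobs.

    Only the staging jobs name the concurrency limit, so only they are
    throttled; the processing jobs carry no limit at all.
    """
    lines = [
        "universe = vanilla",
        "executable = stage_data.sh",  # hypothetical staging script
        "arguments = $(idx)",
        f"concurrency_limits = {limit_name}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_dag(3))
    print(build_stage_submit())
```

Because a staging job holds the limit only while it is running, the next staging job can start as soon as one finishes copying, regardless of how long the processing jobs take.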

If you want to get fancy, you could try staging the data to local disk and using requirements rewriting to force the follow-up jobs onto the same machine. Keeping the data on the network is much simpler, though, if your network can handle a transfer of a few gigabytes; sending it compressed, or at least as a tarball, will significantly reduce the transfer overhead.

Alternatively, you may want to look into Stork, as it tries to solve the problem of getting jobs close to data, albeit in a much more general form.

Matt
 
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Carsten Aulbert
Sent: 14 December 2009 19:52
To: Condor-Users Mail List
Subject: [Condor-users] Can a job send a trigger to let other jobs start?

Hi all,

I've got a problem here that I don't know how to tackle, nor what to advise 
the user to do:

The jobs (usually 2000-4000) are started via DAGMan and read a lot of data 
initially (about 2-3 GByte per job). After that they crunch through the 
loaded data for a couple of hours. This initial start-up phase puts quite a lot 
of load on the central data server, so we would like to have a handle to 
limit this.

DAGMan's maxjobs feature could solve this; however, it would only 
start new jobs after the first batch of jobs is done.

Thus my question is: is there a way to limit the initial number of jobs and 
send a "trigger" to DAGMan to start more jobs once the running jobs are done 
loading their data sets?

Cheers

Carsten