Re: [Condor-users] faster condor_submits with dagman
- Date: Wed, 30 Jun 2010 12:47:35 -0500 (CDT)
- From: Steven Timm <timm@xxxxxxxx>
- Subject: Re: [Condor-users] faster condor_submits with dagman
Peter, have you considered running more than one schedd on your
submit host? That is what some of the big virtual organizations do.
For example, CDF has one schedd to manage the DAGMan processes and another
to manage the jobs they spawn. At one point they ran as many as four
schedds for the jobs. Basically, the DAGMan processing and the submission
of the jobs that make up the DAG stages are competing for condor_schedd time.
Also, how many spool files does each submitted job have, and how big
are they? That could have an effect.
Also, what are your values of JOB_START_COUNT and JOB_START_DELAY?
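As a rough illustration of why those two settings matter: the schedd starts up to JOB_START_COUNT jobs and then waits JOB_START_DELAY seconds before starting the next batch, so their ratio caps the job start rate. The sketch below is a simplified model (it ignores shadow spawn time and everything else the schedd is doing); the function names and the example values are hypothetical, not taken from this thread or from any Condor tool.

```python
def max_start_rate(job_start_count: int, job_start_delay_s: float) -> float:
    """Upper bound on jobs/s the schedd can start under the
    JOB_START_COUNT / JOB_START_DELAY throttle (simplified model).
    A delay of 0 is treated as 'no throttle'."""
    if job_start_delay_s <= 0:
        return float("inf")
    return job_start_count / job_start_delay_s


def throttle_is_bottleneck(job_start_count: int, job_start_delay_s: float,
                           needed_per_s: float) -> bool:
    """True if the start throttle alone caps starts below the needed rate."""
    return max_start_rate(job_start_count, job_start_delay_s) < needed_per_s


# Hypothetical values for illustration, not Peter's actual settings:
# 1 job every 2 s versus the ~16.7 jobs/s that a pool of 2000 slots
# running 2-minute jobs needs.
print(max_start_rate(1, 2))                 # 0.5
print(throttle_is_bottleneck(1, 2, 16.7))   # True
```

If the ratio comes out well below the rate at which slots free up, raising JOB_START_COUNT or lowering JOB_START_DELAY is worth trying before anything more drastic.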
Steve
On Wed, 30 Jun 2010, Peter Doherty wrote:
Hello,
I'm running a large number of short-running jobs (2 minutes, maybe?) on a
large Condor pool. I know, I know, this isn't ideal, it's not what Condor was
designed for, and I should figure out a way to make the jobs run longer. But I
want to work on this a little more.
It's a large Condor DAG managing the jobs.
The jobs finish faster than DAGMan can submit new ones into the queue, so
eventually I go from 1000 idle jobs and 2000 running to 10 idle jobs and
2000 running, and I can't keep the queue full of pending jobs.
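A quick back-of-envelope check of the numbers above: with about 2000 slots each finishing a job of roughly 2 minutes, the pool frees about 2000 / 120 ≈ 16.7 slots per second, and DAGMan has to submit at least that fast or the idle queue drains. The sketch assumes every freed slot is refilled immediately from the idle queue; the 10 jobs/s submit rate is a hypothetical figure for illustration, not a measurement from this pool.

```python
def completion_rate(running_jobs: int, job_duration_s: float) -> float:
    """Jobs finished per second when `running_jobs` slots each run
    jobs of `job_duration_s` seconds back to back."""
    return running_jobs / job_duration_s


def seconds_until_idle_drained(idle_jobs: int, completion_per_s: float,
                               submit_per_s: float) -> float:
    """Time until the idle queue empties when freed slots pull jobs out
    of it faster than DAGMan refills it. Returns infinity if the submit
    rate keeps up."""
    net_drain = completion_per_s - submit_per_s
    if net_drain <= 0:
        return float("inf")
    return idle_jobs / net_drain


# Figures from the post: ~2000 running jobs of ~2 minutes each.
needed = completion_rate(2000, 120.0)
print(f"needed submit rate: {needed:.1f} jobs/s")

# Hypothetical DAGMan submit rate, for illustration only.
assumed_submit_rate = 10.0
t = seconds_until_idle_drained(1000, needed, assumed_submit_rate)
print(f"idle queue of 1000 drains in about {t:.0f} s")
```

In this model any submit rate below about 16.7 jobs/s eventually empties the idle queue, which matches the behaviour described; the size of the initial backlog only delays it.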
I've moved the schedd's spool onto a RAMdisk to try and improve throughput,
and this helped somewhat but not enough.
Any other suggestions for tuning the system for higher job throughput
before I give up and take a different approach?
Here are some of the variables I've been playing with, with limited
success.
The machine (schedd and collector/negotiator on the same host) is a 2.4GHz
4-core AMD system with 8GB RAM.
SCHEDD_INTERVAL = 30
DAGMAN_MAX_JOBS_IDLE = 1000
DAGMAN_SUBMIT_DELAY = 0
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 1000
DAGMAN_USER_LOG_SCAN_INTERVAL = 1
SCHEDD_INTERVAL_TIMESLICE = 0.10
SUBMIT_SKIP_FILECHECKS = True
HISTORY =
NEGOTIATOR_INTERVAL = 30
NEGOTIATOR_MAX_TIME_PER_SUBMITTER = 20
NEGOTIATOR_MAX_TIME_PER_PIESPIN = 20
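One way to see whether the DAGMan settings above are the limit: DAGMan submits at most DAGMAN_MAX_SUBMITS_PER_INTERVAL jobs in a row (modelled here as once per DAGMAN_USER_LOG_SCAN_INTERVAL-second cycle) and sleeps DAGMAN_SUBMIT_DELAY seconds between submits, but each condor_submit also takes real time. The sketch below is a simplified model that takes the smaller of those two ceilings; the function name is hypothetical, and the 0.25 s per-submit cost is an assumed value you would need to measure on your own submit host.

```python
def dagman_submit_ceiling(max_submits_per_interval: int,
                          scan_interval_s: float,
                          seconds_per_submit: float,
                          submit_delay_s: float = 0.0) -> float:
    """Rough upper bound on DAGMan's submit rate in jobs/s (simplified model).

    Bound 1 (configuration): at most `max_submits_per_interval` submits per
    `scan_interval_s`-second cycle.
    Bound 2 (latency): each submit costs `seconds_per_submit` (hypothetical,
    measured on your host) plus the configured `submit_delay_s` sleep.
    """
    config_bound = max_submits_per_interval / scan_interval_s
    latency_bound = 1.0 / (seconds_per_submit + submit_delay_s)
    return min(config_bound, latency_bound)


# Settings from the post: DAGMAN_MAX_SUBMITS_PER_INTERVAL = 1000,
# DAGMAN_USER_LOG_SCAN_INTERVAL = 1, DAGMAN_SUBMIT_DELAY = 0.
# The 0.25 s per condor_submit is an assumed figure, not a measurement.
rate = dagman_submit_ceiling(1000, 1, 0.25, 0)
print(f"submit ceiling: {rate:.1f} jobs/s")
```

With these settings the configuration ceiling (1000 jobs/s) is far above the latency ceiling, so in this model the cost of each individual submit, not the DAGMan knobs, sets the pace.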
Thanks,
Peter
--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm@xxxxxxxx http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant Group Leader.