Peter, have you considered using more than one schedd on your
submitter? That is what some of the big virtual organizations do.
For example, CDF has one schedd to manage the DAGMan processes and
another to manage the jobs that they spawn. At one time they had as
many as four schedds for the jobs. Basically, the DAGMan processing
and the submission of the jobs that make up the DAG stages are
competing for the condor_schedd's time.
Also, how many spool files does each submitted job have, and how
big are they? That could have an effect.
Also, what are the values of JOB_START_COUNT and JOB_START_DELAY?
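To see why those two settings matter, here is a rough
back-of-the-envelope sketch in Python. It takes the numbers from
your message (about 2000 running jobs of roughly 2 minutes each)
and assumes a simplified model in which the schedd starts
JOB_START_COUNT jobs and then waits JOB_START_DELAY seconds. The
function names and the throttle values plugged in are hypothetical,
for illustration only, not your actual configuration.

```python
def required_start_rate(running_slots, job_seconds):
    """Job starts per second needed to keep every slot busy."""
    return running_slots / job_seconds


def throttle_ceiling(job_start_count, job_start_delay):
    """Upper bound on starts per second, assuming the schedd starts
    job_start_count jobs and then waits job_start_delay seconds."""
    if job_start_delay <= 0:
        return float("inf")  # this pair of settings imposes no limit
    return job_start_count / job_start_delay


# Figures from the original message: ~2000 running, ~2 minutes each.
needed = required_start_rate(running_slots=2000, job_seconds=120)

# Hypothetical settings for illustration only.
ceiling = throttle_ceiling(job_start_count=1, job_start_delay=2)

print(f"needed: {needed:.1f} starts/s, ceiling: {ceiling:.1f} starts/s")
```

If the ceiling comes out well below the needed rate, the schedd
cannot refill slots as fast as the jobs finish, no matter how
quickly DAGMan submits.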
Steve
On Wed, 30 Jun 2010, Peter Doherty wrote:
Hello,
I'm running a large number of short-running jobs (2 minutes,
maybe?) on a large Condor pool. I know, I know, this isn't ideal,
it's not what Condor was designed for, and I should figure out a way
to make the jobs longer running. But I want to work on this a little
more.
It's a large Condor DAG managing the jobs.
The jobs are able to finish as fast as DAGMan can submit new ones
into the queue, so eventually I go from 1000 idle jobs and 2000
running to 10 idle jobs and 2000 running, and I can't keep the
queue full of pending jobs.
I've moved the schedd's spool onto a RAMdisk to try and improve
throughput, and this helped somewhat but not enough.
Any other suggestions for tuning the system for a higher rate of
job throughput, before I give up and take a different approach?
Here are some of the variables I've been playing with, but with
limited success.
The machine (schedd and collector/negotiator on the same host) is a
2.4GHz 4-core AMD system with 8GB RAM.
SCHEDD_INTERVAL = 30
DAGMAN_MAX_JOBS_IDLE = 1000
DAGMAN_SUBMIT_DELAY = 0
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 1000
DAGMAN_USER_LOG_SCAN_INTERVAL = 1
SCHEDD_INTERVAL_TIMESLICE = 0.10
SUBMIT_SKIP_FILECHECKS = True
HISTORY =
NEGOTIATOR_INTERVAL = 30
NEGOTIATOR_MAX_TIME_PER_SUBMITTER = 20
NEGOTIATOR_MAX_TIME_PER_PIESPIN = 20
Thanks,
Peter
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/
--
------------------------------------------------------------------
Steven C. Timm, Ph.D (630) 840-8525
timm@xxxxxxxx http://home.fnal.gov/~timm/
Fermilab Computing Division, Scientific Computing Facilities,
Grid Facilities Department, FermiGrid Services Group, Assistant
Group Leader.