Hello,
I'm running a large number of short-running jobs (about 2 minutes each?) on a large Condor pool. I know, I know, this isn't ideal or what Condor was designed for, and I should figure out a way to make the jobs longer-running. But I want to work on this a little more.
It's a large Condor DAG managing the jobs.
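(To give a sense of the shape: picture a DAG file that is mostly a long list of independent nodes, something like the sketch below. The node names and submit file here are placeholders, not my real ones:

  JOB node0001 shortjob.sub
  JOB node0002 shortjob.sub
  JOB node0003 shortjob.sub
  ...
)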
The jobs finish as fast as DAGMan can submit new ones into the queue, so eventually I go from 1000 idle jobs and 2000 running to 10 idle jobs and 2000 running, and I can't keep the queue full of pending jobs.
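(For anyone who wants to watch the drain themselves, something like the following shows the per-submitter running/idle totals, or counts idle jobs directly; JobStatus 1 is Idle:

  condor_status -submitters
  condor_q -constraint 'JobStatus == 1'
)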
I've moved the schedd's spool onto a RAM disk to try to improve throughput, and this helped somewhat, but not enough.
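(For the curious, the change amounts to pointing the schedd's spool directory at the RAM disk mount and restarting the schedd; the path below is just an example from my setup:

  SPOOL = /mnt/ramdisk/condor/spool
)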
Any other suggestions for tuning the system for a higher rate of job throughput, before I give up and take a different approach?
Here are some of the variables I've been playing with, but with limited success.
The machine (schedd and collector/negotiator on the same host) is a 2.4 GHz 4-core AMD system with 8 GB of RAM.
SCHEDD_INTERVAL = 30
DAGMAN_MAX_JOBS_IDLE = 1000
DAGMAN_SUBMIT_DELAY = 0
DAGMAN_MAX_SUBMITS_PER_INTERVAL = 1000
DAGMAN_USER_LOG_SCAN_INTERVAL = 1
SCHEDD_INTERVAL_TIMESLICE = 0.10
SUBMIT_SKIP_FILECHECKS = True
HISTORY =
NEGOTIATOR_INTERVAL = 30
NEGOTIATOR_MAX_TIME_PER_SUBMITTER = 20
NEGOTIATOR_MAX_TIME_PER_PIESPIN = 20
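(In case anyone wants to double-check a setting on their own pool: condor_reconfig tells the running daemons to re-read their config, and condor_config_val reports what's configured, e.g.:

  condor_config_val SCHEDD_INTERVAL
)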
Thanks,
Peter