Re: [HTCondor-users] Tips for improving schedd performance with many jobs


From: "Beyer, Christoph"<christoph.beyer@xxxxxxx>

Date: Fri, Oct 17, 2025, 14:26

Subject: Re: [HTCondor-users] Tips for improving schedd performance with many jobs

To: "HTCondor-Users Mail List"<htcondor-users@xxxxxxxxxxx>

Hi Jiaqi,

It depends a little on your setup. One thing you might want to consider is putting the spool directory on a fast SSD, and of course the schedd needs sufficient RAM.

The Condor approach would also be to set up more schedds, use late materialization, and submit jobs in batches (queue 100 instead of 100 x queue 1). It also helps to teach people to use condor_watch_q instead of 'watch condor_q' and to limit the number of jobs per user in the queue.
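To make the batching and late-materialization advice concrete, here is a minimal Python sketch that builds a single submit description for a whole batch instead of one per job. It is only an illustration under assumptions: the helper name build_submit_description, the executable run.sh, the logs/ and out/ paths, and the values 100 and 50 are all hypothetical. The max_idle submit command asks the schedd to materialize jobs lazily, keeping at most that many idle jobs in the queue at once.

```python
def build_submit_description(executable, n_jobs, max_idle):
    """Return the text of ONE submit description that queues n_jobs jobs.

    Hypothetical helper for illustration; paths and file names are assumptions.
    """
    if n_jobs < 1 or max_idle < 1:
        raise ValueError("n_jobs and max_idle must be positive")
    lines = [
        f"executable = {executable}",
        # $(ProcId) runs from 0 to n_jobs - 1 within the cluster.
        "arguments  = $(ProcId)",
        # One shared log file per cluster rather than one per job.
        "log        = logs/batch_$(ClusterId).log",
        "output     = out/job_$(ClusterId)_$(ProcId).out",
        "error      = out/job_$(ClusterId)_$(ProcId).err",
        # Late materialization: keep at most max_idle idle jobs in the queue.
        f"max_idle   = {max_idle}",
        # A single queue statement for the whole batch.
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_submit_description("run.sh", n_jobs=100, max_idle=50), end="")
```

The printed text could be saved to a file and passed to condor_submit. Whether late materialization is permitted depends on the schedd configuration, so treat this as a starting point rather than a recipe.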

With decent hardware, fast SSDs, and a Fibre Channel-connected filesystem for job log and output writing, you can, in my experience, run a schedd with up to 100k jobs in different states without trouble.

Things immediately become a nuisance in the setup described if someone submits jobs with, say, a typo in the log directory path, but that's just people, they break things ;)

Best

christoph

--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx

From: <jiaqi.pan@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, 17 October 2025 08:02:54
Subject: [HTCondor-users] Tips for improving schedd performance with many jobs

Hi all,

We're running HTCondor 24.0.5 with one dedicated submit node (Access Point).

When the number of submitted jobs gets large, say over 20,000, we notice that commands like condor_q become really slow, and sometimes even time out or fail.

If we put some idle jobs on hold, things get much more responsive again.

That helps temporarily, but we'd prefer not to intervene manually if possible.

I also tried increasing the value of SCHEDD_QUERY_WORKERS, but it didn't seem to make much difference.

So I'm wondering if anyone has tuning tips or best practices for improving schedd performance when handling a large number of jobs.

Are there specific configuration tweaks or limits we should look into?

Thanks a lot for any suggestions!

Best,

Jiaqi

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/
