Hi Raman,
I am a site administrator (not an HTCondor developer), so please take my advice as a reference rather than a definitive answer, as it is based on a limited number of cases I have managed.
Regarding the maximum number of jobs on a single Submitter (AP):
In my experience, the stability of the AP depends heavily on the disk performance of the SPOOL directory (typically /var/lib/condor/spool).
Increasing this limit significantly will mainly impact disk I/O and the memory usage of the condor_shadow processes. If you have high-speed storage and enough RAM, scaling up to 100k might be feasible, but I would suggest monitoring the job_queue.log write latency closely.
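A minimal sketch of one way to get a rough number for that latency, assuming the relevant cost is small synchronous appends on the file system that holds SPOOL. It times append-plus-fsync writes to a scratch file and never touches job_queue.log itself. The function name, sample count, and payload size are illustrative choices, not HTCondor settings.

```python
import os
import statistics
import tempfile
import time


def probe_append_latency(directory, samples=50, payload_size=512):
    """Time small write+fsync operations on a scratch file in `directory`.

    Returns summary latencies in milliseconds.
    """
    payload = b"x" * payload_size
    latencies_ms = []
    # Scratch file only; the real job_queue.log is never opened.
    fd, path = tempfile.mkstemp(prefix="latency_probe_", dir=directory)
    try:
        for _ in range(samples):
            start = time.perf_counter()
            os.write(fd, payload)
            os.fsync(fd)  # force the write down to stable storage
            latencies_ms.append((time.perf_counter() - start) * 1000.0)
    finally:
        os.close(fd)
        os.remove(path)
    latencies_ms.sort()
    return {
        "samples": samples,
        "median_ms": statistics.median(latencies_ms),
        "p95_ms": latencies_ms[int(0.95 * (samples - 1))],
        "max_ms": latencies_ms[-1],
    }


if __name__ == "__main__":
    # On an AP, pass a directory on the SPOOL file system instead
    # (write permission is required). A temp dir keeps this runnable anywhere.
    with tempfile.TemporaryDirectory() as scratch:
        print(probe_append_latency(scratch))
```

Running it periodically, before and after raising the job limit, gives a baseline to compare against. It is a synthetic probe, so treat the numbers as an indication of storage behavior rather than a measurement of the schedd itself.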
I hope this helps as a reference for your scaling strategy.
Best regards
-Geonmo
From: Ram Ban <ramban046@xxxxxxxxx>
To: Miron Livny <miron@xxxxxxxxxxx>
Cc: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Received: 2026-04-18 (Sat) 03:56:05
Subject: Re: [HTCondor-users] Scaling in HTCondor
Any update on this?
On Wed, Apr 15, 2026, 11:31 Ram Ban < ramban046@xxxxxxxxx> wrote:
We have 1 master and some static submitters (APs).
Users submit jobs on the submitters with their requirements.
A process on a static machine runs condor_q -all -g and launches Executors (EPs) for the jobs to run on.
On each Executor, if no job arrives for 3 minutes, we stop condor and power the machine off.
When users need to submit more jobs, we spin up more submitters (APs), and we stop them once no jobs are running on them and the load on the static machines is fine.
So I wanted to know whether we can increase the maximum number of jobs that can run on an AP, and what the maximum number of APs is that we can run under one master.
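The idle rule described above (an Executor that sees no job for 3 minutes is stopped and powered off) can be sketched as a small timer. This is only an illustration under assumptions: the class and method names are hypothetical, and how the number of running jobs is obtained, and how condor is stopped and the machine powered off, are left to the caller.

```python
import time

IDLE_LIMIT_SECONDS = 3 * 60  # the 3-minute rule from the description


class IdleTracker:
    """Tracks how long an execute node has gone without running a job."""

    def __init__(self, idle_limit=IDLE_LIMIT_SECONDS, now=None):
        self.idle_limit = idle_limit
        # A freshly started node counts as idle from the moment it starts.
        self.idle_since = time.monotonic() if now is None else now

    def observe(self, running_jobs, now=None):
        """Record one poll; return True once the node should shut down."""
        now = time.monotonic() if now is None else now
        if running_jobs > 0:
            self.idle_since = None  # busy: reset the idle clock
            return False
        if self.idle_since is None:
            self.idle_since = now  # just became idle
        return (now - self.idle_since) >= self.idle_limit


if __name__ == "__main__":
    tracker = IdleTracker(now=0.0)
    print(tracker.observe(running_jobs=0, now=60.0))
    print(tracker.observe(running_jobs=0, now=180.0))
```

The `now` parameter exists so the logic can be exercised with fixed timestamps; in a real poll loop it would be omitted and the monotonic clock used.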
Thanks and regards
Raman
On Wed, Apr 15, 2026, 01:29 Miron Livny < miron@xxxxxxxxxxx> wrote:
Apologies for asking so many questions ... so, when you see the load at the AP you turn on (over the network?) some servers ... once they do not run any payload for 3 minutes, they turn themselves off ... did I get it right?
Miron
From: Ram Ban <ramban046@xxxxxxxxx>
Sent: Tuesday, April 14, 2026 14:22
To: Miron Livny <miron@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Scaling in HTCondor
We run condor_q and collect the requirements of all jobs (such as GPUs, RAM, etc.), then start execute machines and start condor on them. Their condor_config has the Execute role.
If an execute machine runs for more than 3 minutes without any jobs, we stop condor and power it off.
On Tue, Apr 14, 2026, 23:28 Miron Livny < miron@xxxxxxxxxxx> wrote:
Thank you, Raman.
How do you launch the executors on the machine?
Do I assume correctly that the machine is managed by a batch system?
Miron
Sent from my iPhone
On Apr 14, 2026, at 11:35, Ram Ban < ramban046@xxxxxxxxx> wrote:
Hello Miron,
Currently we use HTCondor to share an execute machine among multiple jobs, using partitionable slots with the vanilla universe.
Users submit jobs in HTCondor and we launch executors so that the jobs are scheduled on them.
Sometimes multiple users want to submit a lot of jobs (on the order of millions) that each have small requirements (CPU, RAM) but run for a long time (hours). So we spin up some APs and add them to the HTCondor pool so that they can be used.
So I want to know whether we can increase the limits to reduce submission time. We try to reduce the number of jobs by combining multiple jobs into one HTCondor job, but sometimes that is not possible.
Thanks and regards
Raman
On Tue, Apr 14, 2026, 21:46 Miron Livny < miron@xxxxxxxxxxx> wrote:
Thank you for your interest in (using?) HTCondor, Raman.
As a translational research center, we are extremely interested in knowing (as much as we are allowed ...) about how HTCondor is deployed and used in the real world.
Would it be possible for us here at CHTC to learn more about how you use, or plan to use, HTCondor?
Thanks,
Miron
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Ram Ban <ramban046@xxxxxxxxx>
Sent: Tuesday, April 14, 2026 11:02
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Scaling in HTCondor
Hi all,
I want to know what the maximum number of jobs is that can run on a submitter (AP). The current default limit is 10k. If not many other processes are running on the AP, can we increase this number to something like 100k? What would be the side effects of increasing it?
I also want to know the maximum number of submitters (APs) a master machine can handle. Recently I have seen that with more than 40 APs, scheduling becomes too slow and I have to wait for jobs to get scheduled.
Thanks and regards
Raman