
Re: [HTCondor-users] how does the AP resource needs scale with queue size




On 7/9/25 12:56 PM, Matthew West via HTCondor-users wrote:
I am curious if the developers have any updates to the general description given in https://research.cs.wisc.edu/htcondor/wiki-archive/pages/HowToManageLargeCondorPools/ about how an AP's CPU & memory requirements scale with the size of the prospective job queue.

https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.html#condor-schedd-configuration-file-entries

With modern servers able to have hundreds of GBs of system memory, is it possible to get queues of jobs (pending >> running) into the 250k range or higher? Or does the speed of storage or network communication become the bottleneck before you get that large?


Hi Matt:

While that wiki page is getting kind of old, the basic architecture information hasn't changed. I know of several sites with APs running more than 10,000 concurrent jobs, but none at 100,000 or more. Our scalability story is always that admins can scale out horizontally and add more APs to their system.
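For reference, the schedd knobs most relevant at that scale are covered by the configuration-macros page linked above. A minimal sketch of AP-side tuning (the values and the spool path here are illustrative assumptions, not recommendations):

```
# Illustrative AP (condor_schedd) tuning sketch -- values are assumptions,
# not recommendations; see the configuration-macros documentation.

# Cap on concurrently running jobs this schedd will manage.
MAX_JOBS_RUNNING = 10000

# Cap on total jobs allowed in this schedd's queue (idle + running).
MAX_JOBS_SUBMITTED = 250000

# Keep the job queue log on fast local storage; the schedd syncs it
# on queue transactions, so storage latency can become the bottleneck
# before memory does. (Path is hypothetical.)
JOB_QUEUE_LOG = /fast-ssd/condor/spool/job_queue.log
```

With a cap per AP, growing the total queue beyond that point is done by standing up additional APs rather than enlarging one.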

My feeling is that even when you can provision a very large memory or CPU-count access point, admins get (rightfully) nervous about having so many eggs in one basket. Any kernel reboot, machine glitch, or other mishap can interrupt a lot of work.


-greg