
Re: [HTCondor-users] condor_schedd stuck when submitting a large amount of jobs



Jamie and all,

I hope to cache job submissions and process them at a manageable pace, but I don't want to use a max-jobs-per-user limit, which makes submissions with a lot of jobs fail.

Is there a knob for the admin to force a value for max_materialize per schedd? I assume this is not something a job transform can do.

Thanks!

On Fri, Mar 10, 2023 at 4:20 PM Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
That depends on what picture you're trying to see.
The schedd and submitter ads in the collector give a good summary of total and per-user job totals in the pool.
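
For example, a quick sketch with the Python bindings (a sketch only; the projection below lists the usual schedd/submitter ad attributes and can be adjusted):

import htcondor

coll = htcondor.Collector()

# Per-schedd job totals, from the schedd ads in the collector
for ad in coll.query(htcondor.AdTypes.Schedd,
                     projection=["Name", "TotalRunningJobs", "TotalIdleJobs", "TotalHeldJobs"]):
    print(ad.get("Name"), ad.get("TotalRunningJobs", 0),
          ad.get("TotalIdleJobs", 0), ad.get("TotalHeldJobs", 0))

# Per-user job totals, from the submitter ads
for ad in coll.query(htcondor.AdTypes.Submitter,
                     projection=["Name", "ScheddName", "RunningJobs", "IdleJobs", "HeldJobs"]):
    print(ad.get("Name"), ad.get("ScheddName"),
          ad.get("RunningJobs", 0), ad.get("IdleJobs", 0), ad.get("HeldJobs", 0))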

- Jaime

On Mar 10, 2023, at 3:09 PM, JM <jm@xxxxxxxxxxxxxxxxxxxx> wrote:

Jaime,

Good to know that the answer is no.

My plan B is to collect a list of all submit hosts and query them individually. Maybe I can cache the result of the last query of each submit host; if a condor_schedd is busy, I can always time out and fall back to the last cached value. Is there a more elegant or correct solution to get the whole picture?
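
A rough sketch of that plan with the Python bindings (the cache, timeout value, and projection are hypothetical and only illustrate the idea):

import htcondor
from concurrent.futures import ThreadPoolExecutor

QUERY_TIMEOUT = 20   # seconds to wait for a schedd before falling back to the cache
last_result = {}     # schedd name -> result of its last successful query (hypothetical cache)

def query_schedd(ad):
    # Ask one schedd for a cheap projection of its job queue
    schedd = htcondor.Schedd(ad)
    return schedd.query(projection=["ClusterId", "ProcId", "Owner", "JobStatus"])

coll = htcondor.Collector()
schedd_ads = coll.locateAll(htcondor.DaemonTypes.Schedd)

pool = ThreadPoolExecutor(max_workers=8)
futures = {pool.submit(query_schedd, ad): ad["Name"] for ad in schedd_ads}
for future, name in futures.items():
    try:
        last_result[name] = future.result(timeout=QUERY_TIMEOUT)
    except Exception:
        # Schedd busy, slow, or unreachable: keep whatever was cached last time
        pass
pool.shutdown(wait=False)   # don't block on schedds that are still slow to answer

for name, jobs in sorted(last_result.items()):
    print(name, len(jobs), "jobs")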

Thank you very much.

On Fri, Mar 10, 2023 at 3:59 PM Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
If the submission of the large number of jobs happens in a single submit action (i.e. a single condor_submit or Python bindings submit() call), then the answer is "no". A submit request is a synchronous operation for the condor_schedd; it will do nothing else until the submission is completed.

As you note, "max_materialize" is a good way to reduce the time the schedd spends on the job submission. I don't believe there's any way for the admin to trigger its use for large job submissions.
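
For reference, a minimal sketch of a submit description that uses late materialization (the executable, limit, and job count below are made up):

# Keep at most 1000 job ads materialized in the schedd at a time,
# so a 100000-job submission is not handled as one long operation.
executable      = my_job.sh
arguments       = $(ProcId)
max_materialize = 1000
queue 100000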

We discourage the use of frequent "condor_q -global -all" commands. We can discuss alternative solutions if you're relying on this command.

- Jaime

> On Mar 10, 2023, at 2:24 PM, JM <jm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> HTCondor Community,
>
> In an environment with multiple condor_schedds on different servers, we experienced an issue where "condor_q -global -all" got stuck on a submit host for a short period of time while a large number of jobs were being submitted via that submit host.
>
> I understand that users may use max_materialize to reduce the pressure on the condor_schedd. But is there a knob for admins to give the condor_schedd higher priority to respond to queries while it is working on the new submission?
>


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/