
Re: [HTCondor-users] MAX_JOBS_SUBMITTED exceeded, submit failed. Current total is 499999. Limit is 50000



Hi,

Yes, I think you are right. I was a bit confused: since I do not set the value myself, I thought it might be some kind of counter, but it seems the default is simply set to a very high number:

[root@bird-htc-sched11 ~]# condor_config_val -v MAX_JOBS_SUBMITTED
MAX_JOBS_SUBMITTED = 2147483647
 # at: <Default>
 # raw: MAX_JOBS_SUBMITTED = 2147483647

Is this kind of ceiling on the schedd what you need?

Best
christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Vikrant Aggarwal" <ervikrant06@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 8 March 2023 14:54:40
Subject: Re: [HTCondor-users] MAX_JOBS_SUBMITTED exceeded, submit failed. Current total is 499999. Limit is 50000

Hello,

After changing MAX_JOBS_SUBMITTED, I restarted the condor schedd service.

# grep restarted /var/log/condor/ScheddRestartReport
The schedd el6study2.skae.tower-research.com restarted at 03/08/23 05:10:40.

$ condor_config_val MAX_JOBS_SUBMITTED
5

I then submitted a batch of 5 jobs.

$ condor_submit sleep.sub
Submitting job(s).....
5 job(s) submitted to cluster 313.

Trying to submit another batch fails, as I already have 5 jobs in the queue.

$ condor_submit sleep.sub
Submitting job(s)
ERROR: Failed to create cluster
Number of submitted jobs would exceed MAX_JOBS_SUBMITTED

If I wait for the existing jobs to complete, I can submit another 5 jobs without any issue. This makes me believe that the parameter applies to the jobs currently present in the queue, irrespective of their status (held/running/idle). I don't think it is related to the total number of jobs ever submitted to the schedd.
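
The behaviour observed above is consistent with an admission check of roughly the following shape. This is only a toy model in Python, not HTCondor's code: the class and method names are made up for illustration, and whether the real schedd rejects per job or per batch is an assumption that this experiment does not settle.

```python
class ToyQueue:
    """Toy model of a queue with a cap on jobs present at once (not HTCondor code)."""

    def __init__(self, max_jobs_submitted):
        self.max_jobs_submitted = max_jobs_submitted
        self.jobs_in_queue = 0  # idle + running + held, all counted alike

    def submit(self, batch_size):
        # Assumption: the whole batch is refused if it would exceed the cap.
        if self.jobs_in_queue + batch_size > self.max_jobs_submitted:
            return False
        self.jobs_in_queue += batch_size
        return True

    def complete(self, n):
        # Jobs leaving the queue free up room under the cap.
        self.jobs_in_queue = max(0, self.jobs_in_queue - n)


queue = ToyQueue(max_jobs_submitted=5)
first = queue.submit(5)    # accepted: 5 jobs now in the queue
second = queue.submit(5)   # refused: 5 + 5 would exceed the cap of 5
queue.complete(5)          # the existing jobs finish and leave the queue
third = queue.submit(5)    # accepted again
print(first, second, third)
```

A lifetime counter would have refused the third submit as well, which is not what happened here.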


Thanks & Regards,
Vikrant Aggarwal


On Wed, Mar 8, 2023 at 12:48 PM Beyer, Christoph <christoph.beyer@xxxxxxx> wrote:
Hi,

It seems to me that, at least on my schedds, MAX_JOBS_SUBMITTED is indeed the number of jobs the schedd has dealt with since the last boot (I suppose).

At least this is definitely not the current number of jobs on this schedd:

[root@bird-htc-sched11 ~]# condor_config_val MAX_JOBS_SUBMITTED
2147483647

;)

Hence it looks to me as if MAX_JOBS_SUBMITTED should not be set at all, unless you want to stop scheduling after a certain number of jobs?

Maybe MAX_JOBS_PER_OWNER is more likely to do what you want (limiting the number of jobs per owner on the schedd)?

Best
christoph


--
Christoph Beyer
DESY Hamburg
IT-Department

Notkestr. 85
Building 02b, Room 009
22607 Hamburg

phone:+49-(0)40-8998-2317
mail: christoph.beyer@xxxxxxx


From: "Vikrant Aggarwal" <ervikrant06@xxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, 8 March 2023 08:06:16
Subject: Re: [HTCondor-users] MAX_JOBS_SUBMITTED exceeded, submit failed. Current total is 499999. Limit is 50000

Thanks Jaime,
But we don't have that many jobs in the queue. The batch we are trying to submit has only a handful of jobs, yet we are still hitting the max job limit.

03/07/23 21:46:46 (pid:55697) NewCluster(): MAX_JOBS_SUBMITTED exceeded, submit failed. Current total is 300027. Limit is 300000

03/07/23 22:11:09 (pid:55697) NewCluster(): MAX_JOBS_SUBMITTED exceeded, submit failed. Current total is 300000. Limit is 300000
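
To compare the reported total with the limit across many such entries, lines like the two above can be parsed with a short script. This is a minimal sketch: the regular expression assumes exactly the message format quoted here, and the helper name `parse_limit_hit` is made up for illustration.

```python
import re

# Matches the message format of the two log lines quoted above.
PATTERN = re.compile(
    r"MAX_JOBS_SUBMITTED exceeded, submit failed\. "
    r"Current total is (\d+)\. Limit is (\d+)"
)


def parse_limit_hit(line):
    """Return (current_total, limit), or None if the line is not a limit hit."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


lines = [
    "03/07/23 21:46:46 (pid:55697) NewCluster(): MAX_JOBS_SUBMITTED exceeded, "
    "submit failed. Current total is 300027. Limit is 300000",
    "03/07/23 22:11:09 (pid:55697) NewCluster(): MAX_JOBS_SUBMITTED exceeded, "
    "submit failed. Current total is 300000. Limit is 300000",
]
hits = [parse_limit_hit(line) for line in lines]
for current, limit in hits:
    # Third column: how far the reported total is above the limit.
    print(current, limit, current - limit)
```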


It happens at random, but often, on a few submit nodes (not all of them). All submit nodes have the same configuration.

Thanks & Regards,
Vikrant Aggarwal


On Wed, Feb 8, 2023 at 9:24 PM Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
> On Feb 7, 2023, at 11:01 AM, Todd L Miller via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>
>> We hit this issue multiple times: Issue disappears if we restart the condor
>> service or change the MAX_JOBS_SUBMITTED limit.
>
>       You probably shouldn't be setting MAX_JOBS_SUBMITTED at all.  It's a cap on the total number of clusters a schedd is willing to have dealt with for its entire life.  What are you trying to accomplish?


This is incorrect. MAX_JOBS_SUBMITTED is a cap on the number of jobs that can be queued at any given time.

 - Jaime
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
