If that is the submit file, then yes.
From: Mihai Ciubancan <ciubancan@xxxxxxxx>
Sent: Monday, June 2, 2025 5:14 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: John M Knoeller <johnkn@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] problems with jobs requiring more then 2GB memory

Hello,

Thank you, TJ, for your answer!

Given that the jobs are submitted through submit-condor-job (ARC-CE), this is the file that should be modified to allow jobs that need more than 2 GB of memory, right?

Best,
Mihai

On 2025-05-30 20:04, John M Knoeller via HTCondor-users wrote:
> The job is running out of memory because it is only requesting 2 GB of
> RAM but then using more than that.
>
> SLOT_TYPE_1_PARTITIONABLE=TRUE
>
> means that a slot with the number of CPUs and the amount of memory
> requested by the job will be created when the AP decides to run that
> job, up to a maximum of 8 CPUs and 4 GB, because
>
> SLOT_TYPE_1=cpus=8, memory=4096
>
> To fix this, you need to change the request_memory of the job's
> submit file to request more memory.
>
> -tj
>
> -------------------------
>
> From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf
> of Mihai Ciubancan <ciubancan@xxxxxxxx>
> Sent: Friday, May 30, 2025 2:28 AM
> To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] problems with jobs requiring more then 2GB
> memory
>
> Hello,
>
> I am encountering issues with LHCb jobs, which require more than 2 GB
> per job. The jobs are failing with the following error:
>
> LastHoldReason = "Error from reserved-LHCb2_5@xxxxxxxxxxxxxx: Job has
> gone over cgroup memory limit of 2048 megabytes. Last measured usage:
> 2033 megabytes. Consider resubmitting with a higher request_memory."
>
> I have configured partitionable slots:
>
> CLAIM_WORKLIFE=3600
> CONTINUE=TRUE
> JOB_RENICE_INCREMENT=10
> KILL=FALSE
> NUM_SLOTS=4
> NUM_SLOTS_TYPE_1=4
> SLOT_TYPE_1_PARTITIONABLE=TRUE
> SLOT_TYPE_1=cpus=8, memory=4096
> SLOT_TYPE_1_START=Owner=="pillhcb01"
> SLOT_TYPE_1_NAME_PREFIX=reserved-LHCb
> PREEMPT=FALSE
> RANK=0
> SUSPEND=FALSE
> SLOT_TYPE_1_CONSUMPTION_POLICY=False
> CONSUMPTION_POLICY=False
> CLAIM_PARTITIONABLE_LEFTOVERS=False
>
> A cgroup policy is also enabled:
>
> BASE_CGROUP = /system.slice/condor.service
> CGROUP_MEMORY_LIMIT_POLICY = soft
> MAXJOBRETIREMENTTIME = $(HOUR) * 24 * 7
> SYSTEM_PERIODIC_REMOVE = ResidentSetSize > 3000*RequestMemory
>
> Any suggestions would be highly appreciated!
>
> Best,
> Mihai
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx
> with a subject: Unsubscribe
>
> Join us in June at Throughput Computing 25:
> https://urldefense.com/v3/__https://osg-htc.org/htc25__;!!Mak6IKo!K29kgDu3KqY-v0JvPE9cVXxO9hKbX4vVgC2pMuc85_5TCTwv4huZH_KU-ElZEvUc6BvAtLM_1S1Sk8MicXaY$
>
> The archives can be found at:
> https://www-auth.cs.wisc.edu/lists/htcondor-users/
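
For readers following the thread, the fix TJ describes (raising request_memory in the job's submit file) might look roughly like the fragment below. This is only a sketch: the executable name and the 3072 MB value are hypothetical placeholders, not taken from the thread, and how ARC-CE's submit-condor-job script fills in this value is not shown here.

```
# Hypothetical submit description; the executable name is a placeholder.
executable     = run_lhcb_job.sh
request_cpus   = 1

# Request more than the 2048 MB limit the job was held at.
# The value is in MB and has to fit within the 4096 MB of the
# partitionable slot (SLOT_TYPE_1=cpus=8, memory=4096).
request_memory = 3072

queue
```

With the slot configuration quoted above, a request larger than 4096 MB would presumably never match these slots, so the slot definition would also need more memory if jobs require it.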