
Re: [HTCondor-users] Jobs staying idle, need to set the default memory in HTCondor-CE



Thanks Jaime,

Is maxMemory something I should always set in the Glideins, together with or instead of RequestMemory?

The desired behavior, especially for whole-node glideins, would be to receive more memory if available, but to run as long as the available memory is >= what we currently put in RequestMemory.
Not bumping to 2GB would allow running on small VMs/nodes and using the scraps of nodes with little leftover memory, unless a VO asks otherwise and bumps the minimum.
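For example, is something like the following in the glidein submit file what you have in mind? (Just a sketch of my understanding; the exact way maxMemory is picked up by the route may be off.)

  request_memory = 100
  # pass the same small value to the CE so its routing rules do not bump it to 2000 MB
  +maxMemory = 100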

I'd like to understand this better, to see whether I should also change the Glideins' default configuration in general.


My test setup is an htcondor-ce with a condor cluster where all jobs should run; it uses condor.aarch64 23.5.2 and htcondor-ce, htcondor-ce-client, htcondor-ce-condor 23.0.8.
I kept the changes to a minimum (a few attributes in a file in config.d, the rest is all the default).
Does "switch to the new routing syntax" mean that I can drop memory rules in  /etc/condor-ce/config.d/? Or use "switch to the new routing syntaxâ means I have to remove the default and write new routing rules? 
Do you have instructions or an example?
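For instance, would a drop-in along these lines go in the right direction? (Pure guess on my part; I don't know the correct knob names for the new syntax.)

  # /etc/condor-ce/config.d/99-memory.conf  (guessed content)
  JOB_ROUTER_ROUTE_Condor_Pool @=jrt
    UNIVERSE VANILLA
    # lower the memory assigned when the incoming job does not set maxMemory
    DEFAULT default_maxMemory 500
  @jrt
  JOB_ROUTER_ROUTE_NAMES = Condor_Pool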

Thank you,
Marco


> On Apr 22, 2024, at 2:48 PM, Jaime Frey <jfrey@xxxxxxxxxxx> wrote:
> 
> [EXTERNAL] - This message is from an external sender
> 
> You can set maxMemory=RequestMemory in your glidein jobs. That takes precedence in the CE's routing rules for memory.
> 
> Another option is the switch to the new routing syntax (which you'll have to do before 24.0 later this year). Then, you can alter the job memory rules by dropping a new config file in /etc/condor-ce/config.d/.
> 
> - Jaime
> 
>> On Apr 20, 2024, at 7:35 PM, Marco Mambelli via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
>> 
>> Greetings,
>> I'm using a small one-node cluster for testing purposes.
>> I'm unable to run Glidein jobs successfully when the node has only 2GB of memory or less.
>> 
>> The submitted jobs (Glideins) request a minimal amount of memory:
>> ImageSize = 100
>> ImageSize_RAW = 100
>> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,(ImageSize + 1023) / 1024)
>> 
>> But the router defaults in /usr/share/condor-ce/condor_ce_router_defaults kick in and specifically the expression
>>   eval_set_OriginalMemory = ifThenElse(maxMemory isnt undefined,
>>                                        maxMemory,
>>                                        ifThenElse(default_maxMemory isnt undefined,
>>                                                   default_maxMemory,
>>                                                   2000));
>> In JOB_ROUTER_DEFAULTS_GENERATED @=jrd
>> 
>> This sets the requested memory to 2000 and the job cannot start on the local condor because of the Machine requirements:
>> Requirements = START && (WithinResourceLimits)
>> WithinResourceLimits = (MY.Cpus > 0 && TARGET.RequestCpus <= MY.Cpus && MY.Memory > 0 && TARGET.RequestMemory <= MY.Memory && MY.Disk > 0 && TARGET.RequestDisk <= MY.Disk && (TARGET.RequestGPUs =?= undefined || MY.GPUs >= TARGET.RequestGPUs))
>> 
>> Is there a way I could change the default_maxMemory by adding the value in a config file?
>> 
>> set_default_maxMemory works within a routing policy, but I'd like to avoid writing custom routing policies if possible.
>> And I'd prefer not to edit /usr/share/condor-ce/condor_ce_router_defaults
>> 
>> Thank you,
>> Marco
>