
Re: [HTCondor-users] CPU Affinity in condor v8.9.1



Stuart,

Thanks for the tips. I'll play around before the O3 break ends.

Sincerely,
Shawn

On 10/20/19 6:27 PM, Stuart Anderson wrote:
> Shawn,
> 	I think cgroups are the right solution for this. Try something
> like the following to limit the total memory and CPU usage by Condor
> and all the processes that it spawns on your CentOS 7 systems:
> 
> /etc/systemd/system/condor.service.d/cgroup.conf
> 
> [Service]
> MemoryAccounting = true
> ExecStartPost    = /bin/bash -c "cgcreate -g *:htcondor; cgset -r memory.limit_in_bytes=186G -r memory.memsw.limit_in_bytes=186G -r cpu.cfs_quota_us=2000000 /htcondor"
> 
> After restarting the condor service you should then see these limits
> in /sys/fs/cgroup/*/htcondor and be able to dynamically change them.
> 
> If you then run a Condor job like "/bin/stress -c 50" you should be
> able to see with systemd-cgtop that the cpu utilization is capped
> at 20 cpu-cores, and similarly run a large memory test job to see
> that the expected limits are in place.
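
The 20-core cap follows from the ratio of cpu.cfs_quota_us to cpu.cfs_period_us. Here is a minimal Python sketch of that arithmetic, assuming the period is left at the kernel default of 100000 microseconds (the thread does not set it). The function name quota_to_cores is made up for illustration.

```python
DEFAULT_CFS_PERIOD_US = 100_000  # kernel default; assumed unchanged here


def quota_to_cores(quota_us, period_us=DEFAULT_CFS_PERIOD_US):
    """Return the CPU-core equivalent of a CFS quota, or None if unlimited."""
    if quota_us < 0:  # cpu.cfs_quota_us = -1 means "no limit"
        return None
    return quota_us / period_us


print(quota_to_cores(2_000_000))  # quota from the cgset line; prints 20.0
```

The same ratio can be used in reverse: to cap Condor at N cores, set the quota to N times the period.
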
> 
> 
> Instead of a static quota on condor cpu usage you could also make
> sure your priority non-condor services are running in a cgroup
> and grant that a much higher value of cpu.shares to make sure they
> are never starved for cpu cycles regardless of what condor jobs want.
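
To see why a larger cpu.shares value protects the non-Condor services: when sibling cgroups all want more CPU than is available, the scheduler divides it in proportion to their cpu.shares. The Python sketch below works through that proportion. The group names and share values are hypothetical (1024 is the cgroup v1 default) and are not taken from this thread.

```python
def contended_cores(shares, total_cores):
    """Split total_cores among sibling cgroups in proportion to cpu.shares.

    Models only the fully contended case, where every group is CPU-bound.
    """
    total_shares = sum(shares.values())
    return {name: total_cores * s / total_shares for name, s in shares.items()}


# Hypothetical values: a priority group given 8x the default weight of 1024.
split = contended_cores({"priority": 8192, "htcondor": 1024}, total_cores=24)
for name, cores in split.items():
    print(f"{name}: {cores:.1f} cores")
```

Unlike a quota, shares only matter under contention: an idle group's CPU time remains available to the others, so Condor jobs could still use the whole machine when the priority services are quiet.
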
> 
> Thanks.
> 
> 
>> On Oct 18, 2019, at 6:48 AM, Shawn A Kwang <kwangs@xxxxxxx> wrote:
>>
>> Greg,
>>
>> Thanks for the response. Here is the issue with "NUM_CPUS".
>>
>> I have attached the partitionable slot configuration that Tom put
>> together for the cluster. I haven't touched it since he moved on. You
>> can see that at the top he put:
>>
>> num_cpus = 2 * $(DETECTED_CPUS)
>>
>> I have no clue why this was done, but I suspect it has to do with
>> the partitionable slot configuration in the rest of this file, which
>> looks to partition the cluster into two partitions: one seems to be
>> dedicated to the 'online_cbc_gstlal_inspiral' analysis and the other
>> is for all other jobs.
>>
>> Thus I don't know whether I should change this setting, which is one
>> reason I looked into cgroups and other CPU affinity settings.
>>
>> Tom also set the RAM in this file, which is why I am investigating
>> cgroups for limiting Condor's memory as well as its CPU usage.
>>
>> Sincerely,
>> Shawn
>>
>> On 10/17/19 3:39 PM, Greg Thain wrote:
>>> On 10/17/19 11:39 AM, Shawn A Kwang wrote:
>>>> In Condor (v8.9.1), how do I assign CPU affinity to jobs on compute
>>>> nodes with 24 cores? Let's say I want to limit Condor to using 20 cores,
>>>> 0-19, for users' jobs. Note that the cluster is using
>>>> partitionable slots.
>>>>
>>>> Bigger picture: I wish to limit Condor's resources because the compute
>>>> nodes run alongside the ceph-osd daemons, for which I want to 'reserve' a
>>>> certain amount of RAM and CPU.
>>>
>>>
>>> Shawn:
>>>
>>> What I would do on this machine is set
>>>
>>>
>>> NUM_CPUS = 20
>>>
>>> in the htcondor config.
>>>
>>> This will tell htcondor that it only has 20 cores to work with (but not
>>> which physical ones), and condor will only dole out 20 cores worth of
>>> work.  With cgroups, if there is contention for all the cores on the
>>> system, the sum of the condor jobs shouldn't exceed 20 cores worth, but
>>> the kernel is free to pick which physical cores to use, leaving the rest
>>> for ceph or other system daemons.
>>>
>>>
>>> -greg
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>>
>> -- 
>> Associate Scientist
>> Center for Gravitation, Cosmology, and Astrophysics
>> University of Wisconsin-Milwaukee
>> office: +1 414 229 4960
>> kwangs@xxxxxxx
>> <50slot.txt>
>>
>>
> 
> --
> Stuart Anderson
> sba@xxxxxxxxxxx
> 
> 
> 
> 
> 


-- 
Associate Scientist
Center for Gravitation, Cosmology, and Astrophysics
University of Wisconsin-Milwaukee
office: +1 414 229 4960
kwangs@xxxxxxx
