Stuart,

Thanks for the tips. I'll play around before the O3 break ends.

Sincerely,
Shawn

On 10/20/19 6:27 PM, Stuart Anderson wrote:
> Shawn,
>     I think cgroup is the right solution for this. Try something
> like the following to limit the total memory and cpu usage by condor
> and all the processes that it spawns on your CentOS 7 systems:
>
> /etc/systemd/system/condor.service.d/cgroup.conf
>
> [Service]
> MemoryAccounting = true
> ExecStartPost = /bin/bash -c "cgcreate -g *:htcondor; cgset -r memory.limit_in_bytes=186G -r memory.memsw.limit_in_bytes=186G -r cpu.cfs_quota_us=2000000 /htcondor"
>
> After restarting the condor service you should then see these limits
> in /sys/fs/cgroup/*/htcondor and be able to change them dynamically.
>
> If you then run a Condor job like "/bin/stress -c 50" you should be
> able to see with systemd-cgtop that the cpu utilization is capped
> at 20 cpu-cores; similarly, run a large-memory test job to confirm
> that the expected limits are in place.
>
> Instead of a static quota on condor cpu usage, you could also make
> sure your priority non-condor services are running in a cgroup
> and grant them a much higher value of cpu.shares, so that they
> are never starved for cpu cycles regardless of what condor jobs want.
>
> Thanks.
>
>> On Oct 18, 2019, at 6:48 AM, Shawn A Kwang <kwangs@xxxxxxx> wrote:
>>
>> Greg,
>>
>> Thanks for the response. Here is the issue with "NUM_CPUS".
>>
>> I have attached the partitionable slot configuration that Tom put
>> together for the cluster. I haven't touched this since he moved on. You
>> can see at the top he put:
>>
>> num_cpus = 2 * $(DETECTED_CPUS)
>>
>> I have no clue as to why this was done, but I suspect it has to do with
>> the partitionable slot configurations in the rest of this file, which
>> appears to split the cluster into two partitions: one seems to be
>> dedicated to the 'online_cbc_gstlal_inspiral' analysis and the other to
>> all other jobs.
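[Editor's note: a minimal sketch of the arithmetic behind the cfs quota in Stuart's drop-in. The 100000 µs period is the kernel's default cpu.cfs_period_us (an assumption; the thread does not state it), so a quota of 2000000 µs per period caps the group at 20 cores' worth of cpu time:]

```shell
#!/bin/sh
# cpu.cfs_quota_us is the cpu time (microseconds) a cgroup may consume per
# cpu.cfs_period_us scheduling window. With the default 100000 us period,
# quota = cores * period yields the value passed to cgset above.
CORES=20
PERIOD_US=100000
QUOTA_US=$((CORES * PERIOD_US))
echo "$QUOTA_US"   # prints 2000000, matching cpu.cfs_quota_us=2000000
```

Setting the quota larger than period × cores would have no effect beyond the physical core count; setting it smaller throttles condor even on an idle machine.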
>>
>> Thus I don't know whether I should be changing this setting, which is
>> one reason I looked into cgroups and other cpu-affinity settings.
>>
>> Tom also set the RAM in this file, which is why I am investigating
>> cgroups for memory-limiting condor as well as cpu-limiting it.
>>
>> Sincerely,
>> Shawn
>>
>> On 10/17/19 3:39 PM, Greg Thain wrote:
>>> On 10/17/19 11:39 AM, Shawn A Kwang wrote:
>>>> In Condor (v8.9.1), how do I assign CPU affinity to jobs on compute
>>>> nodes with 24 cores? Let's say I want to limit condor to using 20
>>>> cores, 0-19, for users' jobs. It should be noted that the cluster is
>>>> using partitionable slots.
>>>>
>>>> Bigger picture: I wish to limit condor's resources because the
>>>> compute nodes run alongside the ceph-osd daemons, for which I want
>>>> to 'reserve' a certain amount of RAM and CPU.
>>>
>>> Shawn:
>>>
>>> What I would do on this machine is set
>>>
>>> NUM_CPUS = 20
>>>
>>> in the htcondor config.
>>>
>>> This will tell htcondor that it only has 20 cores to work with (but
>>> not which physical ones), and condor will only dole out 20 cores'
>>> worth of work. With cgroups, if there is contention for all the cores
>>> on the system, the sum of the condor jobs shouldn't exceed 20 cores'
>>> worth, but the kernel is free to pick which physical cores to use,
>>> leaving the rest for ceph or other system daemons.
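[Editor's note: Greg's suggestion as a hedged config sketch. The RESERVED_MEMORY line and its 8192 MB value are illustrative assumptions for holding RAM back for ceph-osd; only NUM_CPUS = 20 comes from the thread:]

```
# condor_config.local sketch -- values are illustrative
NUM_CPUS = 20            # advertise only 20 of the 24 physical cores
RESERVED_MEMORY = 8192   # MB subtracted from detected RAM before advertising
```

This limits how much work condor hands out, while the cgroup approach above enforces a hard ceiling on what the jobs actually consume; the two are complementary.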
>>>
>>> -greg
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
>>
>> --
>> Associate Scientist
>> Center for Gravitation, Cosmology, and Astrophysics
>> University of Wisconsin-Milwaukee
>> office: +1 414 229 4960
>> kwangs@xxxxxxx
>> <50slot.txt>
>
> --
> Stuart Anderson
> sba@xxxxxxxxxxx

--
Associate Scientist
Center for Gravitation, Cosmology, and Astrophysics
University of Wisconsin-Milwaukee
office: +1 414 229 4960
kwangs@xxxxxxx
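[Editor's note: Stuart's cpu.shares alternative could look like the systemd drop-in below. The unit name (ceph-osd@.service) and the share value are assumptions, not from the thread; cpu.shares defaults to 1024, so a larger value simply wins proportionally under contention while leaving idle cycles available to condor:]

```
# /etc/systemd/system/ceph-osd@.service.d/cpu.conf (sketch)
[Service]
CPUAccounting=true
CPUShares=8192
```

Unlike the static cfs quota, shares impose no cap when the machine is idle; they only bias the scheduler when condor jobs and ceph-osd compete for the same cores.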