
Re: [HTCondor-users] How to find out CPU affinity from schedd plugin



Hi Greg,
oops, I completely forgot to mention that we already looked into ASSIGN_CPU_AFFINITY.
My understanding is that, given a single partitionable slot spanning 100% of the CPU's
threads, a job's threads will be pinned to the first N free cores in increasing ID
order:
https://github.com/htcondor/htcondor/blob/8bcd2442756564ebbfdb6955fb16483806fda236/src/condor_startd.V6/ResMgr.cpp#L1777
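For illustration, here's a minimal Python sketch of our reading of that logic
(our assumption from skimming the code, not necessarily what the startd
actually does):

    # Sketch of our understanding of the linked ResMgr.cpp logic:
    # take the first N free cores in increasing ID order.
    def assign_affinity(total_cores, busy_cores, requested_cpus):
        """Return the IDs of the first `requested_cpus` free cores."""
        free = [c for c in range(total_cores) if c not in busy_cores]
        if len(free) < requested_cpus:
            raise RuntimeError("not enough free cores")
        return free[:requested_cpus]

    # Example: 16-core machine, cores 0-3 already taken, job requests 4 CPUs
    print(assign_affinity(16, {0, 1, 2, 3}, 4))  # -> [4, 5, 6, 7]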

We could maybe model this in the schedd, but that sounds fishy (especially when there
are two schedds). Instead, it would be ideal if the schedd received the list of
assigned CPU cores from the startd when a job starts.

Now, regarding why we're considering pinning jobs to cores:
1. The pressing one is monitoring: we want to provide per-job performance counter
monitoring, i.e. as far as possible report performance counter values (e.g. FLOP/s)
only for the job in question, rather than the whole system's FLOP/s, core frequency,
user CPU usage, and so on.
ClusterCockpit just needs to know which CPU threads were assigned to a job in order
to filter this. Other metrics, such as memory bandwidth, can only be measured per CPU
socket, but again the relevant socket(s) are identified via the assigned CPU threads.
I assume that at least user CPU usage can be (and already is by HTCondor) easily
measured per process, but as far as I can tell, the actual performance counter
metrics such as FLOPs or memory bandwidth cannot be filtered by process, only by CPU
thread (see the perf sketch below this list).

2. But there's another reason besides "cleaner job-specific monitoring". Pinning jobs
to CPU cores might lower overall CPU utilization, since jobs can no longer spill onto
excess cores when the node's load allows it, and users will probably have to
overestimate their CPU requests. On the other hand, we currently and regularly have
users who greatly underestimate their job's CPU demands: they request only 5 CPUs,
but when the node is otherwise idle, a multiple of the requested CPUs is actually
used. That's fine in theory, because it means the node is fully utilized once several
jobs are scheduled to it. In practice it also causes confusion: users are unhappy
that their job's performance drops when multiple (of their) jobs land on the same
machine, because jobs using more cores than "requested" make the cgroup limiting kick
in.
Long story short: we find CPU pinning attractive as well, since users get much more
predictable performance independent of the cluster's occupancy.
As our jobs are usually rather GPU-bound, we're also not too concerned with squeezing
the last bit of utilization out of the CPUs.
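To make the monitoring point in item 1 concrete, this is roughly how we would
filter hardware counters by a job's assigned CPU threads, here via perf. The
CPU list and events are placeholders (real FLOP events are hardware-specific),
so treat it as a sketch rather than our final tooling:

    import subprocess

    def sample_counters(cpu_list, seconds=10):
        """Count events only on the CPUs assigned to a job (hypothetical list).

        `perf stat -a -C` restricts counting to the given CPUs; the events
        below are generic placeholders.
        """
        cpus = ",".join(str(c) for c in cpu_list)
        cmd = [
            "perf", "stat",
            "-a", "-C", cpus,         # only the job's CPU threads
            "-e", "instructions,cycles",
            "-x", ",",                # machine-readable CSV output
            "--", "sleep", str(seconds),
        ]
        result = subprocess.run(cmd, capture_output=True, text=True)
        return result.stderr          # perf stat writes its counts to stderr

    # Example: the job was (hypothetically) pinned to CPU threads 4-7
    print(sample_counters([4, 5, 6, 7]))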

So, from our point of view, we're quite okay with pinning jobs to cores (and actually
see benefits in it), at least in theory.
Our current blocker: to make this useful for monitoring, we need to know which CPUs a
job was pinned to.
In the worst case, could we use a startd plugin (or something similar) to access the
affinity mask?
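Failing that, a blunt workaround we could imagine (purely our assumption, not a
recommendation) would be a small wrapper around the job, e.g. hooked in via
USER_JOB_WRAPPER, that reads the mask the starter applied and drops it where
the monitoring agent can pick it up:

    #!/usr/bin/env python3
    # Hypothetical job wrapper: record which CPU threads we were pinned to,
    # then exec the real job command.
    import json
    import os
    import sys

    # On Linux, the affinity mask set by the parent is inherited by us.
    assigned_cpus = sorted(os.sched_getaffinity(0))

    # Drop the mask where monitoring can find it (path is made up).
    with open("/var/run/job-affinity/%d.json" % os.getpid(), "w") as fh:
        json.dump({"pid": os.getpid(), "cpus": assigned_cpus}, fh)

    # Hand over to the actual job; argv[1:] is the original command line.
    os.execvp(sys.argv[1], sys.argv[1:])

But that feels like duplicating information the startd already has, hence the
question above.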

Thanks for any insights!
Best,
- Joachim

P.S.: 
In the future, we'll probably need to consider NUMA-aware pinning, since performance
may suffer considerably if a job gets pinned to CPU cores on different NUMA nodes
(while there would still be enough free cores to place it on a single one). The
penalty might also be quite noticeable if we pin a job to one socket but assign it a
GPU that is attached to the other socket.
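For reference, this is roughly how we'd check that GPU/CPU locality today via the
kernel's sysfs topology files (the PCI address is a placeholder for whatever the
assigned GPU resolves to):

    from pathlib import Path

    def gpu_numa_node(pci_address):
        """NUMA node a GPU is attached to, per sysfs (-1 means unknown)."""
        return int(Path(f"/sys/bus/pci/devices/{pci_address}/numa_node").read_text())

    def numa_node_cpus(node):
        """CPU list belonging to a NUMA node, e.g. '0-15,32-47'."""
        return Path(f"/sys/devices/system/node/node{node}/cpulist").read_text().strip()

    # Placeholder PCI address -- in practice taken from the assigned GPU.
    node = gpu_numa_node("0000:3b:00.0")
    print("GPU sits on NUMA node", node, "with CPUs", numa_node_cpus(node))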


On Monday, 13 February 2023 at 19:24:48 CET, Greg Thain via HTCondor-users wrote:
> On 2/13/23 07:46, Joachim Meyer wrote:
> > Hi,
> > 
> > 
> > Is there a way to assign CPU cores and get the information which cores are
> > assigned from within a condor_schedd plugin? So far it seems to me that
> > the
> > assigned CPU cores are not reported back to the schedd in some classad?
> 
> Hi Joachim:
> 
> There are controls in HTCondor to affinity-lock jobs to cpu cores -- see
> the ASSIGN_CPU_AFFINITY setting in
> https://htcondor.readthedocs.io/en/latest/admin-manual/configuration-macros.
> html?highlight=ASSIGN_CPU_AFFINITY#condor-starter-configuration-file-entries
> 
> But, there is a reason this is not on by default.  While locking jobs to
> specific cores may sometimes improve performance for that job, it can
> also lower the overall throughput of the system.
> 
> What are the specific performance counts per job that you are interested
> in?  I wonder if there is a better way to more directly capture these?
> 
> -greg
> 
> > We're using HTCondor 9.12 right now.
> > 
> > Thanks for any pointers!
> > - Joachim