[HTCondor-users] Condor Messing with Performance Counters
- Date: Tue, 22 Nov 2022 10:09:25 +0100
- From: Joachim Meyer <jmeyer@xxxxxxxxxxxxxxxxxx>
- Subject: [HTCondor-users] Condor Messing with Performance Counters
Good day,
we are planning to use LIKWID with its perfctr tool to monitor job performance
on our HTCondor 9.10 GPU cluster (Docker universe only).
Our execution nodes are on Ubuntu 20.04, i.e. Linux 5.4 LTS.
When a job runs via HTCondor on a machine, however, the performance counters
report implausible values.
E.g. when running the following command (measuring double precision flops
for 10 seconds), I get tens of teraflops reported on some cores, which they
obviously are not capable of:
> likwid-perfctr -f -g FLOPS_DP -S 10s
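For a sense of scale, here is a rough sketch of the theoretical double precision peak of a single core. The clock speed and FMA configuration are assumptions chosen for illustration (a hypothetical 3 GHz core with two 256-bit FMA units), not figures taken from our nodes:

```shell
#!/bin/sh
# Rough upper bound for double precision FLOP/s of one core.
# All values below are illustrative assumptions, not measurements.
clock_mhz=3000        # assumed sustained core clock in MHz
fma_units=2           # assumed number of FMA units per core
doubles_per_vector=4  # a 256-bit vector register holds 4 doubles
flops_per_fma=2       # a fused multiply-add counts as 2 flops

peak_mflops=$(( clock_mhz * fma_units * doubles_per_vector * flops_per_fma ))
echo "theoretical peak per core: ${peak_mflops} MFLOP/s"
```

Under these assumptions a core tops out at 48 GFLOP/s, so a reading of tens of TFLOP/s per core is off by roughly three orders of magnitude.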
The effect sometimes persists even when no jobs are running on the machine
anymore, i.e. an idle machine reports teraflops. This lasts until HTCondor is
stopped:
> systemctl stop condor
Profiling an actually flop intensive application does return sane results when
condor is stopped. E.g.:
> likwid-perfctr -m -f -g FLOPS_DP likwid-bench -t peakflops_sse -w S0:12800kB -w S1:12800kB
After restarting HTCondor, I have only observed the faulty behaviour again once
I ran another HTCondor job on the machine.
I would assume this might be due to the use of performance counters for
instruction counting in HTCondor interfering with other performance counting
applications.
Is there any way to simply disable HTCondor's "CPUInstructions" counting?
Can you think of any other ways in which running a job via HTCondor
might affect performance counting applications?
Just to provide the additional information: I have not been able to reproduce
this by just running a docker container on the same machine while measuring.
Also, the job does not need to run any special application: executing
sleep in a plain ubuntu:latest docker container has the same effect.
Thanks in advance for any suggestions on how to deal with this!
Best,
- Joachim Meyer