
[HTCondor-users] Negative kflops benchmarks (-2147483648) in HTCondor 25.0 LTS



Hello,

I would like to report an issue with our institute's HTCondor pool.

Some nodes are not receiving any jobs and report negative benchmark values, with kflops = -2147483648. Restarting the condor service several times on the ten affected nodes resolved the issue on only 20 % of them. As a workaround, I generated artificial load on the nodes while the benchmarks were running, which temporarily resolved the issue. However, the problem reappears whenever the condor service is restarted without artificial load present.
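
A minimal sketch of one way to produce such load, assuming a plain CPU-bound busy loop is sufficient. The function names, the worker count, and the 60-second duration are illustrative choices and not anything HTCondor prescribes:

```python
import multiprocessing as mp
import time


def burn(seconds: float) -> int:
    """Busy-loop on one core for `seconds`; return the iteration count."""
    deadline = time.monotonic() + seconds
    iterations = 0
    while time.monotonic() < deadline:
        iterations += 1
    return iterations


def generate_load(seconds: float, workers: int) -> list:
    """Run `burn` in `workers` processes at the same time."""
    with mp.Pool(workers) as pool:
        return pool.map(burn, [seconds] * workers)


if __name__ == "__main__":
    # Assumption: one worker per logical CPU for 60 seconds, started
    # shortly before the condor service is restarted.
    generate_load(60.0, mp.cpu_count())
```

The script only keeps the cores busy; it has to be started by hand (or from a wrapper) so that it overlaps with the benchmark run.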

The affected workstations use Intel(R) Core(TM) CPUs, including the Ultra 7 265K and i9-12900K models.

Condor is configured to directly use condor_kflops and condor_mips for benchmarking:

benchmarks_joblist = mips kflops
benchmarks_kflops_executable = $(LIBEXEC)/condor_kflops
benchmarks_mips_executable  = $(LIBEXEC)/condor_mips

When running condor_kflops directly as root, it produces a reasonable value and doesn't show negative results:

root@cymothex:~# /usr/libexec/condor/condor_kflops
KFlops = 2608096

We are currently running HTCondor version 25.0.3. Neither upgrading to 25.0.5 nor downgrading to 25.0.2 resolves the issue. However, with version 25.0.2 the kflops value changes to -1 instead.

This behavior reminds me of the issue reported by John Veitch in September:
[HTCondor-users] condor_kflops returning -1
https://www-auth.cs.wisc.edu/lists/htcondor-users/2025-September/msg00063.shtml

Below is some condor output:

root@cymothex:~# condor_status -constraint "kflops < 0" -af:h name kflops mips condorversion
name           kflops      mips  condorversion
slot1@cordycep -1          54597 $CondorVersion: 25.0.2 2025-10-08 BuildID: 840620 PackageID: 25.0.2-1+deb12 GitSHA: 24fb2387 $
slot1@bancroft -2147483648 54153 $CondorVersion: 25.0.3 2025-10-31 BuildID: 847298 PackageID: 25.0.3-1+deb12 GitSHA: dc94bfbb $
slot1@cymothex -2147483648 54565 $CondorVersion: 25.0.5 2025-12-12 BuildID: 856732 PackageID: 25.0.5-1+deb12 GitSHA: 5493979a $
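
Note that -2147483648 is exactly the smallest value of a signed 32-bit integer (-2^31). That may point to an overflow or a failed conversion rather than a real measurement, but this is only a guess. A minimal Python sketch that flags this value in the output above; the helper names are hypothetical and not part of HTCondor, and the sample rows are copied from the table with the version column omitted:

```python
INT32_MIN = -(2 ** 31)  # -2147483648, smallest signed 32-bit integer

# Rows copied from the condor_status output above (version column omitted).
SAMPLE = """\
name           kflops      mips
slot1@cordycep -1          54597
slot1@bancroft -2147483648 54153
slot1@cymothex -2147483648 54565
"""


def classify_kflops(value: int) -> str:
    """Label a kflops value as 'int32-min', 'negative', or 'ok'."""
    if value == INT32_MIN:
        return "int32-min"
    if value < 0:
        return "negative"
    return "ok"


def parse_status(text: str) -> dict:
    """Map slot name to kflops from whitespace-separated rows with a header."""
    result = {}
    for line in text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) >= 2:
            result[fields[0]] = int(fields[1])
    return result


if __name__ == "__main__":
    for name, kflops in parse_status(SAMPLE).items():
        print(name, kflops, classify_kflops(kflops))
```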


I am not sure why this problem is appearing now. We have similar nodes that do not show it. Some of the affected nodes may currently be better cooled because of the low outside temperatures. Do you have any hints or advice regarding this issue with condor_kflops?

Best regards,
Stefan