Dear all,
We are currently running a project that executes parallel MPI jobs in HTCondor's Parallel Universe. The jobs complete successfully, so execution is not the issue. However, we have noticed a significant discrepancy between CumulativeRemoteUserCpu and RemoteUserCpu, and we do not understand why, given that each job has started only once (NumJobStarts = 1). The same discrepancy appears between CumulativeRemoteSysCpu and RemoteSysCpu.
Here is an example:
# condor_history -limit 1 1505513.0 -af JobUniverse NumJobStarts RemoteUserCpu CumulativeRemoteUserCpu RemoteSysCpu CumulativeRemoteSysCpu RemoteWallclockTime MaxHosts RequestCpus
11 1 7596234.0 89264.0 79249.0 6601.0 26827.0 2 288
The CumulativeRemoteUserCpu (89264.0) is significantly lower than the RemoteUserCpu (7596234.0). Furthermore, the RemoteUserCpu seems to account only for the CPU time consumed by one of the two nodes: 7596234.0/288 ~ 26376 seconds per core, which is close to the RemoteWallclockTime of 26827.0. Is this a known issue when using JobUniverse 11?
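For reference, the arithmetic behind that estimate can be written out as below. This is only a rough sketch: the numbers are copied from the condor_history output above, the variable names are just illustrative, and it assumes that RequestCpus is a per-node value (so the job holds 2 x 288 cores in total), which is our reading and may be wrong.

```python
# Values copied from the condor_history output above.
remote_user_cpu = 7596234.0
cumulative_remote_user_cpu = 89264.0
remote_wallclock_time = 26827.0
max_hosts = 2
request_cpus = 288  # ASSUMPTION: cores requested per node, not in total

# User CPU seconds per core if only one node's cores are counted.
per_core_seconds = remote_user_cpu / request_cpus

# Upper bounds on CPU time: every core busy for the whole wall-clock time.
bound_one_node = remote_wallclock_time * request_cpus
bound_all_nodes = bound_one_node * max_hosts

print(f"per-core user CPU:        {per_core_seconds:.0f} s")
print(f"fraction of one node:     {remote_user_cpu / bound_one_node:.2f}")
print(f"fraction of both nodes:   {remote_user_cpu / bound_all_nodes:.2f}")
print(f"RemoteUserCpu / Cumulative: {remote_user_cpu / cumulative_remote_user_cpu:.1f}")
```

Under that assumption, RemoteUserCpu corresponds to about 98% of what one fully loaded node could consume, but only about 49% of what both nodes could consume, which is why we suspect that only one node is being accounted for.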
Thank you in advance for your help!
Cheers,
Carles
-- Carles Acosta i Silva
PIC (Port d'Informació Científica)
Campus UAB, Edifici D
E-08193 Bellaterra, Barcelona
Tel: +34 93 581 33 08
Fax: +34 93 581 41 10