Hi Seung-Jin,

just a quick idea: how full are your execution points? If you have unclaimed cycles on a worker, the job may be picking up that free CPU time when it runs more threads than its requested core count.
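One way to check this is to use the job's own accounting. Compare the ratio from Jaime's formula further down the thread with RequestCpus. A ratio well above RequestCpus means the job kept more cores busy than it asked for. Below is a minimal Python sketch. It assumes that `condor_history -l <job id>` prints one job ad as `Name = value` lines, as shown in the thread. `parse_job_ad`, `cores_used` and `summarize` are hypothetical helper names, not part of any HTCondor API. Only plain numeric values are kept, so expressions such as `MemoryUsage = ((ResidentSetSize + 1023) / 1024)` are skipped.

```python
# Sketch: estimate cores used per job from `condor_history -l` text.
# Assumption: one job ad, printed as `Name = value` lines.

def parse_job_ad(text):
    """Parse `Name = value` lines, keeping only values that are plain numbers."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition("=")
        if not sep:
            continue
        try:
            ad[name.strip()] = float(value.strip())
        except ValueError:
            continue  # strings, booleans and expressions are skipped
    return ad


def cores_used(ad):
    """(RemoteSysCpu + RemoteUserCpu) / CommittedTime, or None if unavailable."""
    try:
        cpu_seconds = ad["RemoteSysCpu"] + ad["RemoteUserCpu"]
        wall_seconds = ad["CommittedTime"]
    except KeyError:
        return None
    if wall_seconds <= 0:
        return None
    return cpu_seconds / wall_seconds


def summarize(ad):
    """Compare the estimate with RequestCpus; CpusUsage may be missing."""
    used = cores_used(ad)
    requested = ad.get("RequestCpus")
    ratio = used / requested if used is not None and requested else None
    return {"CpusUsage": ad.get("CpusUsage"), "cores_used": used, "ratio_to_request": ratio}


if __name__ == "__main__":
    history = """\
RemoteSysCpu = 180.0
RemoteUserCpu = 212.0
CommittedTime = 25
RequestCpus = 2
"""
    print(summarize(parse_job_ad(history)))
```

With the numbers from the July question, `cores_used` is 15.68 and the ratio to the 2 requested CPUs is about 7.8. Jaime's caveats still apply. CommittedTime includes input transfer time, and some attributes are summed over multiple execution attempts, so the ratio is only an estimate.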
For the RSS, I would check whether the job has a lot of shared pages (or maybe cached pages). If your cluster does not enforce strict memory limits, then a job may be allowed to go beyond its requested memory (provided that there is sufficient unclaimed memory available), but it runs the risk of being killed when memory gets tight.
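The arithmetic in the memory question can be checked the same way. This is a minimal sketch. It assumes ResidentSetSize is reported in KiB and RequestMemory in MiB, as the `MemoryUsage = ((ResidentSetSize + 1023) / 1024)` expression in the thread suggests. `memory_usage_mib` and `memory_overage` are hypothetical helpers. The sketch uses integer division on the assumption that ClassAd integer arithmetic truncates, so the result is a whole number of MiB rather than 7325.2. The `TARGET.Memory >= RequestMemory` clause in the quoted Requirements only affects which slot the job matches. Whether the job is stopped when it later grows past RequestMemory depends on the execution point's memory-limit policy.

```python
# Sketch: reproduce the MemoryUsage arithmetic from the thread.
# Assumption: ResidentSetSize is in KiB, RequestMemory in MiB.

def memory_usage_mib(resident_set_size_kib: int) -> int:
    """(ResidentSetSize + 1023) / 1024 with integer division: KiB rounded up to MiB."""
    return (resident_set_size_kib + 1023) // 1024


def memory_overage(resident_set_size_kib: int, request_memory_mib: int):
    """Return (used MiB, requested MiB, MiB used beyond the request)."""
    used = memory_usage_mib(resident_set_size_kib)
    return used, request_memory_mib, used - request_memory_mib


if __name__ == "__main__":
    used, requested, over = memory_overage(7500000, 2048)
    print(f"MemoryUsage = {used} MiB, RequestMemory = {requested} MiB, over by {over} MiB")
```

For ResidentSetSize = 7500000 this gives 7325 MiB against a 2048 MiB request, i.e. 5277 MiB beyond what the job asked for.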
Cheers,
Thomas

On 11/07/2024 00.40, Seung-Jin Sul wrote:
Hi Jaime,

May I ask a couple of questions?

1. We are using the equation you suggested to calculate CpusUsage (i.e., CPU utilization). The task requests 2 CPUs, and we calculate CpusUsage as follows:

    RequestCpus = 2
    (RemoteSysCpu + RemoteUserCpu) / CommittedTime = (180.0 + 212.0) / 25 = 15.68

How should we interpret these values? How can (RemoteSysCpu + RemoteUserCpu) be larger than `CommittedTime`? Isn't CpusUsage supposed to be equal to or lower than `RequestCpus`?

2. We have another question about `MemoryUsage`. We have the following numbers:

    condor_history -l 1129923 | grep ResidentSetSize
    MemoryUsage = ((ResidentSetSize + 1023) / 1024)
    ResidentSetSize = 7500000
    RequestMemory = 2048MB
    ((7500000 + 1023) / 1024) = 7325.2 MB

The question is: how can a task use 7325 MB when its `RequestMemory` is 2048 MB? Please note that we also have a policy like:

```
Requirements = (isUndefined(TARGET.AliveUntil) ? true : TARGET.AliveUntil > time() + (0 * 60)) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
```

Any comments would be appreciated. Thank you.

Best regards,
Seung-Jin Sul, Ph.D.
pronouns: he / his
510-495-8456 | ssul@xxxxxxx
Staff Software Developer, Tech Lead
Advanced Analysis Group
DOE Joint Genome Institute
Lawrence Berkeley National Lab

On Mon, Apr 1, 2024 at 8:27 AM Seung-Jin Sul <ssul@xxxxxxx> wrote:
Thank you so much for the info, Jaime.

On Mon, Apr 1, 2024, 8:17 AM Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
CpusUsage is calculated periodically by the condor_startd and communicated to the condor_starter, which includes it in updates to the condor_shadow on the Access Point. I believe it can be undefined for very short jobs.
RemoteSysCpu and RemoteUserCpu are obtained from the OS's rusage data when the job exits, and thus are more reliably reported. For a completed job that had one execution attempt, the value (RemoteSysCpu + RemoteUserCpu) / CommittedTime should be close to CpusUsage.

Two things to note that can throw off any computations:
* CommittedTime and RemoteWallClockTime include the time to transfer input files before the job starts.
* Some of these attributes (e.g. RemoteWallClockTime) are a total across multiple execution attempts.

- Jaime

> On Mar 26, 2024, at 2:01 PM, Seung-Jin Sul <ssul@xxxxxxx> wrote:
>
> Hi,
> We are parsing the history log file to extract CPU usage data. We think the `CpusUsage` value is very useful, but we found that the `CpusUsage` line is missing for some tasks. Could someone explain why the line is missing in some cases?
>
> And what would be the best way to calculate CPU utilization if we can't get `CpusUsage`? The values we can use are (again, CpusUsage may not exist):
> ```
> CommittedSuspensionTime = 0
> CommittedTime = 1071
> (CpusUsage = 1.185198573018748)
> RemoteSysCpu = 1242.0
> RemoteUserCpu = 4793.0
> RemoteWallClockTime = 1071.0
> RequestCpus = 16
> ```
>
> Thank you!
> Seung

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/