Hi Seung-Jin,
The CpusUsage *should* be lower than or equal to RequestCpus. However, this holds only if the job actually uses just the resources it requested! By default, there is no mechanism to restrict a job to the resources it requested. This is true for both memory and CPU: if the machine has more resources available, the job can use them.
HTCondor ships with some optional mechanisms to restrict resource usage. For example, CGROUP_MEMORY_LIMIT_POLICY and related settings are effective for setting memory boundaries. You can also use the various SYSTEM_PERIODIC_* settings to create your own rules for stopping misbehaving jobs. For example, we put on hold jobs whose CpusUsage exceeds their RequestCpus by a small grace factor.
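To illustrate the grace-factor idea, here is a minimal Python sketch of the comparison such a rule performs. The function name and the 1.25 grace factor are assumptions made up for this example; they are not the actual expression or threshold from our SYSTEM_PERIODIC_* configuration.

```python
from typing import Optional


def exceeds_cpu_request(cpus_usage: Optional[float],
                        request_cpus: int,
                        grace_factor: float = 1.25) -> bool:
    """Return True when CpusUsage is above RequestCpus times a grace factor.

    grace_factor is a hypothetical value chosen for illustration.
    A missing CpusUsage (None) is treated as "do not flag".
    """
    if cpus_usage is None:
        return False
    return cpus_usage > request_cpus * grace_factor


# Numbers taken from this thread:
print(exceeds_cpu_request(15.68, 2))               # job using far more than 2 CPUs
print(exceeds_cpu_request(1.185198573018748, 16))  # job well below its request
```

Treating an undefined CpusUsage as "not misbehaving" is a design choice in this sketch; a real policy would have to decide explicitly how to handle jobs without the attribute.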
Cheers,
Max
On 11. Jul 2024, at 00:40, Seung-Jin Sul <ssul@xxxxxxx> wrote:
Hi Jaime,
Can I ask questions?
1. We are using the equation you suggested to calculate the CpusUsage (=cpu utilization).
The task requests 2 cpus and we can calculate the CpusUsage like
RequestCpus = 2
(RemoteSysCpu + RemoteUserCpu) / CommittedTime = (180.0 + 212.0) / 25 = 15.68
How can we interpret the values? How come (RemoteSysCpu + RemoteUserCpu) is bigger than `CommittedTime`?
Is CpusUsage supposed to be equal to or lower than `RequestCpus`?
2. We have another question on `MemoryUsage`. We have the below numbers
condor_history -l 1129923 | grep ResidentSetSize
MemoryUsage = ((ResidentSetSize + 1023) / 1024)
ResidentSetSize = 7500000
RequestMemory = 2048MB
((7500000 + 1023) / 1024) = 7325.2 MB
The question is how a task can use 7325 MB even though `RequestMemory` is 2048 MB. Please note that we also have a policy like
```
Requirements = (isUndefined(TARGET.AliveUntil) ? true : TARGET.AliveUntil > time() + (0 * 60)) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
```
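For reference, the MemoryUsage arithmetic from question 2 can be reproduced with a short Python sketch. It assumes ResidentSetSize is reported in KiB and that the division is an integer division (so the result is 7325 rather than 7325.2); the function name is made up for this example.

```python
def memory_usage_mb(resident_set_size_kib: int) -> int:
    """Round ResidentSetSize (assumed KiB) up to whole MB: (RSS + 1023) / 1024."""
    return (resident_set_size_kib + 1023) // 1024


resident_set_size = 7500000   # value from condor_history above
request_memory_mb = 2048      # RequestMemory from the job

usage_mb = memory_usage_mb(resident_set_size)
print(usage_mb)                      # 7325
print(usage_mb > request_memory_mb)  # True: usage is above the request
```

Note that the Requirements expression only compares TARGET.Memory with RequestMemory at match time; the sketch just makes the size of the overshoot explicit.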
Any comments will be appreciated.
Thank you.
Best regards,
Seung-Jin Sul, Ph.D.
pronouns: he / his
Staff Software Developer, Tech Lead
Advanced Analysis Group
DOE Joint Genome Institute
Lawrence Berkeley National Lab
On Mon, Apr 1, 2024 at 8:27 AM Seung-Jin Sul <ssul@xxxxxxx> wrote:
Thank you so much for the info, Jaime.
CpusUsage is calculated periodically by the condor_startd and communicated to the condor_starter, which includes it in updates to the condor_shadow on the Access Point. I believe it can be undefined for very short jobs.
RemoteSysCpu and RemoteUserCpu are obtained from the OS's rusage data when the job exits, and thus are more reliably reported.
For a completed job that had one execution attempt, the value (RemoteSysCpu + RemoteUserCpu) / CommittedTime should be close to CpusUsage.
Two things to note that can throw off any computations:
* CommittedTime and RemoteWallClockTime include the time to transfer input files before the job starts.
* Some of these attributes (e.g. RemoteWallClockTime) are a total across multiple execution attempts.
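As a rough illustration of the fallback computation described above, here is a minimal Python sketch. The function name and the optional `input_transfer_seconds` parameter are assumptions for this example (the latter is a hypothetical way to subtract input transfer time if you can measure it separately); the sketch also assumes a single execution attempt.

```python
from typing import Optional


def cpu_utilization(remote_sys_cpu: float,
                    remote_user_cpu: float,
                    committed_time: float,
                    input_transfer_seconds: float = 0.0) -> Optional[float]:
    """Approximate CpusUsage as (RemoteSysCpu + RemoteUserCpu) / CommittedTime.

    input_transfer_seconds is a hypothetical correction for the time spent
    transferring input files, which CommittedTime includes.
    Returns None when no usable run time remains (e.g. very short jobs).
    """
    run_seconds = committed_time - input_transfer_seconds
    if run_seconds <= 0:
        return None
    return (remote_sys_cpu + remote_user_cpu) / run_seconds


# Values from the original question (RequestCpus = 16):
estimate = cpu_utilization(remote_sys_cpu=1242.0,
                           remote_user_cpu=4793.0,
                           committed_time=1071)
print(round(estimate, 2))  # 5.63
```

A result above 1.0 simply means that more than one core was busy on average, since the CPU seconds are summed over all cores while CommittedTime is wall-clock time.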
- Jaime
> On Mar 26, 2024, at 2:01 PM, Seung-Jin Sul <ssul@xxxxxxx> wrote:
>
> Hi,
> We are parsing the history log file to extract the CPU usage data. We think the `CpusUsage` value is very useful, but we found that the `CpusUsage` line is missing for some tasks. Could someone explain why the line is missing in some cases?
>
> And what would be the best way to calculate the CPU utilization if we can't get `CpusUsage`? The values we can use are (again, `CpusUsage` may not exist):
> ```
> CommittedSuspensionTime = 0
> CommittedTime = 1071
> (CpusUsage = 1.185198573018748)
> RemoteSysCpu = 1242.0
> RemoteUserCpu = 4793.0
> RemoteWallClockTime = 1071.0
> RequestCpus = 16
> ```
>
> Thank you!
> Seung
>
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/