
Re: [HTCondor-users] Missing `CpusUsage` line in the history



Hi Seung-Jin,

The CpusUsage *should* be lower than or equal to RequestCpus. However, this is only the case if the job actually uses only the resources it requested!
By default, there is no mechanism to restrict a job to just the resources it requested. This is true for both memory and CPU: if the machine has more resources available, the job can use them.
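To make this concrete, the sketch below reuses the numbers from the 11 Jul message further down and compares observed usage with what was requested. It assumes, as the quoted MemoryUsage expression implies, that ResidentSetSize is reported in KiB and MemoryUsage/RequestMemory in MiB, and that ClassAd integer division truncates like Python's `//`. The function names are hypothetical.

```python
def cpus_usage(sys_cpu: float, user_cpu: float, committed_time: float) -> float:
    """Average number of cores kept busy: CPU seconds divided by wall-clock seconds."""
    return (sys_cpu + user_cpu) / committed_time


def memory_usage_mib(resident_set_size_kib: int) -> int:
    """Mirror of MemoryUsage = ((ResidentSetSize + 1023) / 1024).

    With integer operands the division truncates, so adding 1023 first
    turns it into a ceiling division from KiB to MiB.
    """
    return (resident_set_size_kib + 1023) // 1024


# Values quoted in the 11 Jul 2024 message.
cpu = cpus_usage(180.0, 212.0, 25)
mem = memory_usage_mib(7500000)
request_cpus, request_memory = 2, 2048

print(f"CPU:    {cpu:.2f} cores used vs {request_cpus} requested ({cpu / request_cpus:.1f}x)")
print(f"Memory: {mem} MiB used vs {request_memory} MiB requested ({mem / request_memory:.1f}x)")
# CPU:    15.68 cores used vs 2 requested (7.8x)
# Memory: 7325 MiB used vs 2048 MiB requested (3.6x)
```

Both ratios exceed 1, which is possible only because nothing confines the job to its request.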

HTCondor ships with some optional mechanisms to restrict resource usage. For example, CGROUP_MEMORY_LIMIT_POLICY and related settings are effective for setting memory boundaries.
You can also use the various SYSTEM_PERIODIC_* settings to create your own rules for stopping misbehaving jobs. For example, we put jobs on hold when their CpusUsage exceeds their RequestCpus by more than a small grace factor.
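A minimal sketch of the kind of rule described above, written as a Python predicate over a job ad held in a dict. In a real pool this logic would be a ClassAd expression assigned to SYSTEM_PERIODIC_HOLD in the configuration; the 1.2 grace factor, the default RequestCpus of 1, and the function name are illustrative assumptions, not the values used at Max's site.

```python
from typing import Any, Mapping

RUNNING = 2  # JobStatus value for a running job in HTCondor


def should_hold(job_ad: Mapping[str, Any], grace: float = 1.2) -> bool:
    """Return True if a running job uses noticeably more CPU than it requested.

    CpusUsage may be missing (e.g. for very short jobs); in that case the
    job is left alone rather than held.
    """
    if job_ad.get("JobStatus") != RUNNING:
        return False
    usage = job_ad.get("CpusUsage")
    if usage is None:
        return False
    # Assumed default of 1 CPU when RequestCpus is absent from the ad.
    return usage > job_ad.get("RequestCpus", 1) * grace


print(should_hold({"JobStatus": 2, "RequestCpus": 2, "CpusUsage": 15.68}))   # True
print(should_hold({"JobStatus": 2, "RequestCpus": 16, "CpusUsage": 1.185}))  # False
print(should_hold({"JobStatus": 2, "RequestCpus": 2}))                        # False
```

Treating an undefined CpusUsage as "do not hold" avoids penalizing jobs before the first usage update arrives.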

Cheers,
Max

On 11. Jul 2024, at 00:40, Seung-Jin Sul <ssul@xxxxxxx> wrote:

Hi Jaime,

Could I ask a couple of questions?

1. We are using the equation you suggested to calculate CpusUsage (i.e., CPU utilization).
The task requests 2 CPUs, and we calculate CpusUsage as follows:

RequestCpus = 2
(RemoteSysCpu + RemoteUserCpu) / CommittedTime
(180.0 + 212.0) / 25 = 15.68

How should we interpret these values? Why is (RemoteSysCpu + RemoteUserCpu) larger than `CommittedTime`?
Is CpusUsage supposed to be equal to or lower than `RequestCpus`?

2. We have another question about `MemoryUsage`.
We have the following numbers:

condor_history -l 1129923 | grep ResidentSetSize
MemoryUsage = ((ResidentSetSize + 1023) / 1024)
ResidentSetSize = 7500000
RequestMemory = 2048MB
((7500000 + 1023) / 1024) = 7325.2 MB

The question is how a task can use 7325 MB even though `RequestMemory` is 2048 MB.
Please note that we also have the following policy:

```
Requirements = (isUndefined(TARGET.AliveUntil) ? true : TARGET.AliveUntil > time() + (0 * 60)) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
```

Any comments will be appreciated.

Thank you.

Best regards,




Seung-Jin Sul, Ph.D.
pronouns: he / his
510-495-8456     |    ssul@xxxxxxx
Staff Software Developer, Tech Lead
Advanced Analysis Group
DOE Joint Genome Institute
Lawrence Berkeley National Lab


On Mon, Apr 1, 2024 at 8:27 AM Seung-Jin Sul <ssul@xxxxxxx> wrote:
Thank you so much for the info, Jaime.

On Mon, Apr 1, 2024, 8:17 AM Jaime Frey via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:
CpusUsage is calculated periodically by the condor_startd and communicated to the condor_starter, which includes it in updates to the condor_shadow on the Access Point. I believe it can be undefined for very short jobs.
RemoteSysCpu and RemoteUserCpu are obtained from the OS's rusage data when the job exits, and thus are more reliably reported.

For a completed job that had one execution attempt, the value (RemoteSysCpu + RemoteUserCpu) / CommittedTime should be close to CpusUsage.

Two things to note that can throw off any computations:
* CommittedTime and RemoteWallClockTime include the time to transfer input files before the job starts.
* Some of these attributes (e.g. RemoteWallClockTime) are a total across multiple execution attempts.
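For the original question of what to do when CpusUsage is missing, here is a minimal sketch of a fallback: use CpusUsage when the ad has it, otherwise estimate it from the rusage totals with the formula above. The function name is hypothetical, and because of the two caveats just listed, the estimate is only approximate for jobs with long input transfers or more than one execution attempt.

```python
from typing import Any, Mapping, Optional


def cpu_utilization(job_ad: Mapping[str, Any]) -> Optional[float]:
    """Return CpusUsage if present, else (RemoteSysCpu + RemoteUserCpu) / CommittedTime.

    Per the caveats above, CommittedTime includes input-transfer time and some
    attributes are totals across execution attempts, so the fallback is an
    approximation. Returns None when neither value can be computed.
    """
    if job_ad.get("CpusUsage") is not None:
        return float(job_ad["CpusUsage"])
    committed = job_ad.get("CommittedTime") or 0
    if committed <= 0:
        return None
    cpu_seconds = job_ad.get("RemoteSysCpu", 0.0) + job_ad.get("RemoteUserCpu", 0.0)
    return cpu_seconds / committed


# Attribute values from the 26 Mar 2024 message, with CpusUsage left out.
ad = {
    "CommittedTime": 1071,
    "RemoteSysCpu": 1242.0,
    "RemoteUserCpu": 4793.0,
    "RequestCpus": 16,
}
print(f"{cpu_utilization(ad):.2f}")  # 5.63, computed via the rusage fallback
```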

 - Jaime

> On Mar 26, 2024, at 2:01 PM, Seung-Jin Sul <ssul@xxxxxxx> wrote:
>
> Hi,
> We are parsing the history log file to extract CPU usage data. We think the `CpusUsage` value is very useful, but we found that the `CpusUsage` line is missing for some tasks. Could someone explain why the line is missing in some cases?
>
> And what would be the best way to calculate CPU utilization if we can't get `CpusUsage`? The values we can use are (again, `CpusUsage` may not exist):
> ```
> CommittedSuspensionTime = 0
> CommittedTime = 1071
> (CpusUsage = 1.185198573018748)
> RemoteSysCpu = 1242.0
> RemoteUserCpu = 4793.0
> RemoteWallClockTime = 1071.0
> RequestCpus = 16
> ```
>
> Thank you!
> Seung
>
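For the parsing step itself, a minimal sketch that reads the long (`-l`) history output, assuming one `Attribute = value` pair per line and a blank line between job ads. Numeric values are converted; everything else is kept as raw text, and ClassAd expressions such as MemoryUsage are not evaluated. A missing attribute such as CpusUsage simply does not appear in that ad's dict. The function names are hypothetical, and the sample text reuses attribute values quoted in this thread.

```python
from typing import Dict, List, Union

Value = Union[int, float, str]


def _convert(raw: str) -> Value:
    """Turn numeric literals into int/float; leave strings and expressions as text."""
    for cast in (int, float):
        try:
            return cast(raw)
        except ValueError:
            pass
    return raw


def parse_long_ads(text: str) -> List[Dict[str, Value]]:
    """Split `condor_history -l` style output into one dict per job ad."""
    ads: List[Dict[str, Value]] = []
    current: Dict[str, Value] = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current ad.
            if current:
                ads.append(current)
                current = {}
            continue
        name, sep, raw = line.partition(" = ")
        if sep:
            current[name.strip()] = _convert(raw.strip())
    if current:
        ads.append(current)
    return ads


sample = """CommittedTime = 1071
CpusUsage = 1.185198573018748
RemoteSysCpu = 1242.0
RemoteUserCpu = 4793.0
RequestCpus = 16

CommittedTime = 25
RemoteSysCpu = 180.0
RemoteUserCpu = 212.0
RequestCpus = 2
"""
for ad in parse_long_ads(sample):
    print(ad.get("CpusUsage", "missing"), ad["RequestCpus"])
```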


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
