
Re: [HTCondor-users] Periodic Hold for jobs exceeding memory and CPU requests



Hi David,

I personally do not worry too much about core requests versus actual usage. With cgroups in use, a job that tries to use more CPU than it requested should be confined to its assigned weighted share of CPU time, at least when the host is fully loaded. E.g., a job requesting 2 cores on a 48-core host should get ~4% of the overall CPU time, even when it tries to sneak in a `make -j 48`.
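
To make the ~4% figure concrete, here is a minimal sketch of the arithmetic. It assumes the cgroup CPU weights are proportional to the requested cores and that every core on the host is claimed and busy; the function name `cpu_share` is made up for illustration and is not part of HTCondor.

```python
def cpu_share(requested_cores: int, total_cores: int) -> float:
    """Fraction of host CPU time a job can expect under full contention.

    Assumption: the job's cgroup weight is proportional to its requested
    cores, and all cores on the host are claimed by busy jobs.
    """
    if requested_cores <= 0 or total_cores <= 0:
        raise ValueError("core counts must be positive")
    # A job cannot be entitled to more than the whole machine.
    return min(1.0, requested_cores / total_cores)


share = cpu_share(2, 48)
print(f"{share:.1%}")  # prints 4.2%
```

On an otherwise idle host the job may well use more than this, since a weighted share only bites when other jobs compete for the same CPU time.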

Cheers,
  Thomas

On 24/02/2021 13.31, David Cohen wrote:
Hi,
I was under the apparently wrong impression that setting
CGROUP_MEMORY_LIMIT_POLICY = HARD
would suffice to kill jobs running over their requested memory.
I now understand that I have to back it up with a SYSTEM_PERIODIC_HOLD expression.
As the system is in production, I don't want to risk getting it wrong and killing innocent jobs.

While I'm at it, can I also use that method to remove jobs that are using more cores than requested (CPU usage > CPU requested)?

Thanks,
David





