
Re: [HTCondor-users] condor cgroup hierarchy wrt systemd managed slices



Hi Greg,

> This was true in cgroup v1, but not in v2. Why do you want to put the job hierarchy in a separate cgroup, is it to partition memory for the condor daemons?

My main aim would be to squeeze background/backfill jobs onto our cluster, probably using systemd for resource control of the slice(s).

Our user story:
We had to drain/reboot our cluster a number of times over the last month to apply critical kernel updates. As our user groups' workloads are somewhat unwieldy with respect to draining (i.e., their job run times), each draining wastes quite some cycles, with cores sitting idle until the last job finally finishes. So we are looking for low-priority backfill jobs that can run during draining periods with minimal wasted cycles, i.e., jobs whose run times are well matched for matchmaking close to the reboot time, or that can be killed with only a small loss of work output. Such jobs could be backfill jobs on, say, a dedicated (partitionable) slot in parallel to the EPs' normal partitionable job slots. Another option might be the background jobs proposed by ATLAS, where a workload runs constantly in the background. Such a background workload can scale itself up to all unused available cores/cycles (and has a small memory footprint), so it could parasitize any unused cycles, scaling up to most of an EP towards the final draining phase.
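As a concrete (untested) sketch of what I have in mind for the dedicated-backfill-slot variant: an EP config that oversubscribes the detected cores and reserves a second partitionable slot for jobs carrying a custom attribute. IsBackfillJob is an invented job attribute, not a standard knob, and I have not verified the oversubscription arithmetic:

```
# Untested sketch: a second partitionable slot for backfill-flagged jobs.
# IsBackfillJob is a made-up custom attribute (+IsBackfillJob = True in
# the submit file); the other knobs are standard startd configuration.
NUM_CPUS = $(DETECTED_CPUS) * 2          # oversubscribe so both slots fit
SLOT_TYPE_1 = cpus=50%, mem=75%          # normal jobs
SLOT_TYPE_1_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_2 = cpus=50%, mem=25%          # backfill jobs
SLOT_TYPE_2_PARTITIONABLE = TRUE
NUM_SLOTS_TYPE_2 = 1
# slot type 2 only ever matches jobs that declare themselves backfill
START = (MY.SlotTypeID != 2) || (TARGET.IsBackfillJob =?= True)
```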

Now for my cgroup/systemd thing

Such a background/backfill workload would need a minimal CPU weight relative to proper jobs. On many-core servers with 256+ cores, even a CPU weight ratio of 100 (one core equivalent) to 1 (background workload), whether inside the Condor cgroup or between the members of a systemd slice, is already equivalent to more than 2 cores for the background workload - which would cut into the nominal Condor weight share when aiming to utilize all cores with Condor jobs. So, at least inside the Condor job cgroup (not the systemd slice), I would aim for a CPU weight ratio of 1000 : 1 [all default jobs : background job(s)]. Alternatively, a separate startd could carry that ratio at the system.slice level of the cgroup hierarchy. [1]

However, short of manually creating and shuffling around cgroups and PIDs, I do not see a good way to get my ideal(??) weight ratios. As far as I see, the Condor "job" cgroup is created by the master (which itself lives in condor.service) but is placed alongside the other systemd services/slices in system.slice. It is not a "systemd slice" as such, i.e., I can modify it only manually through the virtual filesystem; systemctl does not know about it per se. Anyway - if I were to reweight the Condor job cgroup, I would have to take care not to antagonize all my other services/slices under system.slice, which all have the same default weight of 100 (before reweighting). It might be possible (I have not tested it yet) to create a drop-in for condor.service with a [Slice] section tuning its CPUWeight and such [2] - but then again that would dominate the other services/slices under system.slice, so I would prefer to add another level to the cgroup hierarchy - which, as far as I see, is hardly possible without screwing up Condor :-/
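To put numbers on the ratio argument: a quick back-of-the-envelope calculation (plain awk, no HTCondor or systemd involved) of the core-equivalents a background cgroup receives at full contention:

```shell
# cgroup v2 cpu.weight is proportional-share: a cgroup competing against a
# sibling gets w_self / (w_self + w_sibling) of the CPU under full load.
# Core-equivalents on a 256-core EP for a background cgroup of weight 1:
awk 'BEGIN {
  cores = 256
  printf "100:1  ratio -> %.2f core-equivalents\n", cores * 1 / (100 + 1)
  printf "1000:1 ratio -> %.2f core-equivalents\n", cores * 1 / (1000 + 1)
}'
```

That is roughly 2.5 core-equivalents at 100:1 but only about a quarter of a core at 1000:1, which is why I am after the larger ratio.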

Anyway, it might be that I am overthinking the issue, but that's where I am ;)

Cheers,
  Thomas


[1]
cgroup
└── cgroup
    ├── dev-hugepages.mount [weight: 100]
...
    ├── sys-kernel-tracing.mount [weight: 100]
    ├── system.slice [weight: 100]
...
    │   ├── condor.service [weight: 100]
    │   ├── condordesy.service [weight: 100]
    │   │   └── condorjob.slice [weight: 100]
    │   │       ├── _var_lib_condor_execute_slot1_10@xxxxxxxxxxxxxxxxxxxxxxx [weight: 800]
    │   │       │   └── _var_lib_condor_execute_slot1_10@xxxxxxxxxxxxxxxxxxxxxxx [weight: 100]
    │   │       ├── _var_lib_condor_execute_slot1_11@xxxxxxxxxxxxxxxxxxxxxxx [weight: 800]
...
    │   ├── var-cache-cvmfs2.mount [weight: 100]
    │   └── var.mount [weight: 100]
    └── user.slice [weight: 100]
        └── user-0.slice
            ├── session-3368.scope
            └── user@xxxxxxxxx
...


[2]
systemctl set-property system-condor.slice CPUWeight=1000
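For reference, the untested drop-in idea from above could look like this (the file name and slice name are my guesses; Slice= and CPUWeight= are regular systemd resource-control directives):

```
# /etc/systemd/system/condor.service.d/50-slice.conf  (untested sketch)
[Service]
# move condor.service into a dedicated slice one level below system.slice,
# so the whole Condor subtree can be weighted as a group
Slice=system-condor.slice
```

After a systemctl daemon-reload and a restart of condor.service, the set-property command above would then apply to the new slice - whether the master-created condor job cgroup actually ends up under that slice is exactly my open question, though.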


