
[HTCondor-users] Problem with suspension, job wrapper and cgroups



Hello,

I observed a problem with job suspension and cgroups-v2 on AlmaLinux 9.4 (kernel 5.14.0-427.16.1.el9_4) with HTCondor 23.0.8 (and other versions as well).

The idea is that jobs should be suspended if there is keyboard/console activity. In principle, this works, or at least HTCondor says so in condor_status and in the job's log file. In many cases, however, I saw the job's sub-processes still running even though HTCondor reported the job as suspended.

I was able to identify the cause of the problem. In our case, we use a simple job wrapper:

#!/bin/bash
/bin/nice -n 19 "$@"

A typical process structure looks like this:

USER    COMMAND  CGROUP
condor  \_ condor_startd  0::/system.slice/condor.service
condor      \_ condor_starter -f -local-name slot_type_1 -a slot1_3 XXXXXX.physik.rwth-aachen.de  0::/system.slice/condor.service
user        |   \_ /bin/bash /etc/condor/user_job_wrapper XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |       \_ /bin/sh XXXXXX  0::/system.slice/condor.service
user        |           \_ XXXXXXX  0::/system.slice/condor.service
user        |           \_ XXXXXXX  0::/system.slice/condor.service

Since only the job wrapper belongs to the cgroup of the job slot, HTCondor is not able to suspend the sub-processes as well.
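
To check this kind of mismatch without reading a process tree by eye, something like the following could be used. It is only a sketch and assumes a pure cgroup v2 host, where each /proc/PID/cgroup file holds a single "0::/path" line. The function name pids_outside_cgroup and the PROC_ROOT override are hypothetical helpers introduced here for illustration; they are not part of HTCondor.

```shell
#!/bin/bash
# Sketch: print every given PID whose cgroup does NOT contain a pattern.
# PROC_ROOT is a hypothetical override that lets the function be tried
# against a fake directory tree; it defaults to the real /proc.
PROC_ROOT="${PROC_ROOT:-/proc}"

pids_outside_cgroup() {
    # $1: substring expected in the cgroup path, e.g. ":/htcondor/"
    # remaining arguments: PIDs to check
    _pattern="$1"
    shift
    for _pid in "$@"; do
        _file="$PROC_ROOT/$_pid/cgroup"
        # Skip processes that have already exited (or are unreadable).
        [ -r "$_file" ] || continue
        # -F treats the pattern as a fixed string, not a regex.
        if ! grep -q -F -- "$_pattern" "$_file"; then
            printf '%s %s\n' "$_pid" "$(cat "$_file")"
        fi
    done
}
```

Called as, for example, pids_outside_cgroup ":/htcondor/" $(pgrep -u user), it would list the job owner's processes that are still in the condor.service cgroup. On a worker node without a login session of that user, an empty result would mean all of them are inside an /htcondor/ cgroup.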


Without the job wrapper, the structure looks like this:

USER    COMMAND  CGROUP
condor  \_ condor_startd  0::/system.slice/condor.service
condor      \_ condor_starter -f -local-name slot_type_1 -a slot1_3 XXXXXX.physik.rwth-aachen.de  0::/system.slice/condor.service
user        |   \_ /bin/sh XXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |       \_ XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |       \_ XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx

With these cgroup assignments, job suspension works fine.


While I was debugging the behavior of the job wrapper, I found out that the job wrapper is started in the cgroup "/system.slice/condor.service" and is moved to "/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx" within the first second. This looks like a race condition: if "nice" is executed before the move, the sub-process inherits the wrong cgroup assignment and stays there.
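
One way to make the timing visible is to sample the wrapper's own cgroup a few times right after start-up. The helper below is a rough sketch under the assumption of cgroup v2 (one line per cgroup file); the name sample_cgroup and its arguments are made up for this illustration.

```shell
#!/bin/bash
# Sketch: print a timestamped sample of a cgroup file several times.
sample_cgroup() {
    # $1: cgroup file to read, $2: number of samples, $3: seconds between samples
    _i=0
    while [ "$_i" -lt "$2" ]; do
        # 'read' is a shell builtin, so the file is read by the shell itself
        # and not by a freshly forked child process.
        IFS= read -r _line < "$1" || _line="(unreadable)"
        printf '%s sample=%s %s\n' "$(date +%s)" "$_i" "$_line"
        _i=$((_i + 1))
        if [ "$_i" -lt "$2" ]; then
            sleep "$3"
        fi
    done
}
```

Inside a wrapper, a call such as sample_cgroup "/proc/$$/cgroup" 5 0.2 (fractional sleep values assume GNU sleep) with its output appended to a log file should show the path switching from the condor.service cgroup to the slot cgroup if the move happens as described.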

Adding a simple "sleep 1" at the beginning of the job wrapper avoids the problem. In the end, I use this modified job wrapper:

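#!/bin/bash
# Wait until the wrapper has been moved into the slot's cgroup
# (on a pure cgroup v2 system /proc/self/cgroup holds a single line).
while /usr/bin/grep -q -v ":/htcondor/" /proc/self/cgroup ; do
 /usr/bin/sleep 1
done
# Start the job with the lowest scheduling priority.
/bin/nice -n 19 "$@"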
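
A possible variation of this wrapper, given only as a sketch: the loop above never ends if the wrapper is never moved into an /htcondor/ cgroup (for example on a host where cgroups are not used), so the version below gives up after a fixed number of attempts. The function name wait_for_cgroup, the limit of 30 attempts, and the decision to start the job anyway after the timeout are assumptions of this sketch, not HTCondor behavior.

```shell
#!/bin/bash
# Sketch: wait for the slot cgroup, but only for a limited number of attempts.
wait_for_cgroup() {
    # $1: cgroup file to poll, $2: substring to wait for, $3: maximum attempts
    _tries=0
    until grep -q -F -- "$2" "$1" 2>/dev/null; do
        _tries=$((_tries + 1))
        if [ "$_tries" -ge "$3" ]; then
            return 1
        fi
        sleep 1
    done
    return 0
}

# Only act as a wrapper when a job command line was passed in.
if [ "$#" -gt 0 ]; then
    # $$ is the PID of this shell, so its own cgroup file is polled.
    if ! wait_for_cgroup "/proc/$$/cgroup" ":/htcondor/" 30; then
        echo "user_job_wrapper: still outside the slot cgroup after about 30 s" >&2
    fi
    # exec replaces the shell, so no extra bash process stays in the tree.
    exec /bin/nice -n 19 "$@"
fi
```

Whether to run the job anyway or to exit with an error after the timeout is a policy decision. Running it keeps jobs flowing but brings back the suspension problem for that job.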

This results in the correct cgroup assignments:

USER    COMMAND  CGROUP
condor  \_ condor_startd  0::/system.slice/condor.service
condor      \_ condor_starter -f -local-name slot_type_1 -a slot1_3 XXXXXX.physik.rwth-aachen.de  0::/system.slice/condor.service
user        |   \_ /bin/bash /etc/condor/user_job_wrapper XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |       \_ /bin/sh XXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |           \_ XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx
user        |           \_ XXXXXXX  0::/htcondor/condor_user_condor_execute_slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxx


Is this behavior already known? Is it due to HTCondor or due to the kernel?


By the way, the job's log file does not show the correct number of suspended processes regardless of the cgroup assignments:

010 (051.000.000) 2024-06-07 11:42:53 Job was suspended.
    Number of processes actually suspended: 0

Best regards,
  Andreas

------------------------------------------------------------------------
  Dr. Andreas Nowack               email: nowack@xxxxxxxxxxxxxxxxxxxxx
  RWTH Aachen
  III. Phys. Institut B
  Sommerfeldstr. / Physikzentrum   phone: +49 241 80-27282
  D-52056 Aachen                     fax: +49 241 80-22244
  Germany
