Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] nodes without cgrouped jobs?

Date: Thu, 21 Jun 2018 16:06:32 +0200
From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
Subject: Re: [HTCondor-users] nodes without cgrouped jobs?

Hi again,

I think I managed to reproduce the disappearance of the jobs' cgroups
manually, at least partially. It looks to be due to an issue with
systemd and/or a typo in one of our units... [*]

  ~~> Condor is innocent

Cheers and sorry for the noise!
  Thomas


ps: unfortunately, I do not completely understand the behaviour and
would appreciate any ideas from systemd experts ;) [**]

[*]
- we are distributing a systemd unit via puppet, which starts a
Singularity container/runscript (that binds the root path internally):
  ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro /path/to/container.d
- when the unit is distributed/updated on a node, puppet triggers a
  systemctl daemon-reload
- and then ensures that the service is active
- due to a bug (~> forgotten variable), the unit's template might contain
a condition dangling in the air, i.e.,
  >>
  [Unit]
  Description=foofoobar
  ConditionPathExists=/path/to/container.d
  ConditionPathExists=
  [Service]
  ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro /path/to/container.d
  ...
  >>
- when this (apparently defective) unit got started (and kept active by
puppet...), the existing job slices in the cpu and memory controllers
got wiped out!? (the condor.service parent slices survived the unit start)
- with a fixed unit, the job slices survive (re)starts of the service!
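
Since the defective line came out of a template, a small check run before
the unit is rolled out might catch it. The following is only a sketch in
Python, not something we run in production: the function name
find_empty_conditions and the sample unit text are made up for
illustration. As far as I understand systemd.unit(5), an empty assignment
to a Condition*= option resets the condition list instead of being
rejected, so systemd itself would not complain about such a line.

```python
import re

# Matches systemd Condition*/Assert* directives, e.g. "ConditionPathExists=".
_DIRECTIVE = re.compile(r"^\s*((?:Condition|Assert)\w+)\s*=\s*(.*?)\s*$")


def find_empty_conditions(unit_text):
    """Return (line_number, directive) pairs for directives with an empty value.

    Hypothetical helper: it only scans the text line by line and does not
    implement full systemd unit-file parsing (no line continuations, no
    drop-ins).
    """
    hits = []
    for lineno, line in enumerate(unit_text.splitlines(), start=1):
        # Skip comment lines, which may start with '#' or ';'.
        if line.lstrip().startswith(("#", ";")):
            continue
        match = _DIRECTIVE.match(line)
        if match and match.group(2) == "":
            hits.append((lineno, match.group(1)))
    return hits


# Sample modelled on the rendered template shown above.
SAMPLE_UNIT = """\
[Unit]
Description=foofoobar
ConditionPathExists=/path/to/container.d
ConditionPathExists=
[Service]
ExecStart=/usr/bin/singularity run --bind /:/rootfs:ro /path/to/container.d
"""

if __name__ == "__main__":
    for lineno, name in find_empty_conditions(SAMPLE_UNIT):
        print(f"line {lineno}: {name}= has no value")
```

Such a check could run against the rendered template before puppet
installs the unit; whether the empty condition is really what triggers
the slice removal is still unclear to me.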

[**]
- what I do not fully understand is why/how the processes lose their
cgroup slices, i.e., why/how systemd or the kernel does it. The PIDs are
unaffected, so I would have naively assumed that once assigned to a
cgroup, a process would stay there. But apparently the cgroups get
removed(?) and the PIDs get attached to the next parent group(?)

- the slices only get wiped out when the exec is started through
systemd. I have not been able to reproduce the behaviour by taking each
step manually.

- what kind of namespace view does systemd have? I see systemd processes
with PPID=1 as well as PPID=0(!?)
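
To follow where a job's PID ends up before and after the unit start, one
can read /proc/<pid>/cgroup, which lists one line per hierarchy in the
form "hierarchy-ID:controller-list:cgroup-path". Below is a minimal
Python sketch of that; the helper names parse_proc_cgroup and cgroup_of
and the sample paths (including "hypothetical_job_slot") are assumptions
for illustration, not actual Condor cgroup names.

```python
from pathlib import Path


def parse_proc_cgroup(text):
    """Map each controller name to its cgroup path.

    Input is the content of /proc/<pid>/cgroup. A cgroup v2 (unified)
    entry has an empty controller list and is stored under the key ''.
    """
    membership = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split into at most three fields; the path itself may contain ':'.
        _hierarchy_id, controllers, path = line.split(":", 2)
        for controller in controllers.split(","):
            membership[controller] = path
    return membership


def cgroup_of(pid):
    """Read and parse the cgroup membership of a live process."""
    return parse_proc_cgroup(Path(f"/proc/{pid}/cgroup").read_text())


# Hypothetical sample: the memory entry still sits in a job cgroup, while
# the cpu entry has already fallen back to the parent service cgroup.
SAMPLE = """\
11:memory:/system.slice/condor.service/hypothetical_job_slot
4:cpu,cpuacct:/system.slice/condor.service
1:name=systemd:/system.slice/condor.service
"""

if __name__ == "__main__":
    for controller, path in sorted(parse_proc_cgroup(SAMPLE).items()):
        print(f"{controller or '(unified)'}: {path}")
```

Calling cgroup_of(pid) for a running job before and after starting the
defective unit should show whether the cpu and memory entries fall back
to the condor.service parent, as described above.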


