Hi all,

does anybody have experience with freezing whole Condor process trees for node maintenance?

Background: we had to do a bit of manual work for a few jobs whose mounts needed some attention. Along the way, it seemed it would be nice to freeze the jobs, so that we can work on the mounts without affecting the jobs, e.g., if a mount disappears for a moment.

Playing on a test node, it seems that one can add the Condor process tree to a freezer cgroup and hibernate it for some time without affecting the daemons' health [1] (provided that the freeze is short enough that the collector does not assume the node to be dead).

But maybe somebody already has experience with whether this works in real-life scenarios with user jobs, which might be more sensitive to being frozen, or with how the system reacts if a full node reappears with all its jobs after being absent for too long (and the jobs have already been resubmitted)?

Ideally, it would be nice to have frozen processes survive a reboot, but so far my attempts with CRIU [https://criu.org] were not very successful (probably it works better with binaries than with shell scripts started in an active session...?).

Cheers,
  Thomas

[1]
> mkdir /sys/fs/cgroup/freezer/mycondorfreeze/
> while read X; do echo ${X} >> /sys/fs/cgroup/freezer/mycondorfreeze/tasks; done < /sys/fs/cgroup/memory/system.slice/condor.service/tasks
> cat /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
THAWED
> echo FROZEN > /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
...wait...
> echo THAWED > /sys/fs/cgroup/freezer/mycondorfreeze/freezer.state
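
For what it is worth, the steps in [1] could be wrapped into a small script along the following lines, so that the freeze is bounded in time and the tree is thawed again if the script gets interrupted. This is only a rough sketch and not tested on a production node: the function names and the FREEZER/SOURCE variables are made up for illustration, and the default paths simply assume the same cgroup v1 layout as in [1].

```shell
#!/bin/sh
# Sketch only: default paths follow the cgroup v1 layout used in [1].
# FREEZER, SOURCE and all function names are illustrative assumptions.
FREEZER="${FREEZER:-/sys/fs/cgroup/freezer/mycondorfreeze}"
SOURCE="${SOURCE:-/sys/fs/cgroup/memory/system.slice/condor.service/tasks}"

# Add every task listed in $SOURCE to the freezer cgroup, one ID per write.
collect_tasks() {
    mkdir -p "$FREEZER" || return 1
    while read -r pid; do
        # A task may exit between listing and writing; ignore that failure.
        echo "$pid" >> "$FREEZER/tasks" 2>/dev/null || true
    done < "$SOURCE"
}

freeze() { echo FROZEN > "$FREEZER/freezer.state"; }
thaw()   { echo THAWED > "$FREEZER/freezer.state"; }

# Freeze the collected tasks for $1 seconds, then thaw them again.
# The trap thaws as well if the script receives INT or TERM while waiting.
freeze_for() {
    collect_tasks || return 1
    trap 'thaw' INT TERM
    freeze
    sleep "$1"
    thaw
    trap - INT TERM
}
```

Sourcing the file and calling e.g. `freeze_for 60` as root would then hold the tree for a minute. The duration would have to stay short enough that the collector does not assume the node to be dead, as mentioned above.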