On Mar 19, 2013, at 4:47 AM, Michael Hanke <michael.hanke@xxxxxxxxx> wrote:
Sorry, I'm too terse sometimes - Condor is complaining about sandbox cleaning (I think) because it is finding files owned by root in the job sandbox. There are assumptions littered throughout the code, especially in sandbox cleanup, that all files in a sandbox belong to a single UID; we hit similar issues when using glexec. It sounds like the root-owned files all come from filesystems that pbuilder remounts / bind-mounts into the sandbox (/proc, /dev/pts).

By enabling MOUNT_UNDER_SCRATCH, HTCondor will put the job in a separate "mount namespace" that makes mounts made by the job invisible to the rest of the system. This is required to give the job a private /tmp, but the private /tmp is just a side-effect in this case. Hence, /proc and /dev/pts would be invisible to the condor_starter, which wouldn't try to clean them up.
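
A minimal sketch of that setting on the execute node, assuming /tmp and /var/tmp are the directories the job should see as private (the directory list here is illustrative only, not something taken from your setup):

    # Illustrative values: each listed directory is remounted under the
    # job's scratch directory, inside the job's own mount namespace.
    MOUNT_UNDER_SCRATCH = /tmp, /var/tmp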
Ah - what does CpuBusyTime look like? If there's enough system activity (or if the root-owned processes are not being tracked by the procd and counting as system activity), then the SUSPEND expression could trigger. If it's a dedicated cluster - and you have no need for job suspension - you can set:

    SUSPEND = FALSE
    WANT_SUSPEND = FALSE

Hope this helps!

Brian