Do you see the corrupted user profiles on both Windows 8 and Windows 10, or just on one or the other of those platforms?

We saw something similar to what you are describing many years ago on one of the nodes in our build farm. It was at least 5 years ago, and that node had a failing disk, so at the time we chalked it up to the failing disk. I think I remember that the node was Windows 8.1, but it was so long ago I cannot be sure.

It is certainly plausible that the cleanup of a user profile would fail if we tried to do it while a process using the profile was still running. It is HTCondor’s responsibility to stop all processes started by a job when the job exits, so it is reasonable to consider this an HTCondor bug, but I don’t have any idea how to fix it. The best we could manage would be to detect the left-behind user directory and report it. Do you have HTCondor configured to send email to an admin when things go wrong (like a daemon crash)?

> Does the above mean that when we want to rely on the dedicated run account, the submit configuration knob "load_profile = True" is redundant?

Besides using a dedicated run account, the other option is run_as_owner=true, which only works if you have an account for the submitting user on the execute node. run_as_owner will always load the registry hive for that user.
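
In the meantime, detecting the left-behind directories can be done outside HTCondor. The following is only a minimal Python sketch of that idea, not anything HTCondor ships. The profile root `C:\Users` and the name prefix `condor-slot` are assumptions made for illustration; substitute whatever names the dedicated run accounts actually have on your nodes. The function name `find_leftover_profiles` is likewise hypothetical.

```python
import os


def find_leftover_profiles(users_root, prefix="condor-slot"):
    """List profile directories under users_root whose names start with prefix.

    The default prefix is an assumption for illustration; pass the real
    run-account name prefix used on your execute nodes.
    """
    wanted = prefix.lower()
    leftovers = []
    for entry in os.scandir(users_root):
        # Only directories count; Windows names are case-insensitive,
        # so compare in lower case.
        if entry.is_dir() and entry.name.lower().startswith(wanted):
            leftovers.append(entry.name)
    return sorted(leftovers)


if __name__ == "__main__":
    # C:\Users is the usual Windows profile root; treated as an assumption here.
    for name in find_leftover_profiles(r"C:\Users"):
        print("possible leftover run-account profile:", name)
```

A scan like this only makes sense while no jobs are running on the node, because the profile of a slot with a live job would match as well.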
From: John M Knoeller <johnkn@xxxxxxxxxxx>

This is not a known HTCondor issue. I wonder if restarting Windows could clean up the user directories and registries that had been left behind?

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of O'NEAL Mark via HTCondor-users

Hello,

We operate an HTCondor cluster under Windows utilizing the "load_profile = True" submit configuration macro and rely on the dedicated run accounts provisioned by the condor_startd running as Windows SYSTEM user. Compute
nodes running startd are a mix of Windows 8 and 10 running HTCondor 8.8.10, and are configured with static slot definitions. Our IT manager recently noted that the dedicated run account profile cleanup which normally happens during job shutdown has been disrupted at some point in time on a number of these nodes, evidenced by:
I've checked the StarterLog for a number of the slots; most show the registry hive loading successfully even when the issue described above is observed for that slot. Some did report
a failure to load the registry hive in the StarterLog. I've done some research on the open web and haven't found any hints on where to look so far. I would appreciate it if anyone on the mailing list has suggestions on where to start with log investigation
or configuration setting. We run the cluster for LAN use only behind our firewall, so have not seen a significant motivation to upgrade into the 9.x or 10.x releases. If this were a known issue with older versions it would be a reasonable motivation to take
the upgrade plunge though. Best Regards,