Re: [HTCondor-users] Windows dedicated run account profile corrupted

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Do you see the corrupted user profiles on both Windows 8 and Windows 10, or just on one or the other of those platforms?

We saw something similar to what you are describing many years ago on one of the nodes in our build farm. It was at least 5 years ago, and that node had a failing disk, so we chalked it up at the time to the failing disk. I think I remember that the node was Windows 8.1, but it was so long ago I cannot be sure.

It is certainly plausible that the cleanup of a user profile would fail if we tried to do it while a process using the profile was still running. It is HTCondor’s responsibility to stop all processes started by a job when the job exits, so it is reasonable to consider this an HTCondor bug, but I don’t have any idea how to fix it. The best we could manage would be to detect the left-behind user directory and report it. Do you have HTCondor configured to send email to an admin when things go wrong (like a daemon crash)?
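For anyone who wants to check their own execute nodes in the meantime, here is a rough sketch of that kind of detection as a standalone Python script. It is not part of HTCondor. The function name find_leftover_profiles is made up for illustration, and the directory naming (condor-slot<N>, plus condor-slot<N>.<suffix> for the Windows fallback) is assumed from the report quoted further down. A plain condor-slot<N> folder is expected while a job is running on that slot, so it only counts as left behind if the slot is idle; a folder with a suffix is the stronger sign that an earlier cleanup failed.

```python
import re
from pathlib import Path

# Assumed naming, taken from the report in this thread: "condor-slot<N>" for
# the dedicated run account profile, and "condor-slot<N>.<suffix>" for the
# fallback folder Windows creates when the plain name is already taken.
PROFILE_RE = re.compile(r"^(condor-slot\d+)(?:\.(.+))?$", re.IGNORECASE)


def find_leftover_profiles(users_dir):
    """Return (base, fallback) lists of matching directory names.

    base     -- folders named exactly condor-slot<N>
    fallback -- folders named condor-slot<N>.<suffix>
    """
    base, fallback = [], []
    for entry in sorted(Path(users_dir).iterdir()):
        if not entry.is_dir():
            continue
        match = PROFILE_RE.match(entry.name)
        if match is None:
            continue
        # A non-empty suffix group means this is a fallback folder.
        (fallback if match.group(2) else base).append(entry.name)
    return base, fallback


if __name__ == "__main__":
    users = Path(r"C:\Users")
    if users.is_dir():
        base, fallback = find_leftover_profiles(users)
        for name in base:
            print(f"profile folder present (leftover if the slot is idle): {name}")
        for name in fallback:
            print(f"fallback profile folder (earlier cleanup likely failed): {name}")
```

Run from a scheduled task, the output could be mailed to an admin. The script only reports; it does not delete anything or touch the registry.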

> Does the above mean that when we want to rely on the dedicated run account, the submit configuration knob "load_profile = True" is redundant?

That means that load_profile=true is *available* with the dedicated run account, but HTCondor will not actually load a registry hive unless the job requests it.

Besides using a dedicated run account, the other option is run_as_owner=true, which only works if you have an account for the submitting user on the execute node. run_as_owner will always load the registry hive for that user.

From: John M Knoeller <johnkn@xxxxxxxxxxx>
Sent: Wednesday, August 2, 2023 5:15 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: O'NEAL Mark <mark.oneal@xxxxxxxxxxx>
Subject: RE: Windows dedicated run account profile corrupted


This is not a known HTCondor issue.

I wonder if restarting Windows could clean up the user directories and registries that had been left behind?

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of O'NEAL Mark via HTCondor-users
Sent: Tuesday, August 1, 2023 7:52 PM
To: htcondor-users@xxxxxxxxxxx
Cc: O'NEAL Mark <mark.oneal@xxxxxxxxxxx>
Subject: [HTCondor-users] Windows dedicated run account profile corrupted

Hello,

We operate an HTCondor cluster under Windows utilizing the "load_profile = True" submit configuration macro and rely on the dedicated run accounts provisioned by the condor_startd running as Windows SYSTEM user. Compute nodes running startd are a mix of Windows 8 and 10 running HTCondor 8.8.10, and are configured with static slot definitions.

Our IT manager recently noted that the dedicated run account profile cleanup which normally happens during job shutdown has been disrupted at some point in time on a number of these nodes, evidenced by:

- The profile folder in C:\Users (e.g. C:\Users\condor-slot1) is not deleted and appears corrupted. The next time the startd tries to create the dedicated run account, Windows falls back to generating C:\Users\condor-slot1.hostname.
- The registry hive for the user condor-slot1 is not deleted.

I've checked the StarterLog for a number of the slots; most show the registry hive loading successfully even when the issue described above is observed for that slot. Some did report a failure to load the registry hive in the StarterLog.

I've done some research on the open web and haven't identified any hints on where to look thus far. I would appreciate it if anyone on the mailing list has suggestions on where to start with log investigation or configuration settings. We run the cluster for LAN use only behind our firewall, so we have not seen a significant motivation to upgrade to the 9.x or 10.x releases. If this were a known issue with older versions, it would be a reasonable motivation to take the upgrade plunge though.

Best Regards,
Mark
