Partly an FYI, but also a question.

We have recently implemented encryption of the execute directory on Windows nodes by getting the Windows submit nodes to set it for all submitted jobs:

# set all jobs to encrypt the execute directory on execute nodes.
JOB_TRANSFORM_NAMES = $(JOB_TRANSFORM_NAMES) Encrypt
JOB_TRANSFORM_Encrypt @=end
   #REQUIREMENTS universe =?= vanilla
   SET EncryptExecuteDirectory = true
   # optionally also force match to nodes that can encrypt. (not all Linux nodes can encrypt)
   #SET Requirements = ( $(MY.Requirements) ) && TARGET.HasEncryptExecuteDirectory
@end

# Do not allow users to edit the value of EncryptExecuteDirectory after submission
# via tools like condor_qedit or chirp.
IMMUTABLE_JOB_ATTRS = $(IMMUTABLE_JOB_ATTRS) EncryptExecuteDirectory

We (well, more accurately, "me"
😉) had done some testing, but unfortunately not enough. Sigh.

We have not yet implemented pool passwords and a credd server, so jobs are not running as the owner but as the dynamically created condor-slot users (condor-slot1, etc.) that HTCondor sets up itself. This has implications for most of our users' "jobs", which are actually batch files. The generic structure is usually something like:

    map a network drive to the user's fileserver
    download zipped software binaries
    download input data file/s
    unzip software binaries
    run software with the input data file/s
    upload output data file/s to the user's fileserver
    disconnect network drive

The problem occurs when uploading the output file/s, where we get a "The specified file could not be encrypted" message. In hindsight I think that makes sense: the file is encrypted by user condor-slot1, and the job then tries to copy it to a location that only the "real" user has permissions for, so it causes problems.

One kludge is to use the "cipher" command to decrypt the file before uploading it, e.g.

    software.exe > outputfile.dat
    cipher /d /b /h outputfile.dat > nul 2>&1
    copy outputfile.dat \\fileserver\user\output

The other kludge is to redirect the output directly to the fileserver, so that it is never encrypted on the local execute node. It may not always be possible to do this though, depending on how the software creates its output data file/s.

    software.exe > \\fileserver\user\output\outputfile.dat

So that's the FYI bit, and once users can run_as_owner I don't think this should be a problem?

Now for the question part. The above kludges mostly work, but there is still a small percentage (3%) of jobs, e.g. 150 out of 5,000, that give errors like:

120813.3 na-hit023 9/13 11:03 Error from
slot1@xxxxxxxxxxx: STARTER at 152.83.xxx.xxx failed to send file(s) to <152.83.yyy.yyy:62198>: error reading from C:\PROGRA~1\condor\execute\dir_2356\_condor_stderr: (errno 13) Permission denied; SHADOW failed to receive file(s) from <152.83.xxx.xxx:50880>

120813.210 na-hit023 9/13 11:20 Error from
slot3@xxxxxxxxxxx: STARTER at 138.194.aaa.aaa failed to write to file C:\PROGRA~1\condor\execute\dir_15600\condor_exec.exe: (errno 13) Permission denied

120813.579 na-hit023 9/13 11:16 Error from
slot31@xxxxxxxxxxx: Failed to open 'C:\PROGRA~1\condor\execute\dir_26868\_condor_stdout' as standard output: Permission denied (errno 13)

120813.675 na-hit023 9/13 11:00 Error from
slot9@xxxxxxxxxxx: Failed to open 'C:\PROGRA~1\condor\execute\dir_33056\_condor_stderr' as standard error: Permission denied (errno 13)

120813.755 na-hit023 9/13 11:00 Error from
slot16@xxxxxxxxxxx: STARTER at 152.83.bbb.bbb failed to write to file C:\PROGRA~1\condor\execute\dir_22412\condor_exec.exe: (errno 13) Permission denied

These must be related to the execute directory encryption, because we can re-run the jobs with NO execute directory encryption enabled and do not get these errors. Again, we can kludge around them using something like:

    periodic_release = (JobStatus == 5) && ((HoldReasonCode == 12) || (HoldReasonCode == 13))

So I guess the question is: does anyone have any ideas as to why these errors are occurring, and why only when EncryptExecuteDirectory is set to true?

Thanks for any help/ideas/comments.

Cheers

Greg
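
P.S. One caveat with the periodic_release kludge above: as written, it releases a held job every time, so a job that keeps hitting the same error would cycle indefinitely. A minimal sketch of a capped version follows, assuming the NumShadowStarts job attribute is a reasonable proxy for "number of attempts" and that 5 attempts is an acceptable limit (both are assumptions, not something we have tested). As far as I understand the manual, HoldReasonCode 12 is an output transfer error and 13 is an input transfer error.

```
# Hypothetical retry cap: the limit of 5 is an illustrative assumption.
# JobStatus == 5 means the job is held.
periodic_release = (JobStatus == 5) && ((HoldReasonCode == 12) || (HoldReasonCode == 13)) && (NumShadowStarts < 5)
```

Jobs that exceed the cap stay on hold, so they remain visible in condor_q -hold for inspection instead of being retried silently.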