
Re: [HTCondor-users] Copying user.cc Kerberos credentials cache inside the sandbox?



You are ever so right, Jaime. Indeed, I had HTCondor file transfer disabled because I thought that was what would force HTCondor to use shared storage.

I never imagined the Kerberos cache being part of the file transfer mechanism. My thinking was that, since it's such a critical part (access to resources depends on it), it would be sent no matter what. Lesson learned: you can't access resources conditioned on Kerberos auth if HTCondor file transfer is disabled.
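
A minimal sketch of a test submit file with file transfer turned on, for anyone hitting the same problem. It assumes the standard should_transfer_files and when_to_transfer_output submit commands and reuses the sleep job from the reply below; whether this setting alone is enough on another cluster is an assumption, not something verified here:
# Sketch only: the same sleep test, with HTCondor file transfer enabled
universe = vanilla
executable = /bin/sleep
arguments = 300
initialdir = /tmp
# Assumption: with file transfer on, the starter keeps the scratch
# directory in use and places the credentials cache there
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
queue

While the job sleeps, the scratch directory on the execution node can be checked for the .cc file that the StarterLog names in KRB5CCNAME.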

On 1/26/26 9:15 PM, Jaime Frey via HTCondor-users wrote:
The directory /var/lib/condor/execute/dir_130006/scratch/ is created by the condor_starter before starting the job and cleaned up after the job exits (or fails to start). If you have file transfer enabled, it's also where the transferred files are placed and the Current Working Directory of the job.
I suspect you're not using HTCondor's file transfer, and the condor_starter is failing to start your job because it doesn't have the necessary credentials (AFS token). In that case, the time that /var/lib/condor/execute/dir_130006/scratch/ will exist will be very brief.

Try submitting a simple sleep job that doesn't access the shared storage at all, with a submit description file like this:
universe = vanilla
executable = /bin/sleep
arguments = 300
initialdir = /tmp
queue

This job shouldn't require any use of the shared storage and should run long enough for you to examine the /var/lib/condor/execute/dir_XXXXX/scratch/ directory (the number in the path is different for each job execution attempt).

 - Jaime

On Jan 22, 2026, at 12:43 PM, CMV <ciprian.vizitiu@xxxxxxxxxxxxxxx> wrote:

A 25.5, Kerberos-enabled cluster using UidDomain and FileSystemDomain forwards the Kerberos ticket to the execution node all right: I can see the ticket being renewed in SEC_CREDENTIAL_DIRECTORY_KRB on the execution node. Yet, although StarterLog.slot1_1 claims that

StarterLog.slot1_1:01/21/26 15:37:41 (pid:130006) CREDS: configuring job to use KRB5CCNAME /var/lib/condor/execute/dir_130006/scratch/username.cc

... writing to shared storage won't work and the job gets held immediately, if only because the folder /var/lib/condor/execute/dir_130006/ doesn't exist (not to mention /var/lib/condor/execute/dir_130006/scratch/username.cc), so the piece that is supposed to create the scratch folder is not functioning.

Which piece of the puzzle is supposed to create the scratch folder and, more importantly, obtain and write the Kerberos credentials cache in there?
