HTCondor Project List Archives




Re: [Condor-devel] New-style locking



Brian,

I am actually glad to get some feedback on this. The ticket for these changes is at https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=1310; it should give you at least some insight into what's going on.

The high-level idea behind these changes: we ran into quite a few problems with locking on shared file systems and wanted to make sure that locking always happens on the local file system.

The apparent "randomness" comes from hashing the name of the file to be locked: the first four digits of the hash form two nested subdirectories, followed by a directory named after the user id of the file's owner, and the remaining hash digits become the lock file name.

Why it is done that way: we wanted to avoid creating too many files or directories in any single directory, which is also why the uid is the last (innermost) directory level.
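
A minimal Python sketch of the layout described above, for illustration only. The thread does not say which hash function is used or how many digits it yields, so SHA-1 reduced to ten decimal digits and the function name `lock_path` are hypothetical; the two-digit directory names are inferred from the example path /tmp/condorLocks/29/85/0/285458.lockc in the quoted log. Only the overall structure (two hash-digit directories, then the owner's uid, then the remaining digits as the file name) follows the description.

```python
import hashlib
import os


def lock_path(target_file: str, owner_uid: int, lock_root: str) -> str:
    """Map a file to be locked onto a lock-file path on local disk.

    Hypothetical sketch: the hash (SHA-1 reduced to 10 decimal digits) is an
    assumption; only the directory layout follows the mailing-list thread.
    """
    # Hash the name of the file to be locked and reduce it to a fixed-width,
    # zero-padded decimal string (the digit count is an assumption).
    digest = hashlib.sha1(target_file.encode("utf-8")).digest()
    digits = f"{int.from_bytes(digest, 'big') % 10**10:010d}"

    # First four digits -> two nested subdirectories, spreading lock files
    # out so that no single directory grows too large.
    level1, level2 = digits[0:2], digits[2:4]

    # Innermost directory: the uid of the file's owner.
    # Remaining digits: the lock file name (".lockc" suffix as seen in the log).
    return os.path.join(lock_root, level1, level2, str(owner_uid),
                        digits[4:] + ".lockc")


if __name__ == "__main__":
    print(lock_path("/var/log/condor/EventLog", 0, "/tmp/condorLocks"))
```

Because the uid sits below the hash directories, locks for the same file held by different owners share the same two top-level directories and differ only in the uid component.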

Agreed: the fact that the uid is currently always that of the user running the daemon is a bug and needs to be fixed.

Regarding Matt's message, which arrived while I was typing this one: /tmp/condorLocks is *not* hard-coded. The TMP_DIR or TEMP_DIR config entries are taken into account first.
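
A hedged sketch of how the lock root might be chosen. TMP_DIR, TEMP_DIR and the condorLocks directory name come from this thread; the function name `lock_root`, checking TMP_DIR before TEMP_DIR, and the /tmp fallback are assumptions, not confirmed Condor behavior.

```python
import os
from typing import Mapping


def lock_root(config: Mapping[str, str]) -> str:
    """Pick the parent directory for local lock files (hypothetical sketch).

    Assumptions: TMP_DIR is checked before TEMP_DIR, and /tmp is the
    fallback when neither entry is set.
    """
    for key in ("TMP_DIR", "TEMP_DIR"):
        value = config.get(key)
        if value:  # skip unset or empty entries
            return os.path.join(value, "condorLocks")
    return os.path.join("/tmp", "condorLocks")
```

Under these assumptions, a configuration with neither entry set yields /tmp/condorLocks, the location Brian observed.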

-- Cathrin


On 12/17/2010 09:54 AM, Brian Bockelman wrote:
Hi folks,

Can someone explain the new-style locking to me (or at least point me to the design document)? I just upgraded to a bleeding-edge Condor, and new-style locking is now being used for the first time.

I saw lots of files being created in /tmp/condorLocks (which is inappropriate; that's what /var/lock/condor is for). Inside it there are hundreds of files (after running the schedd for a few minutes, 2000+ total entries in the directory tree) and a lot of directories. The files are owned by many users, and the directories (again, owned by many users) are world-writable.

Note: it would be very helpful if you could organize the locks by owner instead of using an apparently random scheme. The semantics are probably identical, but it would help sysadmins understand what's happening.

I see a lot of errors in the ScheddLog along the lines of this:

12/17/10 09:45:11 (pid:5168) directory_util::rec_touch_file: File /tmp/condorLocks/29/85/0/285458.lockc cannot be created (Permission denied)
12/17/10 09:45:11 (pid:5168) directory_util::rec_touch_file: File /tmp/condorLocks/29/85/0/285458.lockc cannot be created (Permission denied)
12/17/10 09:45:11 (pid:5168) FileLock::FileLock: File locks cannot be created on local disk - will fall back on locking the actual file.
12/17/10 09:45:11 (pid:5168) Warning: Failed to open event rotation lock file /var/log/condor/EventLog.lock: 13 (Permission denied)

I don't know which EUID is used to create the lock file, so I can't tell whether the Permission denied errors are expected. The EventLog.lock issues aren't new, but the directory_util::rec_touch_file lines are. I think EventLog.lock has been rotated with the wrong permissions in all recent versions of Condor.

So, there are lots of things happening and quite a few errors in the logs, but the system appears to be working. I would appreciate whatever background folks can provide.