HTCondor Project List Archives




[Condor-devel] condor_preen deleting lock files in use



Cathrin,

--
$ echo 'cmd=/bin/sleep\nargs=1d\nlog=/tmp/log\nqueue' | condor_submit
Submitting job(s).
1 job(s) submitted to cluster 8.

$ condor_q
-- Submitter: matt@xxxxxxxxxxxx : <192.168.1.100:45379> : eeyore.local
 ID      OWNER   SUBMITTED     RUN_TIME ST PRI SIZE CMD
 8.0     matt    9/23 08:37  0+00:00:06 R  0   0.0  sleep 1d
1 jobs; 0 idle, 1 running, 0 held

$ lsof | grep condor | grep lock
condor_ma 3371 matt 6wW REG 253,3 0 1180119 /home/matt/Documents/CondorInstallation/lock/InstanceLock
condor_sh 3569 matt 4u REG 253,1 0 172924 /tmp/condorLocks/49/29/248770184488445.lockc

$ condor_preen

$ lsof | grep condor | grep lock
condor_ma 3371 matt 6wW REG 253,3 0 1180119 /home/matt/Documents/CondorInstallation/lock/InstanceLock
condor_sh 3569 matt 4u REG 253,1 0 172924 /tmp/condorLocks/49/29/248770184488445.lockc (deleted)
--

condor_preen deletes lock files in rec_lock_cleanup unconditionally (no -remove needed), and it does so even when the lock file is still in use.

We've also had reports of preen core dumping, presumably when it tries to delete a lock file that has already been freed.

Core was generated by `condor_preen -m -r'.
Program terminated with signal 6, Aborted.
#0 0x0000003c9da32905 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64        return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x0000003c9da32905 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1  0x0000003c9da340e5 in abort () at abort.c:92
#2 0x000000000048d3ab in _EXCEPT_ (fmt=0x4e7140 "FileLock::FileLock(): You must have a valid file path as argument.") at /usr/src/debug/condor-7.6.1/src/condor_utils/except.cpp:93
#3 0x00000000004ab8dd in FileLock::initLockFile (this=0xf9b3b0, useLiteralPath=true) at /usr/src/debug/condor-7.6.1/src/condor_utils/file_lock.cpp:271
#4 0x00000000004abf20 in FileLock::FileLock (this=0xf9b3b0, path=0xf9beb0 "/var/lock/condor/local///16/42", deleteFile=<value optimized out>, useLiteralPath=true)
    at /usr/src/debug/condor-7.6.1/src/condor_utils/file_lock.cpp:195
#5 0x0000000000444496 in rec_lock_cleanup (path=0xf9beb0 "/var/lock/condor/local///16/42", depth=0, remove_self=true) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:696
#6 0x000000000044450f in rec_lock_cleanup (path=0xfa1350 "/var/lock/condor/local///16", depth=0, remove_self=true) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:719
#7 0x000000000044450f in rec_lock_cleanup (path=0xf96af0 "/var/lock/condor/local//", depth=3, remove_self=false) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:719
#8 0x00000000004446e6 in check_tmp_dir () at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:743
#9 0x000000000044616a in main (argc=<value optimized out>, argv=0x7fffe6e45840) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:159

Can preen reasonably detect garbage locks and avoid the potential core dump?
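
To make the question concrete, here is a rough sketch (in Python rather than preen's C++, and not taken from the Condor source) of one way a cleanup pass might decide whether a lock file is safe to remove. The helper name lock_file_is_idle is hypothetical. It assumes the lock holders use fcntl-style advisory locks, which I have not verified for FileLock. It treats anything that is not a regular file (such as the directory path in the backtrace) as "leave it alone" instead of aborting.

```python
import fcntl
import os
import stat


def lock_file_is_idle(path):
    """Hypothetical helper: True only if `path` is a regular file and no
    other process currently holds an fcntl-style lock on it."""
    try:
        fd = os.open(path, os.O_RDWR)
    except OSError:
        # Missing, a directory, or not accessible: do not touch it.
        return False
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return False
        try:
            # Non-blocking probe: fails at once if another process holds the lock.
            fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except OSError:
            return False
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return True
    finally:
        os.close(fd)
```

A caller would only unlink the file when this returns True. Note the limit of the approach: it only sees locks that are held at the moment of the probe. If I read the lsof output correctly, the shadow above has its lock file open (4u) without holding a lock on it, so a probe like this would still report that file as idle. There is also a window between the probe and the unlink in which another process could take the lock.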

Best,


matt