HTCondor Project List Archives




Re: [Condor-devel] condor_preen deleting lock files in use



On 09/23/2011 08:53 AM, Matthew Farrellee wrote:
Cathrin,

--
$ echo 'cmd=/bin/sleep\nargs=1d\nlog=/tmp/log\nqueue' | condor_submit
Submitting job(s).
1 job(s) submitted to cluster 8.

$ condor_q
-- Submitter: matt@xxxxxxxxxxxx : <192.168.1.100:45379> : eeyore.local
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
8.0 matt 9/23 08:37 0+00:00:06 R 0 0.0 sleep 1d
1 jobs; 0 idle, 1 running, 0 held

$ lsof | grep condor | grep lock
condor_ma 3371 matt 6wW REG 253,3 0 1180119 /home/matt/Documents/CondorInstallation/lock/InstanceLock
condor_sh 3569 matt 4u REG 253,1 0 172924 /tmp/condorLocks/49/29/248770184488445.lockc

$ condor_preen

$ lsof | grep condor | grep lock
condor_ma 3371 matt 6wW REG 253,3 0 1180119 /home/matt/Documents/CondorInstallation/lock/InstanceLock
condor_sh 3569 matt 4u REG 253,1 0 172924 /tmp/condorLocks/49/29/248770184488445.lockc (deleted)
--

condor_preen unconditionally deletes lock files in rec_lock_cleanup
(no -remove needed), and it does so even when the lock is in use.
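
To illustrate why removing an in-use lock file matters, here is a minimal
Python sketch (not HTCondor code). It assumes advisory flock(2) locks, which
may not be the primitive HTCondor actually uses, and the file name
example.lockc and the helpers try_lock and demo are made up for illustration.
Once the path is unlinked, the original holder keeps its lock on the
now-deleted inode, while a second opener creates a fresh file at the same
path and is granted a lock of its own.

```python
import fcntl
import os
import tempfile


def try_lock(path):
    """Open path (creating it if missing) and try a non-blocking exclusive
    flock. Returns the open fd on success, or None if the lock is held
    through another open file description."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        os.close(fd)
        return None
    return fd


def demo():
    workdir = tempfile.mkdtemp()
    path = os.path.join(workdir, "example.lockc")  # hypothetical name

    first = try_lock(path)    # holder A takes the lock
    blocked = try_lock(path)  # holder B is refused while A holds it

    os.unlink(path)           # a cleaner removes the file while A holds it

    second = try_lock(path)   # B recreates the path and now gets a lock
    result = {
        "blocked_before_unlink": blocked is None,
        "locked_after_unlink": second is not None,
        "same_inode": os.fstat(first).st_ino == os.fstat(second).st_ino,
    }

    # Tidy up the scratch files created by this demo.
    if blocked is not None:
        os.close(blocked)
    os.close(first)
    os.close(second)
    os.unlink(path)
    os.rmdir(workdir)
    return result


if __name__ == "__main__":
    print(demo())
```

On Linux, demo() should report that the second attempt is refused before the
unlink and succeeds afterwards on a different inode, so both holders believe
they own the lock.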

We've also had reports of condor_preen dumping core, presumably when it
tries to delete a lock file that has already been freed:

Core was generated by `condor_preen -m -r'.
Program terminated with signal 6, Aborted.
#0 0x0000003c9da32905 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
64 return INLINE_SYSCALL (tgkill, 3, pid, selftid, sig);
(gdb) bt
#0 0x0000003c9da32905 in raise (sig=<value optimized out>) at ../nptl/sysdeps/unix/sysv/linux/raise.c:64
#1 0x0000003c9da340e5 in abort () at abort.c:92
#2 0x000000000048d3ab in _EXCEPT_ (fmt=0x4e7140 "FileLock::FileLock(): You must have a valid file path as argument.") at /usr/src/debug/condor-7.6.1/src/condor_utils/except.cpp:93
#3 0x00000000004ab8dd in FileLock::initLockFile (this=0xf9b3b0, useLiteralPath=true) at /usr/src/debug/condor-7.6.1/src/condor_utils/file_lock.cpp:271
#4 0x00000000004abf20 in FileLock::FileLock (this=0xf9b3b0, path=0xf9beb0 "/var/lock/condor/local///16/42", deleteFile=<value optimized out>, useLiteralPath=true) at /usr/src/debug/condor-7.6.1/src/condor_utils/file_lock.cpp:195
#5 0x0000000000444496 in rec_lock_cleanup (path=0xf9beb0 "/var/lock/condor/local///16/42", depth=0, remove_self=true) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:696
#6 0x000000000044450f in rec_lock_cleanup (path=0xfa1350 "/var/lock/condor/local///16", depth=0, remove_self=true) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:719
#7 0x000000000044450f in rec_lock_cleanup (path=0xf96af0 "/var/lock/condor/local//", depth=3, remove_self=false) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:719
#8 0x00000000004446e6 in check_tmp_dir () at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:743
#9 0x000000000044616a in main (argc=<value optimized out>, argv=0x7fffe6e45840) at /usr/src/debug/condor-7.6.1/src/condor_tools/preen.cpp:159

Can preen reasonably detect garbage locks and avoid the potential core
dump?
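
As one possible shape for such a check, here is a hedged Python sketch (not
the preen.cpp logic, and not a proposed patch). It makes two assumptions:
that locks are advisory flock(2) locks, which may not match HTCondor's
primitive, and that the path in the backtrace,
"/var/lock/condor/local///16/42", is a directory level of the hashed lock
tree rather than a lock file, which is an inference from the path only. The
helpers remove_if_idle and clean_lock_tree are hypothetical names. The sketch
only ever tries to lock regular files, unlinks a file only while holding its
lock, and turns every failure into "skip" instead of aborting.

```python
import fcntl
import os
import stat


def remove_if_idle(path):
    """Unlink path only if it is a regular file whose flock can be taken
    right now. Returns True if the file was removed, False otherwise."""
    try:
        fd = os.open(path, os.O_RDWR | os.O_NOFOLLOW)
    except OSError:
        return False  # vanished, a directory, a symlink, or unreadable
    try:
        if not stat.S_ISREG(os.fstat(fd).st_mode):
            return False
        # Raises BlockingIOError (an OSError) if someone else holds it.
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
        os.unlink(path)  # remove while still holding the lock
        return True
    except OSError:
        return False
    finally:
        os.close(fd)


def clean_lock_tree(root):
    """Walk root bottom-up, removing idle lock files and any directories
    that end up empty. Returns (removed, kept) lists of file paths."""
    removed, kept = [], []
    for dirpath, _dirnames, filenames in os.walk(root, topdown=False):
        for name in filenames:
            full = os.path.join(dirpath, name)
            (removed if remove_if_idle(full) else kept).append(full)
        if dirpath != root:
            try:
                os.rmdir(dirpath)  # succeeds only if the directory is empty
            except OSError:
                pass
    return removed, kept
```

Two caveats. A try-lock only detects holders that currently hold a lock: in
the lsof output earlier in this message, fd 4 of the second process ("4u")
appears to be open without a lock flag, so this check alone would not have
spared that file. And a locker that was already waiting on the old file needs
to re-check, after acquiring the lock, that the path still refers to the same
inode.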

Best,


matt

FYI,

https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2495
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2496
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2497

Best,


matt