HTCondor Project List Archives




[Condor-devel] Bug report: log file rotation and COLLECTOR_QUERY_WORKERS



If COLLECTOR_QUERY_WORKERS is set to a positive number, there is a risk
of the log file rotation code killing the collector.  The risk is
greatly increased with a small maximum log file size (lots of rotations)
and verbose debugging output, like D_ALL.

Here's what happens.  See preserve_log_file().

1) unlink ( CollectorLog.old )
2) link ( CollectorLog, CollectorLog.old )
3) unlink ( CollectorLog )
4) stat ( CollectorLog ) to see if CollectorLog exists.

The parent collector completes step #3 and is switched out before it
reaches step #4.  A child collector wants to write to the (same) log
file, sees that it doesn't exist, and creates it.  The parent is
scheduled again, does step #4, finds a file at that path, and dies with:

sprintf( msg_buf, "unlink(%s) succeeded but file still exists!",
         DebugFile[debug_level] );
_condor_dprintf_exit( save_errno, msg_buf );
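The interleaving can be reproduced deterministically in a single
process.  The following is a minimal sketch, not the HTCondor code:
rotate_and_check() and child_writes_log() are hypothetical names, and
the "child" is simulated by an ordinary function call placed between
steps 3 and 4, on the assumption that the child reopens the log with
O_CREAT.

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical stand-in for a child collector: it opens the log with
 * O_CREAT, so a log that was just unlinked is silently recreated. */
static void child_writes_log(const char *log_path)
{
    int fd = open(log_path, O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd >= 0) {
        ssize_t n = write(fd, "child\n", 6);
        (void)n;                      /* result is irrelevant to the demo */
        close(fd);
    }
}

/* Runs the four rotation steps from the report.  When child_interleaves
 * is nonzero, the simulated child runs between steps 3 and 4.
 * Returns 1 if step 4 finds a file at log_path (the fatal case),
 * 0 if it does not, and -1 if link() or unlink() fails. */
static int rotate_and_check(const char *log_path, const char *old_path,
                            int child_interleaves)
{
    struct stat sb;

    assert(log_path != NULL && old_path != NULL);

    (void)unlink(old_path);                        /* step 1 */
    if (link(log_path, old_path) != 0)             /* step 2 */
        return -1;
    if (unlink(log_path) != 0)                     /* step 3 */
        return -1;

    if (child_interleaves)
        child_writes_log(log_path);                /* the race window */

    return stat(log_path, &sb) == 0 ? 1 : 0;       /* step 4 */
}
```

With child_interleaves set to 0 the function returns 0, the case the
rotation code expects.  With it set to 1 the function returns 1 even
though every system call succeeded, which is the condition that
triggers the fatal message above.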

For a real-life example, please see

http://docs.optena.com/display/CONDOR/2005/10/04/Collector+Log+Rotation

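I believe that the extra stat(), while nice and paranoid, isn't needed.
If you really wanted to keep it and be both paranoid and correct, you
could also stat() the log file before the unlink() and compare the inode
numbers from the two calls.  A different inode afterwards means another
process recreated the file, which is not an error.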
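A minimal sketch of the inode comparison, under stated assumptions:
the enum and both function names are hypothetical, the real code would
call _condor_dprintf_exit() rather than print to stderr, and the device
number is compared along with the inode because inode numbers are only
unique per filesystem.  The check relies on the old inode still being
linked as CollectorLog.old (step 2), so its number cannot be reused by
the recreated file.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

enum log_state { LOG_GONE, LOG_RECREATED, LOG_STILL_THERE, LOG_ERROR };

/* Compares whatever is at path now with the stat() result taken
 * before the unlink(). */
static enum log_state classify_after_unlink(const char *path,
                                            const struct stat *before)
{
    struct stat after;

    assert(path != NULL && before != NULL);

    if (stat(path, &after) != 0)
        return errno == ENOENT ? LOG_GONE : LOG_ERROR;

    if (after.st_dev == before->st_dev && after.st_ino == before->st_ino)
        return LOG_STILL_THERE;   /* same file: it was never removed */

    return LOG_RECREATED;         /* new file: someone else created it */
}

/* stat(), unlink(), then classify.  Only LOG_STILL_THERE is treated as
 * the "paranoid" failure; LOG_RECREATED is the benign race. */
static enum log_state unlink_log_checked(const char *path)
{
    struct stat before;
    enum log_state st;

    if (stat(path, &before) != 0 || unlink(path) != 0)
        return LOG_ERROR;

    st = classify_after_unlink(path, &before);
    if (st == LOG_STILL_THERE)
        fprintf(stderr, "unlink(%s) succeeded but file still exists!\n",
                path);
    return st;
}
```

A caller would continue normally on LOG_GONE or LOG_RECREATED and abort
only on LOG_STILL_THERE or LOG_ERROR.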

Have a nice day,
-Mike