HTCondor Project List Archives




Re: [Condor-devel] Stopping condor fails with stale PID file [PATCH]



On Mar 9, 2012, at 2:35 PM, Jaime Frey wrote:

> On Mar 6, 2012, at 6:22 AM, Michael Hanke wrote:
> 
>> on a system with a stale PID file (no condor_master running, e.g. after
>> a crash of the master) the init script's stop action fails, because it
>> waits for a non-existing process to end. This behavior can cause, for
>> example, a Debian package upgrade to fail. The attached patch addresses
>> this problem. I'd be glad if you could have a look at it and let me know
>> whether there are undesired side effects of such a fix.
> 
> 
> If I'm reading condor.boot.rpm correctly, your patch calls ps with no arguments and searches the output for a line starting with the condor_master's pid. I see two problems. First, if ps is called with no arguments, it only lists processes attached to the caller's terminal, so it won't include the condor_master daemon in its results. Second, if a pid has fewer than 5 digits, ps pads it with leading spaces, so no line will start with the pid. Either way, your patch would not notice the condor_master's pid.
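A minimal sketch of a check that avoids parsing ps output altogether. It assumes a Linux host with /proc mounted, and the helper names and pid-file path are hypothetical, not taken from condor.boot.rpm. Liveness is probed with signal 0, the same probe the shell performs with `kill -0 "$pid"`. The sketch then compares the process name, which sidesteps both problems above. It still cannot tell a recycled PID from the original master.

```python
import os
from typing import Optional


def read_pid_file(path: str) -> Optional[int]:
    """Return the PID stored in a pid file, or None if it is missing or garbled."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def pid_is_running(pid: int) -> bool:
    """Probe a PID with signal 0: nothing is delivered, only existence is checked."""
    if pid <= 0:
        # kill(0, ...) and kill(-1, ...) target process groups, never a single PID.
        return False
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False  # ESRCH: no such process, so the pid file is stale
    except PermissionError:
        return True   # EPERM: the process exists but belongs to another user
    return True


def pid_has_name(pid: int, name: str) -> bool:
    """Linux-only assumption: /proc/<pid>/comm holds the name, truncated to 15 chars."""
    try:
        with open(f"/proc/{pid}/comm") as f:
            return f.read().strip() == name[:15]
    except OSError:
        return False


def master_is_running(pid_file: str, name: str = "condor_master") -> bool:
    """True only if the pid file names a live process that is called `name`."""
    pid = read_pid_file(pid_file)
    return pid is not None and pid_is_running(pid) and pid_has_name(pid, name)
```

A stop action built on this would return success immediately when `master_is_running()` is false, instead of waiting on a process that no longer exists.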
> 

<offtopic ax grinding>

Honestly, the whole approach of Unix PID files stinks.

How do you know that it is the correct condor_master, not just some other process with the same PID?  How do you know it's the condor_master started with the init script, not just some other process with the same PID also called "condor_master"?

Condor already dabbles with POSIX file locks internally - they would help immensely here.

</offtopic ax grinding>
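A minimal sketch of the POSIX file lock approach, written in Python for brevity. The function names and lock path are hypothetical. The daemon takes an exclusive fcntl lock (via `fcntl.lockf`) on its lock file and holds it for its whole lifetime. The kernel drops the lock whenever the holder exits, even after a crash, so a leftover file can never be mistaken for a live master. The question of PID reuse also stops mattering.

```python
import errno
import fcntl
import os


def acquire_daemon_lock(path: str) -> int:
    """Daemon side: take an exclusive lock and keep the returned fd open forever.

    Raises OSError (errno EACCES or EAGAIN) if another process already holds it.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        fcntl.lockf(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except OSError:
        os.close(fd)
        raise
    # The PID is written only for humans and tools; the lock is authoritative.
    os.ftruncate(fd, 0)
    os.write(fd, f"{os.getpid()}\n".encode())
    return fd


def daemon_is_running(path: str) -> bool:
    """Init-script side: is some live process holding the lock?"""
    try:
        fd = os.open(path, os.O_RDONLY)
    except FileNotFoundError:
        return False
    try:
        # A shared lock conflicts with the daemon's exclusive one.
        fcntl.lockf(fd, fcntl.LOCK_SH | fcntl.LOCK_NB)
    except OSError as e:
        if e.errno in (errno.EACCES, errno.EAGAIN):
            return True  # held by a live process
        raise
    else:
        fcntl.lockf(fd, fcntl.LOCK_UN)
        return False  # nobody holds it: the file is stale
    finally:
        os.close(fd)
```

POSIX record locks have two well-known pitfalls. First, they are not inherited across fork(), so a daemonizing process must take the lock after it forks. Second, closing *any* descriptor for the file in the holding process releases the lock.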

Sorry, it's a personal pet peeve. I've been hit by various lock-file bugs and security issues too many times in the last few years, and I've been converting all my personal projects to POSIX file locks.

Brian
