Mailing List Archives
Re: [Condor-users] startd hangs when using job hooks
- Date: Fri, 12 Feb 2010 13:27:39 -0500
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] startd hangs when using job hooks
On 02/09/2010 10:47 AM, Michael Moore wrote:
> On Tue, Feb 09, 2010 at 10:04:16AM -0500, Matthew Farrellee wrote:
>> Michael Moore wrote:
>>> I am trying to implement a set of fetch and prepare hooks. However, when
>>> testing the hooks I experience hangs of condor_startd. When startd hangs
>>> it stops responding to requests and to Condor shutdown commands. Only a process
>>> level kill ends the process.
>>>
>>> The host running the hooks is a Windows Vista host running Condor 7.4.1.
>>> The prepare hook does take some time to run (on the order of minutes).
>>> However, startd does not always hang during the prepare hook. Sometimes
>>> startd hangs after the job begins executing, sometimes it doesn't hang
>>> at all.
>>>
>>> Has anyone else seen similar behavior? Was there a way to work around
>>> the problem? Apparently, there was a similar problem in 7.3.2 and prior
>>> where a very simple fetch hook would cause startd to hang. I haven't
>>> figured out what portion of the hook triggers this behavior; it's very
>>> intermittent.
>>>
>>> Thanks,
>>> Michael Moore
>>
>> A few issues with hooks on Windows...
>>
>> http://condor-wiki.cs.wisc.edu/index.cgi/search?s=hook+windows
>>
>> Specifically...
>>
>> http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=422
>> http://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=864
>>
>> Do either of those sound like your problem?
>>
>> I believe one of those is related to using Windows on a machine with
>> many CPUs -- or at least it is more reproducible there.
>>
>> Best,
>>
>>
>> matt
>
> Matt,
>
> Ticket 422 is the previous issue I mentioned above. I did test to make
> sure I wasn't seeing that issue but it seems to be correctly resolved in
> my testing. The second issue may exist but I don't get that far.
> startd will hang before the job completes. Ticket 864 is not the
> issue I'm seeing. A good way to describe it is the same symptoms of
> ticket 422 but the issue is not as reproducible and not caused by the
> simple case provided in that ticket.
>
> I can confirm I see the issue when I force the number of slots to 1. I
> don't know about the level of reproducibility.
>
> Thanks for the help!
>
> Michael

If you can get the issue to reproduce, let us know and we can get a new ticket filed.

Best,

matt