Re: [Condor-users] How to find starter process which got struck and not responding to Startd
- Date: Mon, 26 Jul 2010 09:51:57 -0500
- From: "Timothy St. Clair" <tstclair@xxxxxxxxxx>
- Subject: Re: [Condor-users] How to find starter process which got struck and not responding to Startd
I have a couple of questions regarding your questions ;-)
1.) What version of Condor are you running?
2.) What type of VM job are you running? (vmware, kvm, or xen?)
Cheers,
Tim
On Mon, 2010-07-26 at 19:54 +0530, Johnson koil Raj wrote:
> Hi,
>
> We are running VM jobs in our pool and see this issue
> intermittently. It always happens when the Starter process is
> trying to get the status of a VM.
>
> The Starter process gets stuck or hangs: it writes nothing further
> to its log and sends no more updates to the STARTD, so the STARTD
> keeps the last reported status. After some time the corresponding job
> in the queue goes back to the idle state and tries to match another
> machine for execution.
>
> The VM job stays in an inconsistent state for some time if the VM was
> actually powered off by the user from inside: the job state still
> shows as running.
> 1. Is there any way to find STARTER processes that have stopped
> updating the STARTD?
> 2. I am polling the VM status every 2 minutes. How can I
> configure the STARTD so that it kills the STARTER process if
> it does not respond with proper data within 5 minutes?
> 3. When the job goes idle and tries to match a new machine,
> how can I force it to match the same machine?
>
> Thanks,
> Johnson.
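
One rough way to approach question 1 from outside Condor is to look for starter log files that have stopped changing, since the report above says a hung Starter writes nothing further to its log. The sketch below is only an illustration under stated assumptions: the file pattern `StarterLog*`, the function name `find_stale_logs`, and the 300-second threshold (taken from the 5-minute limit in question 2) are all hypothetical choices, not Condor settings. A quiet log does not by itself prove that a Starter is hung.

```python
import time
from pathlib import Path


def find_stale_logs(log_dir, max_age_seconds=300, pattern="StarterLog*", now=None):
    """Return (path, age_in_seconds) for matching files not modified recently.

    The pattern and threshold are assumptions for illustration; adjust them
    to whatever your execute nodes actually use.
    """
    now = time.time() if now is None else now
    stale = []
    for path in sorted(Path(log_dir).glob(pattern)):
        if not path.is_file():
            continue
        # Age is measured from the file's last modification time.
        age = now - path.stat().st_mtime
        if age > max_age_seconds:
            stale.append((path, age))
    return stale
```

Calling `find_stale_logs("/path/to/log/dir")` on an execute node would return the starter logs that have been silent for more than five minutes; the directory shown is a placeholder for the node's real log directory.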