Mailing List Archives
Re: [Condor-users] watchdog pipe file missing
- Date: Thu, 29 Jan 2009 13:21:52 -0600
- From: Greg Quinn <gquinn@xxxxxxxxxxx>
- Subject: Re: [Condor-users] watchdog pipe file missing
Hello,
Fernando Rannou wrote:
1. After executing condor_restart -startd, the watchdog
files are not created. There is only the StartLog,
which shows an error:
1/29 10:10:56 PERMISSION DENIED to unauthenticated user from host <192.168.10.10:32851> for command 60005 (DC_OFF_GRACEFUL), access level ADMINISTRATOR
The StartD was never actually restarted, since your condor_restart
command was denied permission. The HOSTALLOW_ADMINISTRATOR setting is
what determines the machines from which you can issue a condor_restart.
Your HOSTALLOW_ADMINISTRATOR setting is probably at its default value,
which includes only the central manager. So you could:
1) Issue all the needed condor_restart commands from the central manager
using the form "condor_restart -startd <hostname>"
2) Loosen your HOSTALLOW_ADMINISTRATOR setting if the security
implications of doing so don't concern you. For example, setting
HOSTALLOW_ADMINISTRATOR = $(CONDOR_HOST), $(FULL_HOSTNAME)
would give someone logged into any host in your pool the ability to
send administrative commands to the Condor daemons running on that
host.
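Option 1 can be scripted from the central manager. Below is a minimal Python sketch, assuming `condor_restart` is on the PATH there; the helper name and the host list in the usage lines are hypothetical (wolf10 is the only node named in this thread). By default it only returns the commands it would run, so they can be reviewed before passing `dry_run=False`.

```python
import subprocess
from typing import List


def restart_startds(hosts: List[str], dry_run: bool = True) -> List[List[str]]:
    """Build one 'condor_restart -startd <hostname>' command per host.

    Hypothetical helper, intended to run on the central manager (option 1).
    With dry_run=False each command is executed in turn.
    """
    commands = [["condor_restart", "-startd", host] for host in hosts]
    if not dry_run:
        for cmd in commands:
            # check=True raises CalledProcessError if condor_restart exits non-zero.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical host list: replace with the nodes that are missing the pipe.
    for cmd in restart_startds(["wolf10"]):
        print(" ".join(cmd))
```

Keeping the dry run as the default means nothing is sent to the daemons until the host list has been checked.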
However, the node still shows up in condor_status?
Right, the StartD never exited and is still reporting itself to the
Collector.
2. When I submit my first job, I get this error in StarterLog.slot1:
1/29 10:25:37 About to exec /bin/date --universal
1/29 10:25:37 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
1/29 10:25:37 ProcFamilyClient: error initializing LocalClient
1/29 10:25:37 ProcFamilyProxy: error initializing ProcFamilyClient
1/29 10:25:37 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
1/29 10:25:37 ShutdownFast all jobs.
Sure - same error as before since the StartD hasn't been restarted.
Later,
Greg Quinn
Condor Team
Thanks for your patience, Greg
Fernando
On Wed, Jan 28, 2009 at 5:13 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
Fernando Rannou wrote:
> Thanks Greg
>
> but then, what should I do to create the file
> in the meantime?
>
> Fernando
I'm pretty sure that restarting the StartD (condor_restart -startd) on
each machine that is missing the file should do the trick.
Greg
> On Wed, Jan 28, 2009 at 4:53 PM, Greg Quinn <gquinn@xxxxxxxxxxx> wrote:
>
> Fernando,
>
> The "watchdog" pipe is created by the ProcD when it starts up, and is
> only ever deleted by Condor when the ProcD shuts down.
>
> Is it possible that something outside of Condor is deleting the pipe? We
> have seen problems like this before with programs like tmpwatch
> (although I guess it's doubtful that tmpwatch is running over your
> /home/condor/hosts/wolf10/log/ directory).
>
> Come to think of it, /home/condor/hosts/wolf10/log sounds like it could
> be on NFS. It's perfectly fine to have your LOG directory on NFS, but in
> that case you must also have a separate, local LOCK directory (where
> things like the ProcD's pipes are stored). Please make sure that your
> LOCK setting refers to a local directory.
>
> Thanks,
>
> Greg Quinn
> Condor Team
>
> Fernando Rannou wrote:
> > Hello,
> > I'm getting the following error in one of the StarterLogs:
> > ------------------------
> > 1/28 11:20:04 About to exec /home/mpetct/sampproc --universal
> > 1/28 11:20:04 error opening watchdog pipe /home/condor/hosts/wolf10/log/procd_pipe.STARTD.watchdog: No such file or directory (2)
> > 1/28 11:20:04 ProcFamilyClient: error initializing LocalClient
> > 1/28 11:20:04 ProcFamilyProxy: error initializing ProcFamilyClient
> > 1/28 11:20:04 ERROR "ProcD has failed" at line 599 in file proc_family_proxy.C
> > 1/28 11:20:04 ShutdownFast all jobs.
> > --------------------------
> > Clearly the "pipe" files are not there. What should I do?
> > We restarted condor on all nodes but the files did not appear.
> >
> > This has happened on a couple of nodes. All other nodes do have the
> > watchdog file:
> >
> > prw-rw---- 1 root isl 0 Nov 4 16:08 procd_pipe.STARTD
> > prw-rw---- 1 root isl 0 Nov 4 16:08 procd_pipe.STARTD.watchdog
> > -
> > Thanks
> >
> > Fernando
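The leading `p` in the directory listing marks a named pipe (FIFO). To check a node for both pipes, a minimal Python sketch like the following could be used; the function name is hypothetical, and the directory argument is assumed to be the one named in the StarterLog error (per Greg's reply, the LOCK directory).

```python
import os
import stat
from typing import Dict

# Pipe names as they appear in the directory listing in this thread.
PIPE_NAMES = ("procd_pipe.STARTD", "procd_pipe.STARTD.watchdog")


def check_procd_pipes(directory: str) -> Dict[str, str]:
    """Map each expected ProcD pipe to 'fifo', 'missing', or 'not a fifo'."""
    report: Dict[str, str] = {}
    for name in PIPE_NAMES:
        path = os.path.join(directory, name)
        try:
            # os.stat() does not open the pipe, so it cannot block.
            mode = os.stat(path).st_mode
        except FileNotFoundError:
            report[name] = "missing"
            continue
        report[name] = "fifo" if stat.S_ISFIFO(mode) else "not a fifo"
    return report


if __name__ == "__main__":
    import sys

    # Usage: python check_pipes.py /home/condor/hosts/wolf10/log
    for d in sys.argv[1:]:
        for name, status in check_procd_pipes(d).items():
            print(f"{os.path.join(d, name)}: {status}")
```

Using `os.stat()` rather than opening the path matters here: opening a FIFO for reading can block until a writer appears.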
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/