Re: [Condor-users] rooster on linux, take 2
- Date: Tue, 22 Nov 2011 10:10:48 -0600
- From: Dan Bradley <dan@xxxxxxxxxxxx>
- Subject: Re: [Condor-users] rooster on linux, take 2
On 11/21/11 6:05 PM, Dimitri Maziuk wrote:
> I guess my original question remains unanswered: how do you tell rooster
> to wake up a node?
> I am getting
> 11/21/11 16:49:59 WARNING: Someone at 144.92.167.254 is trying to modify
> "UNHIBERNAME"
> 11/21/11 16:49:59 WARNING: Potential security problem, request refused
> from condor_config_val despite the
> ALLOW_ADMINISTRATOR = $(ALLOW_WRITE)
> ALLOW_WRITE = *.bmrb.wisc.edu
Changing configuration remotely requires additional settings that
explicitly permit it. Example:
ENABLE_RUNTIME_CONFIG = true
SETTABLE_ATTRS_ADMINISTRATOR = Unhibernate
The above example allows changing configuration settings in a running
daemon's memory (i.e. not saved permanently to disk). I believe you
have to run condor_reconfig after making the change to make it take effect.
Presumably, you are doing this before hibernating the machine? Because
after the machine goes into hibernation, you obviously can't modify its
configuration.
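
For illustration, here is a rough sketch of the remote change followed by the reconfig. It assumes the target startd already has the two settings above in its configuration and is still awake. The function name and its arguments are made up for this example; -rset is the condor_config_val option for a runtime (in-memory) setting.

```shell
# Sketch only: set Unhibernate in a running startd's memory, then reconfigure.
# Assumes ENABLE_RUNTIME_CONFIG = true and
# SETTABLE_ATTRS_ADMINISTRATOR = Unhibernate on the target machine.
set_unhibernate() {
    host="$1"    # machine whose startd should be changed (hypothetical argument)
    expr="$2"    # ClassAd expression to assign to Unhibernate
    # -rset changes the value in memory only; nothing is written to disk.
    condor_config_val -name "$host" -startd -rset "Unhibernate = $expr" &&
        condor_reconfig "$host"
}
```

It would be called as, say, set_unhibernate <host.name> True from a machine that is allowed administrator access.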
> Submitting 10K jobs is not a usable debug technique. Besides, it worked
> once, but now RoosterLog again claims
> "Got 0 startd ads matching ROOSTER_UNHIBERNATE=Offline&& Unhibernate"
> in spite of the full queue
>  Done     Pre   Queued    Post   Ready   Un-Ready   Failed
>   ===     ===      ===     ===     ===        ===      ===
>   230       0       41       4    8187          0        0
> and
> 954838.000: Run analysis summary. Of 44 machines,
> ...
> 4 match but are currently offline
The default Unhibernate expression should be true if
MachineLastMatchTime is set. MachineLastMatchTime should get set when
the negotiator matches a job to an offline machine.
To see if the negotiator is matching jobs to offline machines, add
D_FULLDEBUG to NEGOTIATOR_DEBUG. You should then see the following
message in NegotiatorLog:
"Registering attempt to match offline machine <host.name> by <user.name>."
This should result in MachineLastMatchTime getting set in the offline
machine ad. You should be able to look at the offline machine ad with a
command such as this:
condor_status -l <host.name>
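
The two checks above could be scripted roughly as follows. This is a sketch, not a tested recipe: the function names are invented here, the log path has to be supplied by you (condor_config_val NEGOTIATOR_LOG normally prints it), and the attribute list is just the three names that come up in this thread.

```shell
# Count the negotiator's attempts to match offline machines.
# $1 is the path to NegotiatorLog; D_FULLDEBUG must already be in
# NEGOTIATOR_DEBUG, otherwise the message is never logged.
count_offline_match_attempts() {
    grep -c "Registering attempt to match offline machine" "$1"
}

# Print only the attributes of a machine ad that matter for waking it.
# $1 is the host name passed to condor_status -l.
show_wake_attrs() {
    condor_status -l "$1" |
        grep -E -i '^(Offline|Unhibernate|MachineLastMatchTime)[[:space:]]*='
}
```

If the count stays at 0 while jobs are idle, the negotiator is not considering the offline ads at all. If it is non-zero but MachineLastMatchTime is missing from the ad, the problem is on the collector side of the update.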
All rooster does is periodically query the collector to find machines
for which Unhibernate is true. It then uses condor_power to wake them up.
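
To see roughly what rooster sees, you can run a similar query by hand. The sketch below hard-codes the constraint printed in the RoosterLog line quoted earlier; your ROOSTER_UNHIBERNATE setting may differ, and the function name is made up for this example.

```shell
# List the names of machines whose ads satisfy the expression from the
# RoosterLog message. An empty result corresponds to "Got 0 startd ads".
list_wake_candidates() {
    condor_status -constraint 'Offline && Unhibernate' -format '%s\n' Name
}
```

An empty list here, while condor_q analysis reports machines that "match but are currently offline", points at Unhibernate evaluating to false or undefined in those ads.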
I hope that helps.
--Dan
p.s. The problem with condor_power has been fixed for 7.6.5. We can
give you a pre-release if you want one.