On 11/28/2011 09:18 AM, Dan Bradley wrote:

> No. The rooster daemon is currently not configurable via
> condor_config_val. You will need to modify the configuration file and
> run condor_reconfig.

That ought to be in the manual: the error message you get if you try to use condor_config_val is unhelpful, to put it mildly (WARNING: Potential security problem, request refused).

> I'm skeptical about the truth of the statement in the manual. In a
> quick glance through the code, I don't see any suppression of
> hibernation for an hour after it wakes up. I could have overlooked it,
> but I've made a note to verify the behavior.

Well, what I see here is that the sleeping machine isn't getting matched by the negotiator for some reason. If I wake it up manually, it runs jobs for 5 minutes (HIBERNATE_CHECK_INTERVAL = 300) and then shuts down again. Its sleep state is S4 (as far as condor is concerned; it looks like a full shutdown to me), so that 1 hour period should apply, and indeed it does not seem to.

That probably wouldn't be a problem if the negotiator kept the machine busy, but that isn't happening. So far I have found only one way to match that machine to a job (and have rooster wake it up): specifically request TARGET.Machine in the job submit file.

So the next question is: how do I figure out what's up with the negotiator? E.g., with 40 cores busy and 4 cores sleeping, condor_q -analyze 961082 says:

-- Submitter: minnow.bmrb.wisc.edu : <144.92.167.254:9617?sock=13250_c2fa_3> : minnow.bmrb.wisc.edu
---
961082.000: Run analysis summary. Of 44 machines,
...
4 match but are currently offline
0 are available to run your job
No successful match recorded.
Last failed match: Fri Nov 25 18:18:55 2011
Reason for last match failure: no match found
-----------------------------------------------------

NegotiatorLog (on D_FULLDEBUG) is not very informative as to why the "4 matching but offline" cores are not a "successful match":

11/25/11 18:17:55 Sending SEND_JOB_INFO/eom
11/25/11 18:17:55 Getting reply from schedd ...
11/25/11 18:17:55 Got JOB_INFO command; getting classad/eom
11/25/11 18:17:55 Request 961082.00000:
11/25/11 18:17:55 matchmakingAlgorithm: limit 4.000000 used 0.000000 pieLeft 4.000000
11/25/11 18:17:55 Rejected 961082.0 bbee@xxxxxxxxxxxxx <144.92.167.254:9617?sock=13250_c2fa_3>: no match found
--------------------------------------------------------

Thanks
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu
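To spot this situation across many jobs, the summary lines of the condor_q -analyze output quoted above can be picked apart with a few regular expressions. This is only a minimal sketch: the function names and the SAMPLE string are made up for illustration, the patterns are matched against the exact wording shown in this message, and other HTCondor versions may word the summary differently.

```python
import re

# Sample text copied from the condor_q -analyze summary quoted in this message.
SAMPLE = """\
961082.000: Run analysis summary. Of 44 machines,
4 match but are currently offline
0 are available to run your job
"""


def summarize_analysis(text):
    """Pull the machine counts out of a 'Run analysis summary' block.

    Hypothetical helper: the patterns assume the wording shown above.
    A count that cannot be found is reported as None.
    """
    patterns = {
        "total": r"Of\s+(\d+)\s+machines",
        "offline_matches": r"(\d+)\s+match but are currently offline",
        "available": r"(\d+)\s+are available to run your job",
    }
    counts = {}
    for key, pattern in patterns.items():
        found = re.search(pattern, text)
        counts[key] = int(found.group(1)) if found else None
    return counts


def only_offline_matches(counts):
    """True when some slots match but all of them are offline."""
    return bool(counts["offline_matches"]) and counts["available"] == 0


if __name__ == "__main__":
    counts = summarize_analysis(SAMPLE)
    print(counts, only_offline_matches(counts))
```

A job for which only_offline_matches() is true is in the state described here: slots that match exist, but they are all asleep and nothing available is left to run the job.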