Hi Brian,

On 06/12/2012 11:51 AM, Brian Candler wrote:
> Should that go in condor_config on all machines, or just condor_config.local on the master? I'm guessing ALLOW_WRITE needs to list all the exec nodes plus all the submit nodes?
Correct: the central manager needs to at least accept writes to the collector from all nodes, and writes to the schedd from submit nodes, for things to start working. In practice, setting ALLOW_WRITE = * is the easiest way to get started, but it is a security risk. Condor used to ship with a default ALLOW_WRITE configuration that prevented it from starting at all unless you read the comments and set it to something sensible.
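As a middle ground between * and enumerating every host, ALLOW_WRITE also accepts hostname wildcards. A minimal condor_config sketch for the central manager, assuming (hypothetically) that all your nodes live in a cluster.example.com DNS domain:

```
## On the central manager: accept writes from the manager itself plus
## any machine in the cluster's domain.
## cluster.example.com is a placeholder -- substitute your own domain,
## or list the individual submit/execute hosts explicitly.
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), *.cluster.example.com
```

This keeps random hosts on the wider network from advertising into your pool while avoiding a per-host list you have to maintain.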
> Anyway, by guesswork I tried adding "ALLOW_WRITE = *" to dev-storage2 and that seems to have fixed that problem (or maybe it was just the restart). Do execute nodes need ALLOW_WRITE from the manager? Perhaps the default should be:
>
>   ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST)
This is sufficient if you only submit jobs from the CONDOR_HOST. If you have additional schedds on machines other than the CONDOR_HOST, you'll need to allow them too.
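For example, extending the default you suggest on the execute nodes (submit1.example.com is a placeholder for a hypothetical second submit machine, not something from your setup):

```
## On execute nodes: allow the node itself, the central manager, and
## any additional submit (schedd) machines.
## submit1.example.com is a hypothetical extra submit host.
ALLOW_WRITE = $(FULL_HOSTNAME), $(IP_ADDRESS), $(CONDOR_HOST), submit1.example.com
```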
> Finally, I note that one core was in state "Owner" and did not run any jobs. I read through http://research.cs.wisc.edu/condor/manual/v7.8/3_12Setting_Up.html#SECTION004128000000000000000 and (I think) fixed this using:
>
>   SLOTS_CONNECTED_TO_CONSOLE = 0
>   SLOTS_CONNECTED_TO_KEYBOARD = 0
>
> on both nodes. (These are intended to be headless nodes.)
If you want jobs to always run on them regardless of user activity or cpu load, you could also just set START = TRUE in condor_config.local. That way those cores will never be in the "Owner" state.
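That is, a one-line condor_config.local fragment for a dedicated, headless execute node:

```
## Always accept jobs, regardless of console/keyboard activity or load,
## so slots never enter the "Owner" state.
START = TRUE
```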
Rob