Mailing List Archives
Re: [Condor-users] "is not an integer" (in config file)
- Date: Thu, 10 Apr 2008 10:44:19 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [Condor-users] "is not an integer" (in config file)
Finch, Ralph wrote:
> condor 7.0.1 on all machines in a Wintel pool.
> I'm getting different behavior on what should be identical machines.
> In each machine's condor_config.local file I added the following line:
> TOUCH_LOG_INTERVAL = 3600 * 24
> I generally like to use a product, rather than the result,
> to make it clearer (in this case, the touch log interval is a day long).
Makes sense, but unfortunately it is not allowed in this specific case.
Expressions like the one above are allowed in ClassAd expressions, and
thus in condor_config parameters that specify ClassAd expressions
(like Start, Suspend, Rank, etc.), but they are typically not allowed
elsewhere, for example in parameters that Condor reads as plain integers,
such as TOUCH_LOG_INTERVAL. Someday we hope to make this better / more
consistent.
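To make the distinction concrete, here is a minimal Python sketch of a reader for an integer-only parameter. It is an illustration only, not Condor's implementation: the names `parse_int_setting` and `ConfigError` are hypothetical, and the message text is simply modelled on the error in the master log. A plain literal is accepted and an arithmetic expression is rejected, so the form that works is the precomputed value, 3600 * 24 = 86400 seconds, i.e. `TOUCH_LOG_INTERVAL = 86400`.

```python
# Hypothetical sketch, not Condor's source: a strict reader for a setting
# that must be a plain integer, modelled on the error text in the master log.
INT_MIN, INT_MAX = -2**31, 2**31 - 1


class ConfigError(ValueError):
    """Raised when an integer-only setting holds something else."""


def parse_int_setting(name: str, raw: str, default: int) -> int:
    text = raw.strip()
    try:
        # Only a literal such as "86400" is accepted; an arithmetic
        # expression such as "3600 * 24" makes int() raise ValueError.
        value = int(text)
    except ValueError:
        raise ConfigError(
            f"{name} in the condor configuration is not an integer ({text}). "
            f"Please set it to an integer in the range {INT_MIN} to {INT_MAX} "
            f"(default {default})."
        ) from None
    if not INT_MIN <= value <= INT_MAX:
        raise ConfigError(f"{name} = {value} is outside {INT_MIN}..{INT_MAX}")
    return value


if __name__ == "__main__":
    # 3600 * 24 written out as the single literal 86400 (one day in seconds).
    print(parse_int_setting("TOUCH_LOG_INTERVAL", "86400", 60))
    try:
        parse_int_setting("TOUCH_LOG_INTERVAL", "3600 * 24", 60)
    except ConfigError as err:
        print(err)
```

The product form is still fine wherever the value is a ClassAd expression (Start, Suspend, Rank, ...); the sketch only models parameters read as plain integers.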
> After adding the line I copied the file to each machine in the pool
> and issued condor_reconfig -all
> Most machines accepted the change without problem: (masterlog)
> 4/10 08:09:31 Reconfiguring all running daemons.
> 4/10 08:09:31 Sent signal 1 to STARTD (pid 7424)
> 4/10 08:09:31 Sent signal 1 to SCHEDD (pid 904)
> 4/10 08:09:31 Return from HandleReq <handle_reconfig()>
> 4/10 08:09:31 Return from Handler <DaemonCore::HandleReqSocketHandler>
> 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> But some machines did not like the new line and died: (masterlog)
> 4/10 08:03:05 Reconfiguring all running daemons.
> 4/10 08:03:05 Sent signal 1 to STARTD (pid 13404)
> 4/10 08:03:05 Sent signal 1 to SCHEDD (pid 18172)
> 4/10 08:03:05 Return from HandleReq <handle_reconfig()>
> 4/10 08:03:05 Return from Handler <DaemonCore::HandleReqSocketHandler>
> 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> 4/10 08:06:52 ERROR "TOUCH_LOG_INTERVAL in the condor configuration is
> not an integer (3600 * 24). Please set it to an integer in the range
> -2147483648 to 2147483647 (default 60)." at line 1331 in file
> ..\src\condor_c++_util\condor_config.C
> 4/10 08:06:52 Sent SIGKILL to STARTD (pid 13404) and all it's children.
> 4/10 08:06:53 Sent SIGKILL to SCHEDD (pid 18172) and all it's children.
> 4/10 08:06:53 **** Condor (condor_MASTER) EXITING WITH STATUS 1
> Any ideas why the different behavior?
Maybe the machines where it appeared to have succeeded simply have not
(yet) attempted to fetch the value of TOUCH_LOG_INTERVAL? It is
fetched on demand at run time, so an invalid value is only reported
when a daemon actually reads it.
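The on-demand lookup can be pictured with a small Python model. This is a hypothetical sketch that assumes only what is said above (values are read lazily at run time); `LazyConfig`, `reconfig` and `get_int` are illustrative names, not Condor APIs. A reconfig that merely stores the new text succeeds on every machine, and the error surfaces only in a process that later asks for the value.

```python
# Hypothetical sketch, not Condor's source: a reconfig only stores raw
# strings, and a value is validated the first time a daemon looks it up.
from typing import Dict


class LazyConfig:
    def __init__(self) -> None:
        self._raw: Dict[str, str] = {}

    def reconfig(self, settings: Dict[str, str]) -> None:
        # Reloading just replaces the stored text; nothing is parsed here,
        # so a bad value does not cause an error at reconfig time.
        self._raw = dict(settings)

    def get_int(self, name: str, default: int) -> int:
        # Parsing (and therefore failure) happens on demand, at lookup time.
        raw = self._raw.get(name)
        if raw is None:
            return default
        try:
            return int(raw.strip())
        except ValueError:
            raise ValueError(f"{name} is not an integer ({raw.strip()})") from None


if __name__ == "__main__":
    cfg = LazyConfig()
    cfg.reconfig({"TOUCH_LOG_INTERVAL": "3600 * 24"})  # succeeds silently
    print("reconfig done")
    try:
        # Only a process that actually needs the value hits the error.
        cfg.get_int("TOUCH_LOG_INTERVAL", 60)
    except ValueError as err:
        print("fatal:", err)
```

This matches the timing in the failing master log: the reconfig is logged at 08:03:05, but the ERROR only appears at 08:06:52, almost four minutes later.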
Another idea: perhaps some machines in your pool are running an older
version of Condor that doesn't look at TOUCH_LOG_INTERVAL?
regards,
Todd
--
Todd Tannenbaum University of Wisconsin-Madison
Condor Project Research Department of Computer Sciences
tannenba@xxxxxxxxxxx 1210 W. Dayton St. Rm #4257