Re: [Condor-users] "is not an integer" (in config file)
- Date: Thu, 10 Apr 2008 20:36:15 +0100
- From: "Kewley, J (John)" <j.kewley@xxxxxxxx>
- Subject: Re: [Condor-users] "is not an integer" (in config file)
You can always write a test job that goes to each machine and does a
condor_version.
Alternatively, you could do what my .cgi scripts
( http://tardis.dl.ac.uk/Condor/cgi-bin/CondorVersion.cgi for example )
use:
condor_status -master -f "%-32s" Machine -f "%s\n" CondorVersion
That will tell you the version of Condor for all machines in the pool.
(I seem to remember that using the -master flag means that anything
running a condor_master daemon will get an entry; leaving it out will
list only the execute machines.)
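
A minimal sketch of how that output could be checked for stragglers,
assuming each line is a machine name followed by its CondorVersion
string. The host names and version strings in SAMPLE are invented for
illustration, and group_by_version is a hypothetical helper, not part
of Condor.

```python
from collections import defaultdict


def group_by_version(status_output):
    """Map each reported version string to the sorted machines reporting it."""
    groups = defaultdict(list)
    for line in status_output.splitlines():
        # First whitespace-separated token is the machine; the rest is the version.
        parts = line.split(None, 1)
        if len(parts) == 2:
            machine, version = parts
            groups[version.strip()].append(machine)
    return {version: sorted(machines) for version, machines in groups.items()}


# Invented sample in the shape the condor_status command above is assumed to print.
SAMPLE = """\
node01.example.org               $CondorVersion: 7.0.1 $
node02.example.org               $CondorVersion: 7.0.1 $
node03.example.org               $CondorVersion: 6.8.8 $
"""

for version, machines in sorted(group_by_version(SAMPLE).items()):
    print(f"{version}: {', '.join(machines)}")
```

More than one key in the result would mean the pool is running mixed
versions.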
If you are interested in using the .cgi scripts as a starter for your
Condor pool web site, let me know.
Cheers
JK
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Finch, Ralph
> Sent: Thursday, April 10, 2008 7:23 PM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] "is not an integer" (in config file)
>
> I'm 99.9% sure that all machines are using 7.0.1. On the
> problem machines I looked backward in the MasterLog file to
> see the version number when they started; all were 7.0.1.
>
> There's been odd behavior of the pool since everything was
> upgraded from 6.8.X last week. The main problem is that our
> hyperthreaded SMP machines still appear as a total of 4
> slots, even though COUNT_HYPERTHREAD_CPUS = FALSE in
> condor_config.local:
>
> slot1@xxxxxxxxxxxx WINNT51 INTEL Owner     Idle 0.780 767 0+01:14:49
> slot2@xxxxxxxxxxxx WINNT51 INTEL Claimed   Busy 0.990 767 0+01:44:18
> slot3@xxxxxxxxxxxx WINNT51 INTEL Unclaimed Idle 0.000 767 0+00:02:03
> slot4@xxxxxxxxxxxx WINNT51 INTEL Unclaimed Idle 0.000 767 0+00:02:04
>
> (VENICE is a two-cpu [not dual-core], hyperthreaded Wintel machine).
>
> Ralph Finch
> 916-653-7552
>
>
> -----Original Message-----
> From: condor-users-bounces@xxxxxxxxxxx
> [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of Todd Tannenbaum
> Sent: Thursday, April 10, 2008 8:44 AM
> To: Condor-Users Mail List
> Subject: Re: [Condor-users] "is not an integer" (in config file)
>
> Finch, Ralph wrote:
> > condor 7.0.1 on all machines in a Wintel pool.
> >
> > I'm getting different behavior on what should be identical machines.
> >
> > In each machine's condor_config.local file I added the following line:
> >
> > TOUCH_LOG_INTERVAL = 3600 * 24
> >
> > I generally like to use a product, rather than the result, to make it
> > clearer (in this case, the touch log interval is a day long).
> >
>
> Makes sense, but unfortunately not allowed in this specific case.
> Expressions like the above are allowable in ClassAd
> expressions, and thus are allowed in condor_config parameters
> that are specifying ClassAd expressions (like Start, Suspend,
> Rank, etc), but are typically not allowed elsewhere. Someday
> we hope to make this better / more consistent.
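
To illustrate the point above: since TOUCH_LOG_INTERVAL has to be a
literal integer, the product can be folded before it goes into the
config file, with the readable form kept as a comment on its own line.
This is only a sketch; fold_integer_expression and config_lines are
hypothetical helper names, not Condor tools, and it assumes the
expression contains nothing but integer literals combined with +, -
and *.

```python
import ast
import operator

# Only these binary operators are accepted; anything else is rejected.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul}


def fold_integer_expression(text):
    """Evaluate an expression of integer literals such as '3600 * 24'."""
    def walk(node):
        if isinstance(node, ast.Constant) and type(node.value) is int:
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](walk(node.left), walk(node.right))
        raise ValueError(f"unsupported expression: {text!r}")

    return walk(ast.parse(text.strip(), mode="eval").body)


def config_lines(name, expression):
    """Return a comment line holding the expression, then the literal setting."""
    return f"# {expression}\n{name} = {fold_integer_expression(expression)}"


print(config_lines("TOUCH_LOG_INTERVAL", "3600 * 24"))
```

The comment line preserves the "one day" intent while the parameter
itself stays a plain integer.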
>
> > After adding the line I copied the file to each machine in the pool
> > and issued condor_reconfig -all
> >
> > Most machines accepted the change without problem: (masterlog)
> >
> > 4/10 08:09:31 Reconfiguring all running daemons.
> > 4/10 08:09:31 Sent signal 1 to STARTD (pid 7424)
> > 4/10 08:09:31 Sent signal 1 to SCHEDD (pid 904)
> > 4/10 08:09:31 Return from HandleReq <handle_reconfig()>
> > 4/10 08:09:31 Return from Handler <DaemonCore::HandleReqSocketHandler>
> > 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:09:32 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:09:32 Return from HandleReq <HandleChildAliveCommand>
> >
> > But some machines did not like the new line and died: (masterlog)
> >
> > 4/10 08:03:05 Reconfiguring all running daemons.
> > 4/10 08:03:05 Sent signal 1 to STARTD (pid 13404)
> > 4/10 08:03:05 Sent signal 1 to SCHEDD (pid 18172)
> > 4/10 08:03:05 Return from HandleReq <handle_reconfig()>
> > 4/10 08:03:05 Return from Handler <DaemonCore::HandleReqSocketHandler>
> > 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:03:06 Calling HandleReq <HandleChildAliveCommand> (0)
> > 4/10 08:03:06 Return from HandleReq <HandleChildAliveCommand>
> > 4/10 08:06:52 ERROR "TOUCH_LOG_INTERVAL in the condor configuration is
> > not an integer (3600 * 24). Please set it to an integer in the range
> > -2147483648 to 2147483647 (default 60)." at line 1331 in file
> > ..\src\condor_c++_util\condor_config.C
> > 4/10 08:06:52 Sent SIGKILL to STARTD (pid 13404) and all it's children.
> > 4/10 08:06:53 Sent SIGKILL to SCHEDD (pid 18172) and all it's children.
> > 4/10 08:06:53 **** Condor (condor_MASTER) EXITING WITH STATUS 1
> >
> >
> > Any ideas why the different behavior?
> >
>
> Maybe the machines where it appeared to have succeeded have
> simply not (yet) attempted to fetch the value of TOUCH_LOG_INTERVAL?
> It is fetched on demand at run time.
>
> Another idea: perhaps some machines in your pool are running
> an older version of Condor that doesn't look at TOUCH_LOG_INTERVAL ?
>
> regards,
> Todd
>
> --
> Todd Tannenbaum University of Wisconsin-Madison
> Condor Project Research Department of Computer Sciences
> tannenba@xxxxxxxxxxx 1210 W. Dayton St. Rm #4257
>
> _______________________________________________
> Condor-users mailing list
> To unsubscribe, send a message to
> condor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/condor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/condor-users/
>