Hi Bryan,
From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx] On Behalf Of carl langlois
Sent: Friday, March 28, 2008 3:51 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] slot1 resources disappear after a few days.
Do you have any core.something.WIN32 files in your log directory? I had a similar problem where some slots disappeared from the pool at some point, and I noticed a core file in the log directory. But I don't know why it happened.
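If you want to check several machines quickly, here is a minimal Python sketch (not part of Condor) that lists any core.* files in a log directory, oldest first. The function name find_core_files and the path C:\condor\log are only assumptions for illustration; running condor_config_val LOG on the machine shows the real log directory.

```python
from pathlib import Path


def find_core_files(log_dir):
    """Return the core.* files found directly in log_dir, oldest first."""
    # Assumption: core files are named like core.<something>.WIN32,
    # so a simple "core.*" pattern is enough to catch them.
    paths = [p for p in Path(log_dir).glob("core.*") if p.is_file()]
    return sorted(paths, key=lambda p: p.stat().st_mtime)


if __name__ == "__main__":
    # Assumed location; replace with the directory your LOG setting points to.
    for path in find_core_files(r"C:\condor\log"):
        print(path.name, path.stat().st_size, "bytes")
```

The modification time of a core file can then be compared with the time slot1 dropped out of condor_status.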
Carl
On Fri, Mar 28, 2008 at 11:02 AM, Bryan S. Maher <Bryan.Maher@xxxxxxxxxx> wrote:
Hi All:
I have a new Condor pool uniformly running v7.0.1 on Windows. After a day or two the slot1 resources fail to show up when issuing a condor_status command. Here is sample output:
Name               OpSys   Arch  State     Activity LoadAv Mem  ActvtyTime
slot1@xxxxxxxxxxx. WINNT51 INTEL Owner     Idle     0.030  1023 0+04:32:59
slot2@xxxxxxxxxxx. WINNT51 INTEL Owner     Idle     0.000  1023 0+04:33:00
slot2@xxxxxxxxxxxx WINNT51 INTEL Owner     Idle     0.000  1534 0+04:35:05
slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle     0.000  1006 5+14:26:38
slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle     0.000  1006 0+02:25:07
slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle     0.000  1006 0+02:25:05
slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle     0.000  1006 0+02:25:05
slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle     0.000  1006 0+02:25:07
              Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/WINNT51     3     3       0         0       0          0        0
INTEL/WINNT52     5     0       0         5       0          0        0
        Total     8     3       0         5       0          0        0
As you can see, even the totals fail to count the slot1 resources. A condor_reconfig is sufficient to bring slot1 back to life. The StartLog on an affected machine looks like:
3/17 12:03:18 ******************************************************
3/17 12:03:18 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
3/17 12:03:18 ** C:\condor\bin\condor_startd.exe
3/17 12:03:18 ** $CondorVersion: 7.0.1 Feb 27 2008 BuildID: 76180 $
3/17 12:03:18 ** $CondorPlatform: INTEL-WINNT50 $
3/17 12:03:18 ** PID = 1880
3/17 12:03:18 ** Log last touched 3/17 11:01:32
3/17 12:03:18 ******************************************************
3/17 12:03:18 Using config source: C:\condor\condor_config
3/17 12:03:18 Using local config sources:
3/17 12:03:18 C:\condor\condor_config.local
3/17 12:03:18 DaemonCore: Command Socket at <x.x.x.x:1071>
3/17 12:03:18 MachAttributes::publish: failed to get Windows version information
3/17 12:03:24 slot1: New machine resource allocated
3/17 12:03:24 slot2: New machine resource allocated
3/17 12:03:29 About to run initial benchmarks.
3/17 12:03:33 Completed initial benchmarks.
.
. slot2 continues to run benchmarks, slot1 never runs benchmarks …
.
3/17 12:03:33 slot2: State change: IS_OWNER is false
3/17 12:03:33 slot2: Changing state: Owner -> Unclaimed
3/17 12:03:33 slot1: State change: IS_OWNER is false
3/17 12:03:33 slot1: Changing state: Owner -> Unclaimed
3/17 16:03:33 State change: RunBenchmarks is TRUE
3/17 16:03:33 slot2: Changing activity: Idle -> Benchmarking
3/17 16:03:36 State change: benchmarks completed
3/17 16:03:36 slot2: Changing activity: Benchmarking -> Idle
3/17 20:03:36 State change: RunBenchmarks is TRUE
3/17 20:03:36 slot2: Changing activity: Idle -> Benchmarking
3/17 20:03:39 State change: benchmarks completed
.
. reconfig sent, slot1 begins to run benchmarks in lieu of slot2
. slot1 reappears in condor_status for a while …
.
3/22 21:50:06 Got SIGHUP. Re-reading config files.
3/23 00:10:06 State change: RunBenchmarks is TRUE
3/23 00:10:06 slot1: Changing activity: Idle -> Benchmarking
3/23 00:10:10 State change: benchmarks completed
3/23 00:10:10 slot1: Changing activity: Benchmarking -> Idle
3/23 04:10:10 State change: RunBenchmarks is TRUE
3/23 04:10:10 slot1: Changing activity: Idle -> Benchmarking
3/23 04:10:14 State change: benchmarks completed
3/23 04:10:14 slot1: Changing activity: Benchmarking -> Idle
.
. slot1 benchmarks continue but slot1 is no longer visible in condor_status …
.
3/28 04:12:18 slot1: Changing activity: Benchmarking -> Idle
3/28 08:12:19 State change: RunBenchmarks is TRUE
3/28 08:12:19 slot1: Changing activity: Idle -> Benchmarking
3/28 08:12:22 State change: benchmarks completed
3/28 08:12:22 slot1: Changing activity: Benchmarking -> Idle
<end>
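In case it helps anyone compare against their own StartLog, here is a minimal Python sketch (not part of Condor) that tallies how often each slot starts benchmarking per day, which makes the "only one slot benchmarks at a time" pattern above easy to spot. The name benchmark_runs and the regular expression are my own assumptions, based only on the line format shown in the excerpt.

```python
import re
from collections import Counter

# Matches StartLog lines such as:
#   3/17 16:03:33 slot2: Changing activity: Idle -> Benchmarking
# Assumption: the "M/D H:M:S slotN: ..." layout seen in the excerpt above.
BENCH_START = re.compile(
    r"^(\d+/\d+) \d+:\d+:\d+ (slot\d+): Changing activity: Idle -> Benchmarking"
)


def benchmark_runs(lines):
    """Count benchmark starts per (date, slot) in an iterable of log lines."""
    counts = Counter()
    for line in lines:
        match = BENCH_START.match(line.strip())
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return counts


if __name__ == "__main__":
    # A few lines copied from the log excerpt above.
    sample = [
        "3/17 16:03:33 slot2: Changing activity: Idle -> Benchmarking",
        "3/17 16:03:36 slot2: Changing activity: Benchmarking -> Idle",
        "3/17 20:03:36 slot2: Changing activity: Idle -> Benchmarking",
        "3/22 21:50:06 Got SIGHUP. Re-reading config files.",
        "3/23 00:10:06 slot1: Changing activity: Idle -> Benchmarking",
        "3/23 04:10:10 slot1: Changing activity: Idle -> Benchmarking",
    ]
    for (day, slot), runs in sorted(benchmark_runs(sample).items()):
        print(day, slot, runs)
```

Passing an open StartLog file object instead of the sample list should work the same way, since the function only iterates over lines.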
Any ideas?
-Bryan
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/