Hi All:

I have a new Condor pool uniformly running v7.0.1 on
Windows. After a day or two the slot1 resources fail to show up when
issuing a condor_status command. Here is sample output:

    Name               OpSys   Arch  State     Activity LoadAv  Mem ActvtyTime

    slot1@xxxxxxxxxxxx WINNT51 INTEL Owner     Idle      0.030 1023 0+04:32:59
    slot2@xxxxxxxxxxxx WINNT51 INTEL Owner     Idle      0.000 1023 0+04:33:00
    slot2@xxxxxxxxxxxx WINNT51 INTEL Owner     Idle      0.000 1534 0+04:35:05
    slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle      0.000 1006 5+14:26:38
    slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle      0.000 1006 0+02:25:07
    slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle      0.000 1006 0+02:25:05
    slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle      0.000 1006 0+02:25:05
    slot2@xxxxxxxxxxxx WINNT52 INTEL Unclaimed Idle      0.000 1006 0+02:25:07

                  Total Owner Claimed Unclaimed Matched Preempting Backfill

    INTEL/WINNT51     3     3       0         0       0          0        0
    INTEL/WINNT52     5     0       0         5       0          0        0

            Total     8     3       0         5       0          0        0

As you can see, even the totals fail to count the slot1
resources. A condor_reconfig is sufficient to bring slot1 back to life.
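Until the root cause is found, that condor_reconfig workaround could be automated. Below is a minimal sketch, assuming Python 3.7+ and that condor_status and condor_reconfig are on the PATH of an account with administrative rights on the pool. It lists every ad name with `condor_status -format "%s\n" Name`, groups the names by host, and reconfigures any host that still advertises some slot but no slot1. The helper names (`hosts_missing_slot`, `current_slot_names`, `main`) are hypothetical. It cannot notice a machine whose ads have all disappeared, and it skips names without a `slotN@` prefix.

```python
import subprocess
from collections import defaultdict


def hosts_missing_slot(names, slot="slot1"):
    """Return hosts that advertise at least one slot ad but none named `slot`.

    `names` are ad names as printed by condor_status, e.g. "slot2@host".
    Names without an "@" (blank lines, single-slot machines) are skipped.
    """
    slots_by_host = defaultdict(set)
    for name in names:
        name = name.strip()
        if "@" not in name:
            continue
        slot_id, host = name.split("@", 1)
        slots_by_host[host].add(slot_id)
    return sorted(host for host, slots in slots_by_host.items() if slot not in slots)


def current_slot_names():
    """List the Name attribute of every machine ad known to the collector."""
    result = subprocess.run(
        ["condor_status", "-format", "%s\n", "Name"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()


def main():
    try:
        names = current_slot_names()
    except FileNotFoundError:
        print("condor_status was not found on the PATH")
        return
    for host in hosts_missing_slot(names):
        print(f"slot1 missing on {host}; sending condor_reconfig")
        # Assumption: a reconfig restores slot1, as observed in this report.
        subprocess.run(["condor_reconfig", host], check=False)


if __name__ == "__main__":
    main()
```

Run periodically (for example from a scheduled task), this only treats the symptom; it says nothing about why slot1 drops out of condor_status in the first place.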
The StartLog on an affected machine looks like:

    3/17 12:03:18 ******************************************************
    3/17 12:03:18 ** condor_startd.exe (CONDOR_STARTD) STARTING UP
    3/17 12:03:18 ** C:\condor\bin\condor_startd.exe
    3/17 12:03:18 ** $CondorVersion: 7.0.1 Feb 27 2008 BuildID: 76180 $
    3/17 12:03:18 ** $CondorPlatform: INTEL-WINNT50 $
    3/17 12:03:18 ** PID = 1880
    3/17 12:03:18 ** Log last touched 3/17 11:01:32
    3/17 12:03:18 ******************************************************
    3/17 12:03:18 Using config source: C:\condor\condor_config
    3/17 12:03:18 Using local config sources:
    3/17 12:03:18    C:\condor\condor_config.local
    3/17 12:03:18 DaemonCore: Command Socket at <x.x.x.x:1071>
    3/17 12:03:18 MachAttributes::publish: failed to get Windows version information
    3/17 12:03:24 slot1: New machine resource allocated
    3/17 12:03:24 slot2: New machine resource allocated
    3/17 12:03:29 About to run initial benchmarks.
    3/17 12:03:33 Completed initial benchmarks.
    ...

slot2 continues to run benchmarks; slot1 never runs benchmarks …

    3/17 12:03:33 slot2: State change: IS_OWNER is false
    3/17 12:03:33 slot2: Changing state: Owner -> Unclaimed
    3/17 12:03:33 slot1: State change: IS_OWNER is false
    3/17 12:03:33 slot1: Changing state: Owner -> Unclaimed
    3/17 16:03:33 State change: RunBenchmarks is TRUE
    3/17 16:03:33 slot2: Changing activity: Idle -> Benchmarking
    3/17 16:03:36 State change: benchmarks completed
    3/17 16:03:36 slot2: Changing activity: Benchmarking -> Idle
    3/17 20:03:36 State change: RunBenchmarks is TRUE
    3/17 20:03:36 slot2: Changing activity: Idle -> Benchmarking
    3/17 20:03:39 State change: benchmarks completed
    ...

Reconfig sent; slot1 begins to run benchmarks in lieu of slot2, and slot1 reappears in condor_status for a while …

    3/22 21:50:06 Got SIGHUP. Re-reading config files.
    3/23 00:10:06 State change: RunBenchmarks is TRUE
    3/23 00:10:06 slot1: Changing activity: Idle -> Benchmarking
    3/23 00:10:10 State change: benchmarks completed
    3/23 00:10:10 slot1: Changing activity: Benchmarking -> Idle
    3/23 04:10:10 State change: RunBenchmarks is TRUE
    3/23 04:10:10 slot1: Changing activity: Idle -> Benchmarking
    3/23 04:10:14 State change: benchmarks completed
    3/23 04:10:14 slot1: Changing activity: Benchmarking -> Idle
    ...

slot1 benchmarks continue, but slot1 is no longer visible in condor_status …

    3/28 04:12:18 slot1: Changing activity: Benchmarking -> Idle
    3/28 08:12:19 State change: RunBenchmarks is TRUE
    3/28 08:12:19 slot1: Changing activity: Idle -> Benchmarking
    3/28 08:12:22 State change: benchmarks completed
    3/28 08:12:22 slot1: Changing activity: Benchmarking -> Idle
    <end>

Any ideas?

-Bryan