Mailing List Archives
Re: [Condor-users] Having a problem with 7.4.1 on windows - solved?
- Date: Thu, 14 Jan 2010 16:43:14 +0800
- From: <Greg.Hitchen@xxxxxxxx>
- Subject: Re: [Condor-users] Having a problem with 7.4.1 on windows - solved?
Answering my own question (would any Condor developers care to comment?).

After much troubleshooting it became obvious that the problem had something to do with the startd. A config file with only the master, or only the master and schedd, would work fine, but any time startd was included in the daemon list the same exception error would occur. I had noticed that a keyboard daemon, kbdd, was included in the MSI-generated config file, so I added that as well (without startd), and Condor started and did not crash. The kbddLog file had a message saying it was aborting because it couldn't detect the startd running. Aha! So I added startd back in, and all now seems OK.
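For anyone hitting the same crash, the change described above amounts to listing KBDD alongside STARTD in the master's daemon list. This is a minimal sketch of that one line, assuming the standard `DAEMON_LIST` macro and keeping the master and schedd from the tests above. It is not a verified fix for every setup, so compare it against the MSI-generated config before copying it into a pool-wide file.

```
# Windows 7.4.x: start the keyboard daemon together with the startd
DAEMON_LIST = MASTER, SCHEDD, STARTD, KBDD

# Assumption: a config carried over from 7.2.x may also lack any other
# kbdd-related entries the MSI installer writes; check the generated
# condor_config for them rather than guessing their values.
```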
It seems as though condor_kbdd.exe now needs to be started from the daemon list as a "helper" for the startd in the Windows 7.4.* series, whereas it didn't exist in the 7.2.* series (although there was a condor_kbdd_dll.dll file). I've had a quick look but can't find any reference to this new "requirement" in the release notes on the Condor web site (or anywhere else). It also seems a bit extreme that leaving it out of the daemon list alongside the startd makes Condor crash with an exception rather than log a message to the log file and abort.
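Because a STARTD without a KBDD is what triggered the crash here, a pre-deployment check along these lines could catch the problem before Condor is restarted on a PC. This is a hypothetical Python helper (`missing_kbdd` is not part of Condor). It only inspects the comma- or space-separated value of `DAEMON_LIST` and does not expand `$(...)` macros.

```python
import re

def missing_kbdd(daemon_list: str) -> bool:
    """Return True if STARTD is listed without KBDD (hypothetical check).

    daemon_list is the value of DAEMON_LIST, e.g. "MASTER, SCHEDD, STARTD".
    Entries may be separated by commas and/or whitespace; case is ignored.
    """
    daemons = {d.upper() for d in re.split(r"[,\s]+", daemon_list) if d}
    return "STARTD" in daemons and "KBDD" not in daemons
```

For example, `missing_kbdd("MASTER, SCHEDD, STARTD")` flags the 7.2.4-style list, while adding `KBDD` clears it.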
Hopefully this may help others if they encounter this problem, assuming I've got it right! :) Condor Team?
Cheers
Greg
Bit more info.
Installing 7.4.1 manually on the local PC with the MSI file works OK, i.e. Condor is up and running and has joined the pool fine.
Stop Condor, replace the config with our working 7.2.4 config file, start Condor, and we are back to the behaviour described previously. For reference, here is the MasterLog when using the MSI-generated config file that works OK:
01/13 12:25:08 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable failed, errno=203
01/13 12:25:08 Locale: English_United States.1252
01/13 12:25:08 ******************************************************
01/13 12:25:08 ** Condor (CONDOR_MASTER) STARTING UP
01/13 12:25:08 ** C:\Program Files\condor\bin\condor_master.exe
01/13 12:25:08 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
01/13 12:25:08 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
01/13 12:25:08 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/13 12:25:08 ** $CondorPlatform: INTEL-WINNT50 $
01/13 12:25:08 ** PID = 3676
01/13 12:25:08 ** Log last touched time unavailable (No such file or directory)
01/13 12:25:08 ******************************************************
01/13 12:25:08 Using config source: C:\Program Files\condor\condor_config
01/13 12:25:08 Using local config sources:
01/13 12:25:08 C:\PROGRA~1\condor/condor_config.local
01/13 12:25:08 DaemonCore: Command Socket at <130.116.144.59:1199>
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_schedd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_shadow.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_gridmanager.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_c-gahp.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_c-gahp_worker_thread.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_startd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_kbdd.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_starter.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin/condor_vm-gahp.exe is now enabled in the firewall.
01/13 12:25:08 Authorized application C:\PROGRA~1\condor/bin\condor_dagman.exe is now enabled in the firewall.
01/13 12:25:08 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_schedd.exe", pid and pgroup = 4016
01/13 12:25:09 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_startd.exe", pid and pgroup = 3808
01/13 12:25:09 Started DaemonCore process "C:\PROGRA~1\condor/bin/condor_kbdd.exe", pid and pgroup = 2828
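Comparing the firewall lines in the two MasterLogs is telling. In this working log the master goes on from condor_startd.exe to condor_kbdd.exe, whereas the crashing log in the original message further down stops right after condor_startd.exe. Below is a small sketch for pulling those entries out of a log, assuming the message format shown above (`firewall_exceptions` is a hypothetical name, not a Condor tool). It uses `\s+` between words so it also copes with logs that were line-wrapped by a mail client.

```python
import re

# Matches "Authorized application <path> is now enabled in the firewall",
# tolerating line breaks between words. Assumes the path has no spaces
# (the log above uses the short C:\PROGRA~1 form).
AUTH_RE = re.compile(
    r"Authorized application\s+(\S+)\s+is now enabled\s+in the\s+firewall"
)

def firewall_exceptions(log_text: str) -> list:
    """Return the executable names the master reports adding, in log order."""
    names = []
    for match in AUTH_RE.finditer(log_text):
        # Paths mix "\" and "/" separators, so split on either.
        names.append(re.split(r"[\\/]", match.group(1))[-1])
    return names
```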
How can a different config file (one that has worked for 7.2.4 and many previous versions) cause access violations, exceptions and core file dumps?
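One way to narrow down which setting matters is to diff the macro definitions in the two config files. This is a rough Python sketch, assuming simple `NAME = value` lines. It skips `#` comment lines and ignores continuation lines and `$(...)` expansion, so treat its output as a starting point only. `parse_config` and `diff_macros` are hypothetical helpers.

```python
def parse_config(text: str) -> dict:
    """Collect NAME = value pairs from a Condor-style config (rough sketch)."""
    macros = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks and comment lines; backslash continuations are not handled.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        # Condor macro names are case-insensitive, so normalise to upper case.
        macros[name.strip().upper()] = value.strip()
    return macros

def diff_macros(old_text: str, new_text: str):
    """Return (only in old, only in new, changed) macro names, each sorted."""
    old, new = parse_config(old_text), parse_config(new_text)
    only_old = sorted(old.keys() - new.keys())
    only_new = sorted(new.keys() - old.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return only_old, only_new, changed
```

Run it with the 7.2.4 config as `old_text` and the MSI-generated file as `new_text`. Macros that appear only in the new file, or whose values changed (such as `DAEMON_LIST`), are the first candidates to look at.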
Cheers
Greg
Hi All

Just testing the upgrade from 7.2.4 to 7.4.1 on a few Windows machines before applying it to our pool/s. We have just downloaded the zip file, unzipped it and copied it to the PCs, along with the appropriate config files (after a net stop condor).

(Our normal distribution is via a file server holding the latest version, plus a scheduled task on the PCs that checks every day and downloads the files if the version on the server differs from the one on the local PC.)
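The daily check in that distribution scheme amounts to comparing a version marker on the server with the one on the PC. This is a minimal sketch, assuming a hypothetical `condor_version.txt` marker file in each install root. The real scheduled task may work quite differently, and the copy step itself is left out.

```python
from pathlib import Path
from typing import Optional

# Hypothetical marker file holding a version string such as "7.4.1".
VERSION_FILE = "condor_version.txt"

def installed_version(root: Path) -> Optional[str]:
    """Return the version recorded under root, or None if there is no marker."""
    marker = root / VERSION_FILE
    return marker.read_text().strip() if marker.exists() else None

def needs_update(server_root: Path, local_root: Path) -> bool:
    """True when the server publishes a version that differs from the PC's."""
    server = installed_version(server_root)
    if server is None:
        return False  # nothing published yet, so leave the PC alone
    # Any difference counts (not just a newer version), matching the
    # "different version on the server" rule described in the post.
    return server != installed_version(local_root)
```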
For testing we do this manually; we then net start condor and.......

In each case we get the following MasterLog file and CORE.Master.Win32 file.

Any suggestions/ideas?
Thanks
Cheers
Greg
01/12 14:57:05 UnsetEnv(NET_REMAP_ENABLE): SetEnvironmentVariable failed, errno=203
01/12 14:57:05 Locale: English_United States.1252
01/12 14:57:05 ******************************************************
01/12 14:57:05 ** Condor (CONDOR_MASTER) STARTING UP
01/12 14:57:05 ** c:\PROGRA~1\condor\bin\condor_master.exe
01/12 14:57:05 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
01/12 14:57:05 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
01/12 14:57:05 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
01/12 14:57:05 ** $CondorPlatform: INTEL-WINNT50 $
01/12 14:57:05 ** PID = 7880
01/12 14:57:05 ** Log last touched time unavailable (No such file or directory)
01/12 14:57:05 ******************************************************
01/12 14:57:05 Using config source: c:\PROGRA~1\condor\condor_config
01/12 14:57:05 Using local config sources:
01/12 14:57:05 C:\PROGRA~1\condor/condor_config.local
01/12 14:57:05 DaemonCore: Command Socket at <130.116.146.130:9391>
01/12 14:57:05 Authorized application C:\PROGRA~1\condor/bin/condor_startd.exe is now enabled in the firewall.
01/12 14:57:05 Intercepting an unhandled exception.
01/12 14:57:05 Dropping a core file.
//=====================================================
PID: 7880
Exception code: C0000005 ACCESS_VIOLATION
Fault address: 00493720 01:00092720 c:\PROGRA~1\condor\bin\condor_master.exe
Registers:
EAX:00000001
EBX:00D60700
ECX:00000000
EDX:7C90E514
ESI:00C9FE98
EDI:00000400
CS:EIP:001B:00493720
SS:ESP:0023:00C9FDF0 EBP:00C9FE1C
DS:0023 ES:0023 FS:003B GS:0000
Flags:00010246
Call stack:
Address Frame
00493720 00C9FDEC strlen (f:\dd\vctools\crt_bld\SELF_X86\crt\src\INTEL\strlen.asm:81)
00465FE5 00C9FE1C WindowsFirewallHelper::charToBstr (c:\condor\execute\dir_5488\userdir\src\condor_c++_util\firewall.windows.cpp:403)
00466467 00C9FE38 WindowsFirewallHelper::addTrusted (c:\condor\execute\dir_5488\userdir\src\condor_c++_util\firewall.windows.cpp:135)
0040368C 00564940 init_firewall_exceptions (c:\condor\execute\dir_5488\userdir\src\condor_master.v6\master.cpp:1420)
//=====================================================
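Read innermost first, the trace puts the fault in `strlen`, called from `WindowsFirewallHelper::charToBstr` (firewall.windows.cpp:403) via `addTrusted`, from `init_firewall_exceptions` in master.cpp. In other words, the master crashed while it was registering firewall exceptions. That lines up with the MasterLog above, where the last exception added before the crash is for condor_startd.exe. Below is a small sketch for extracting the frames from such a dump, assuming the `address frame function (file:line)` layout shown here (`parse_call_stack` is a hypothetical helper, not part of Condor).

```python
import re

# One frame: 8-hex-digit address, 8-hex-digit frame pointer, function name,
# then "(path:line)". \s+ lets the "(path:line)" part sit on the next line.
FRAME_RE = re.compile(
    r"^([0-9A-F]{8})\s+([0-9A-F]{8})\s+(\S+)\s+\((.+):(\d+)\)", re.MULTILINE
)

def parse_call_stack(dump: str) -> list:
    """Return (function, source file name, line number) tuples, innermost first."""
    frames = []
    for _addr, _frame, func, path, line in FRAME_RE.findall(dump):
        # Keep only the file name; the greedy (.+) stops at the last ":<digits>)".
        frames.append((func, re.split(r"[\\/]", path)[-1], int(line)))
    return frames
```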