[Condor-users] GAHP error
Hi,
I'm using Condor at an OSG site. I can successfully submit and run
jobs to that site from another site, but when I try to submit a job
from the broken site, the jobs just end up on hold. Here's the
Gridmanager log, which shows an error starting the GAHP server. I've
verified that I can manually run $CONDOR_LOCATION/sbin/gt4_gahp and
gahp_server without errors.
Any thoughts on what I should try next? How can I find out which file
it means when it says "Failed to initialize from file"? I tried
strace, but I'm not very good with it, so maybe the answer lies in
there.
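One way to narrow down the strace output is to list only the file opens that failed, since a file the GAHP could not read is one plausible cause of "Failed to initialize from file". The sketch below assumes the trace was saved to a file with strace's real `-f` (follow child processes) and `-o` (output file) options, for example `strace -f -o gahp.trace <command>`. The script name and trace file name are hypothetical, and the regular expression assumes the usual strace line format for `open()`/`openat()` calls.

```python
import re
import sys

# Matches failed open()/openat() calls in strace output, for example:
#   open("/some/file", O_RDONLY) = -1 ENOENT (No such file or directory)
#   openat(AT_FDCWD, "/some/file", O_RDONLY) = -1 EACCES (Permission denied)
# Lines produced with "strace -f" may carry a leading PID, which search() skips over.
FAILED_OPEN = re.compile(
    r'\bopen(?:at)?\((?:[^,]*, )?"(?P<path>[^"]*)".*\)\s*=\s*-1\s+(?P<errno>E[A-Z]+)'
)


def failed_opens(lines):
    """Yield (path, errno_name) for every failed open/openat in the trace."""
    for line in lines:
        match = FAILED_OPEN.search(line)
        if match:
            yield match.group("path"), match.group("errno")


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical script name): python failed_opens.py gahp.trace
    with open(sys.argv[1]) as trace:
        for path, err in failed_opens(trace):
            print(f"{err:8} {path}")
```

Expect some noise: programs routinely probe for optional files and library paths that do not exist, so ENOENT entries outside the Condor and Globus directories are usually harmless. EACCES entries, or missing files under $CONDOR_LOCATION, are more worth a second look.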
Things haven't been working right for a little while now. I'm not
100% sure, because I'm not directly responsible for this system, but I
believe the last change made was the installation of a third NIC.
The IP address mentioned in the log file is bound to the interface
on the private LAN with the rest of the cluster nodes.
I don't really think this is the problem, but if I really
knew, I wouldn't be asking the mailing list. :D
--Peter
6/17 12:28:15 ******************************************************
6/17 12:28:15 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
6/17 12:28:15 ** /osg/programs/condor/sbin/condor_gridmanager
6/17 12:28:15 ** SubsystemInfo: name=GRIDMANAGER type=DAEMON(10) class=DAEMON(1)
6/17 12:28:15 ** Configuration: subsystem:GRIDMANAGER local:<NONE> class:DAEMON
6/17 12:28:15 ** $CondorVersion: 7.2.2 Apr 9 2009 BuildID: 145189 $
6/17 12:28:15 ** $CondorPlatform: X86_64-LINUX_RHEL5 $
6/17 12:28:15 ** PID = 3661
6/17 12:28:15 ** Log last touched 6/17 12:22:38
6/17 12:28:15 ******************************************************
6/17 12:28:15 Using config source: /osg/programs/condor/etc/condor_config
6/17 12:28:15 Using local config sources:
6/17 12:28:15 /scratch/condor/condor_config.local
6/17 12:28:15 DaemonCore: Command Socket at <10.0.128.2:10406>
6/17 12:28:18 [3661] Found job 20329.0 --- inserting
6/17 12:28:18 [3661] Found job 20329.1 --- inserting
6/17 12:28:18 [3661] Found job 20329.2 --- inserting
6/17 12:28:18 [3661] Found job 20329.3 --- inserting
6/17 12:28:18 [3661] gahp server not up yet, delaying ping
6/17 12:28:18 [3661] GAHP server not initialized yet, not submitting grid_monitor now
6/17 12:28:18 [3661] (20329.0) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP server pid = 3672
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.0) Error initializing GAHP
6/17 12:28:18 [3661] (20329.1) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.1) Error initializing GAHP
6/17 12:28:18 [3661] (20329.2) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.2) Error initializing GAHP
6/17 12:28:18 [3661] (20329.3) doEvaluateState called: gmState GM_INIT, globusState 32
6/17 12:28:18 [3661] GAHP command 'INITIALIZE_FROM_FILE' failed: 7
6/17 12:28:18 [3661] GAHP: Failed to initialize from file
6/17 12:28:18 [3661] (20329.3) Error initializing GAHP
6/17 12:28:23 [3661] gahp server not up yet, delaying ping
6/17 12:28:23 [3661] GAHP server not initialized yet, not submitting grid_monitor now
6/17 12:28:23 [3661] No jobs left, shutting down
6/17 12:28:23 [3661] Got SIGTERM. Performing graceful shutdown.
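With more jobs in the queue, this log grows quickly, so it can help to reduce it to which jobs failed GAHP initialization and which GAHP commands failed with which code. The sketch below is a hypothetical helper; its two patterns are modeled only on the lines quoted in the log above ("(JOB) Error initializing GAHP" and "GAHP command 'NAME' failed: CODE") and may miss other message formats. It does not interpret what a code such as 7 means.

```python
import re
from collections import Counter

# Patterns assumed from the gridmanager lines quoted above.
JOB_ERROR = re.compile(r"\((?P<job>\d+\.\d+)\) Error initializing GAHP")
CMD_FAILED = re.compile(r"GAHP command '(?P<cmd>[A-Z_]+)' failed: (?P<code>\S+)")


def summarize(lines):
    """Return (jobs that failed GAHP init, Counter of (command, code) failures)."""
    failed_jobs = []
    failed_cmds = Counter()
    for line in lines:
        job = JOB_ERROR.search(line)
        if job:
            failed_jobs.append(job.group("job"))
        cmd = CMD_FAILED.search(line)
        if cmd:
            failed_cmds[(cmd.group("cmd"), cmd.group("code"))] += 1
    return failed_jobs, failed_cmds
```

Applied to the excerpt above, this would report jobs 20329.0 through 20329.3 and four failures of INITIALIZE_FROM_FILE with code 7, which confirms that every job hit the same single failure rather than several distinct ones.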