[Condor-users] Need help running Condor with Globus Toolkit
- Date: Sat, 3 Sep 2005 00:49:33 -0800
- From: abhishek tripathi <tripathi.abhishek@xxxxxxxxx>
- Subject: [Condor-users] Need help running Condor with Globus Toolkit
Hi,
I am very new to the Condor world. I have installed Condor 6.6.9 and
Globus Toolkit 4.0 on my system and created all the necessary certificates.
When I run my job with the globus-job-run command, it executes fine. But
when I submit it with condor_submit, I get a message that the job was
submitted, yet condor_q shows the job as idle. When I check with
condor_q -globus, the job status is UNSUBMITTED, and the gridmanager log
file says that the Globus resource was detected as down.
I don't know what this means. Please also give some suggestions on how to
set up machines in a Condor-G pool or a simple grid.
Here is the sequence of steps I took:
Step 1: Start database -
/etc/init.d/postgresql start
Step 2:
globus>globus-start-container
Step 3:
condor>condor_master
Step 4:
condor>grid-proxy-init
condor>globus-personal-gatekeeper -start
condor>condor_submit /usr/local/condor/testjobs/globusjob.submit
Step 5:
condor_q
condor_q -globus
The response I get to "condor_q" is
-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
1.0 condor 9/2 16:11 0+00:00:00 I 0 0.0 date
However, I'm not sure what to do next. If I run the command "condor_q
globu" (or any similar command of the form "condor_q globusANYCHARACTERS",
where ANYCHARACTERS are any random characters), I get a response of the
form:
-- Submitter: pc-p31972.somedomain.com : <192.168.2.140:33105> : pc-p31972.somedomain.com
ID OWNER STATUS MANAGER HOST EXECUTABLE
1.0 condor UNSUBMITTED fork pc-p31972.somedomain.co /bin/date
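
The gridmanager log further down reports the resource pc-p31972.somedomain.com:2119 as down, so one thing worth checking is whether anything accepts TCP connections at that host and port. The following is a minimal sketch in Python, not part of Condor or Globus; the function name port_is_open and the 5-second default timeout are assumptions made for illustration.

```python
import socket


def port_is_open(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper: it only tests that something is listening,
    not that a Globus gatekeeper answers or accepts the proxy.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

Calling port_is_open("pc-p31972.somedomain.com", 2119) on the submit machine would then show whether that port is reachable. A False result would be consistent with the "resource is down" messages, while a True result says nothing about authentication.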
**********************************************************
Kindly advise how to SUBMIT the above jobs over Globus
**********************************************************
By the way, the log file shows the following -
9/2 20:19:19 [6685] Resources down for more than 900 secs -- killing GAHP
9/2 20:19:19 [6685] GAHP command 'RESULTS' failed
9/2 20:19:19 [6685] ERROR "Gahp Server (pid=6686) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:34 ******************************************************
9/2 20:20:34 ** condor_gridmanager (CONDOR_GRIDMANAGER) STARTING UP
9/2 20:20:34 ** /usr/local/condor/sbin/condor_gridmanager
9/2 20:20:34 ** $CondorVersion: 6.6.10 Jun 13 2005 $
9/2 20:20:34 ** $CondorPlatform: I386-LINUX_RH80 $
9/2 20:20:34 ** PID = 7633
9/2 20:20:34 ******************************************************
9/2 20:20:34 Using config file: /home/condor/condor_config
9/2 20:20:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:20:34 DaemonCore: Command Socket at <192.168.2.140:38494>
9/2 20:20:34 [7633] GAHP server pid = 7634
9/2 20:20:35 [7274] Resources down for 298 seconds!
9/2 20:20:37 [7633] DaemonCore: Command received via UDP from host <192.168.2.140:32795>
9/2 20:20:37 [7633] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:20:37 [7274] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:20:37 [7633] Found job 8.0 --- inserting
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] (8.0) proxy not cached yet, waiting...
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:20:37 [7633] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:20:37 [7633] (8.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:21:19 [6843] Resources down for 778 seconds!
9/2 20:21:35 [7274] Resources down for 358 seconds!
9/2 20:21:35 [7633] Resources down for 58 seconds!
9/2 20:22:19 [6843] Resources down for 838 seconds!
9/2 20:22:35 [7274] Resources down for 418 seconds!
9/2 20:22:35 [7633] Resources down for 118 seconds!
<stuff deleted>
9/2 20:35:34 Using config file: /home/condor/condor_config
9/2 20:35:34 Using local config files: /usr/local/condor/var/condor_config.local
9/2 20:35:34 DaemonCore: Command Socket at <192.168.2.140:38875>
9/2 20:35:34 [7916] GAHP server pid = 7917
9/2 20:35:35 [7633] Resources down for 898 seconds!
9/2 20:35:37 [7916] DaemonCore: Command received via UDP from host <192.168.2.140:32797>
9/2 20:35:37 [7916] DaemonCore: received command 60000 (DC_RAISESIGNAL), calling handler (HandleSigCommand())
9/2 20:35:37 [7633] resource pc-p31972.somedomain.com:2119 is still down
9/2 20:35:37 [7916] Found job 6.0 --- inserting
9/2 20:35:37 [7916] Found job 7.0 --- inserting
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (7.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) proxy not cached yet, waiting...
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] resource pc-p31972.somedomain.com:2119 is now down
9/2 20:35:37 [7916] (7.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:35:37 [7916] (6.0) doEvaluateState called: gmState GM_INIT, globusState 32
9/2 20:36:19 [7682] Resources down for 778 seconds!
9/2 20:36:29 [7881] Resources down for 718 seconds!
9/2 20:36:29 [7883] Resources down for 718 seconds!
9/2 20:36:35 [7633] Resources down for more than 900 secs -- killing GAHP
9/2 20:36:35 [7633] GAHP command 'RESULTS' failed
9/2 20:36:35 [7633] ERROR "Gahp Server (pid=7634) died due to signal 9" at line 359 in file gahp-client.C
9/2 20:36:35 [7916] Resources down for 58 seconds!
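
Because several gridmanager processes write interleaved "Resources down" lines to this log, it can help to summarise them per process ID. The following is a rough sketch in Python, not a Condor tool; the function name max_downtime_by_pid is hypothetical, and the pattern assumes the line format seen in the excerpt above.

```python
import re
from collections import defaultdict

# Matches lines such as: 9/2 20:21:35 [7274] Resources down for 358 seconds!
# The "more than 900 secs -- killing GAHP" lines are deliberately not matched.
DOWN_RE = re.compile(r"\[(\d+)\] Resources down for (\d+) seconds!")


def max_downtime_by_pid(log_text):
    """Return {gridmanager pid: largest 'Resources down' value seen, in seconds}."""
    longest = defaultdict(int)
    for line in log_text.splitlines():
        match = DOWN_RE.search(line)
        if match:
            pid, seconds = int(match.group(1)), int(match.group(2))
            longest[pid] = max(longest[pid], seconds)
    return dict(longest)


# Four lines copied from the log excerpt above.
sample = """\
9/2 20:19:19 [6843] Resources down for 658 seconds!
9/2 20:19:35 [7274] Resources down for 238 seconds!
9/2 20:20:19 [6843] Resources down for 718 seconds!
9/2 20:20:35 [7274] Resources down for 298 seconds!
"""
print(max_downtime_by_pid(sample))  # {6843: 718, 7274: 298}
```

Each process ID here corresponds to a separate condor_gridmanager instance, and all of them count upward without ever reporting the resource as back up.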
Thanks in advance.