[Condor-users] ERROR PLZZ HELP OUT Could not fetch ads --- can't find collector
- Date: Wed, 8 Mar 2006 11:40:32 +0530
- From: "kailash raj" <askailash@xxxxxxxxx>
- Subject: [Condor-users] ERROR PLZZ HELP OUT Could not fetch ads --- can't find collector
Many thanks to Alain Roy. I followed your suggestion and the setup has almost recovered, but I still have some problems.
On both the client and the head node the startd is not running. Even if I start condor_startd manually, it does not stay running.
Please also see the output of the commands below.
IN HEAD NODE
[root@ca1 ~]# ps -ef | egrep condor_
condor 4567 1 0 11:03 ? 00:00:00 condor_master
condor 4569 4567 0 11:03 ? 00:00:00 condor_schedd -f
condor 4594 1 0 11:03 ? 00:00:00 condor_collector
condor 4603 1 0 11:03 ? 00:00:00 condor_negotiator
condor 4619 1 0 11:04 ? 00:00:00 condor_schedd
root 4658 3773 0 11:04 pts/1 00:00:00 egrep condor_
[root@ca1 ~]# condor_q
-- Submitter: ca1.cdacgrid : <192.9.200.215:33073> : ca1.cdacgrid
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
[root@ca1 ~]# condor_status
Error: Could not fetch ads --- can't find collector
Also, on the head node I am not getting the details of the client node.
MASTERLOG OF HEADNODE
3/8 11:04:12 The STARTD (pid 4622) exited with status 4
3/8 11:04:12 restarting /usr/local/condor/sbin/condor_startd in 25 seconds
3/8 11:04:12 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:04:37 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4678
3/8 11:04:37 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:04:37 The STARTD (pid 4678) exited with status 4
3/8 11:04:37 restarting /usr/local/condor/sbin/condor_startd in 41 seconds
3/8 11:04:37 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:05:18 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4681
3/8 11:05:18 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:05:18 The STARTD (pid 4681) exited with status 4
3/8 11:05:18 restarting /usr/local/condor/sbin/condor_startd in 73 seconds
3/8 11:05:18 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:06:31 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4683
3/8 11:06:31 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:06:31 The STARTD (pid 4683) exited with status 4
3/8 11:06:31 restarting /usr/local/condor/sbin/condor_startd in 137 seconds
3/8 11:06:31 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:48 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4686
3/8 11:08:48 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:08:48 The STARTD (pid 4686) exited with status 4
3/8 11:08:48 restarting /usr/local/condor/sbin/condor_startd in 265 seconds
3/8 11:08:48 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:13 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4721
3/8 11:13:13 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:13 The STARTD (pid 4721) exited with status 4
3/8 11:13:13 restarting /usr/local/condor/sbin/condor_startd in 521 seconds
3/8 11:13:13 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:13:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:18:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:21:54 Started DaemonCore process "/usr/local/condor/sbin/condor_startd", pid and pgroup = 4761
3/8 11:21:54 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:21:54 The STARTD (pid 4761) exited with status 4
3/8 11:21:54 restarting /usr/local/condor/sbin/condor_startd in 1033 seconds
3/8 11:21:54 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:23:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:28:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
3/8 11:33:26 Can't send UPDATE_MASTER_AD to collector : Failed to connect to collector
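The restart delays in the MasterLog excerpt above grow in a regular way: 25, 41, 73, 137, 265, 521, 1033 seconds. The following is a minimal Python sketch, not part of the original post, that extracts those delays from the log text and checks whether each delay minus a fixed constant doubles from one restart to the next. The constant 9 and the doubling factor are inferred from the numbers in the excerpt and are assumptions, not values read from this cluster's configuration.

```python
import re

# "restarting ..." lines copied from the MasterLog excerpt in this post.
LOG = """\
3/8 11:04:12 restarting /usr/local/condor/sbin/condor_startd in 25 seconds
3/8 11:04:37 restarting /usr/local/condor/sbin/condor_startd in 41 seconds
3/8 11:05:18 restarting /usr/local/condor/sbin/condor_startd in 73 seconds
3/8 11:06:31 restarting /usr/local/condor/sbin/condor_startd in 137 seconds
3/8 11:08:48 restarting /usr/local/condor/sbin/condor_startd in 265 seconds
3/8 11:13:13 restarting /usr/local/condor/sbin/condor_startd in 521 seconds
3/8 11:21:54 restarting /usr/local/condor/sbin/condor_startd in 1033 seconds
"""

RESTART_RE = re.compile(r"restarting \S+ in (\d+) seconds")


def restart_delays(log_text):
    """Return the restart delays (in seconds) in the order they appear."""
    return [int(m.group(1)) for m in RESTART_RE.finditer(log_text)]


def fits_backoff(delays, constant=9, base=2):
    """Check that (delay - constant) is multiplied by `base` at every restart.

    `constant` and `base` are assumed values inferred from the excerpt.
    """
    if not delays:
        return False
    step = delays[0] - constant
    if step <= 0:
        return False
    for delay in delays:
        if delay - constant != step:
            return False
        step *= base
    return True


delays = restart_delays(LOG)
print(delays)
print(fits_backoff(delays))
```

If the check holds, the growing gaps between "Started DaemonCore process" lines are the master backing off between attempts, so the thing to investigate is why the startd exits with status 4 each time, not the delay itself.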
There is no error in the CollectorLog.
IN CLIENTNODE
[root@nodeA sbin]# ps -ef | egrep condor_
condor 4169 1 0 11:04 ? 00:00:00 condor_master
condor 4171 4169 0 11:04 ? 00:00:00 condor_schedd -f
root 4282 3924 0 11:06 pts/1 00:00:00 egrep condor_
[root@nodeA sbin]# condor_q
-- Submitter: nodeA.cdacgrid : <192.9.200.90:32774> : nodeA.cdacgrid
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
0 jobs; 0 idle, 0 running, 0 held
[root@nodeA sbin]# condor_status
Error: Could not fetch ads --- can't find collector
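To make the two `ps -ef` listings in this post easier to compare, here is a small Python sketch, not part of the original post, that reads `ps -ef` output and reports which expected Condor daemons are absent. The lists of expected daemons per node are assumptions for illustration; which daemons should run on a given machine depends on its configuration. The sample text is the output shown in this post with wrapped lines rejoined.

```python
# ps -ef output from this post, wrapped lines rejoined.
HEAD_PS = """\
condor 4567 1 0 11:03 ? 00:00:00 condor_master
condor 4569 4567 0 11:03 ? 00:00:00 condor_schedd -f
condor 4594 1 0 11:03 ? 00:00:00 condor_collector
condor 4603 1 0 11:03 ? 00:00:00 condor_negotiator
condor 4619 1 0 11:04 ? 00:00:00 condor_schedd
root 4658 3773 0 11:04 pts/1 00:00:00 egrep condor_
"""

CLIENT_PS = """\
condor 4169 1 0 11:04 ? 00:00:00 condor_master
condor 4171 4169 0 11:04 ? 00:00:00 condor_schedd -f
root 4282 3924 0 11:06 pts/1 00:00:00 egrep condor_
"""

# Assumed daemon sets, for illustration only.
HEAD_EXPECTED = ("condor_master", "condor_collector", "condor_negotiator",
                 "condor_schedd", "condor_startd")
CLIENT_EXPECTED = ("condor_master", "condor_schedd", "condor_startd")


def running_commands(ps_output):
    """Return the set of command names from the CMD column of `ps -ef`."""
    commands = set()
    for line in ps_output.splitlines():
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        executable = fields[7].split()[0]
        commands.add(executable.rsplit("/", 1)[-1])
    return commands


def missing_daemons(ps_output, expected):
    """Return the expected daemons that do not appear in the ps output."""
    found = running_commands(ps_output)
    return [name for name in expected if name not in found]


print("head node missing:", missing_daemons(HEAD_PS, HEAD_EXPECTED))
print("client node missing:", missing_daemons(CLIENT_PS, CLIENT_EXPECTED))
```

Taking the command name from the CMD column rather than searching the whole line keeps the `egrep condor_` process itself from being counted as a daemon.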
________________________________________________________________________________________________
Should I check condor_status after submitting a job? Moreover, I am not getting the status of the client node on the head node.
Please help.
thanks,
lash