2009-12-29
***********************************************
* Hailong Yang, PhD. Candidate
* Sino-German Joint Software Institute,
* School of Computer Science&Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@xxxxxxxxx
* Address: G413, New Main Building in Beihang University,
* No.37 XueYuan Road,HaiDian District,
* Beijing,P.R.China,100191
***********************************************
From: hailong.yang1115
Sent: 2009-12-27 23:00:08
To: Alain Roy
CC:
Subject: Problems about condor slots
Hi All,
Recently we installed the newest Condor release, version 7.4.1, on our
clusters. We encountered the following problems on some nodes during the
installation:
1. On some nodes in the Condor pool, the number of slots does not match the
number of logical CPU cores reported by /proc/cpuinfo. For node9, condor_status
shows 6 slots, while /proc/cpuinfo shows only 4 logical CPU cores. Detailed
information can be found in the attachment.
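The comparison described above can be sketched in Python. This is only an illustration, not part of Condor: the function names, the host name `node9.example`, and the sample text are hypothetical, and the regular expressions assume the usual `processor : N` lines of /proc/cpuinfo and the `slotN@host` names printed by condor_status.

```python
import re
from collections import Counter


def count_logical_cpus(cpuinfo_text):
    """Count the 'processor : N' entries in /proc/cpuinfo-style text."""
    return len(re.findall(r"^processor\s*:\s*\d+", cpuinfo_text, flags=re.MULTILINE))


def slots_per_host(status_text):
    """Count 'slotN@host' lines per host in condor_status-style output."""
    return Counter(re.findall(r"^slot\d+@(\S+)", status_text, flags=re.MULTILINE))


# Hypothetical sample data shaped like the situation reported for node9:
# 4 logical CPUs in /proc/cpuinfo, 6 slots in condor_status.
SAMPLE_CPUINFO = "".join(
    f"processor\t: {i}\nmodel name\t: hypothetical CPU\n\n" for i in range(4)
)
SAMPLE_STATUS = "\n".join(
    f"slot{i}@node9.example LINUX INTEL Unclaimed Idle" for i in range(1, 7)
)

if __name__ == "__main__":
    cpus = count_logical_cpus(SAMPLE_CPUINFO)
    slots = slots_per_host(SAMPLE_STATUS)["node9.example"]
    print(f"logical CPUs: {cpus}, slots: {slots}, mismatch: {cpus != slots}")
```

On a real node, the first argument would be the contents of /proc/cpuinfo and the second the captured output of condor_status.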
[root@monitor ~]# condor_status
Name OpSys Arch State Activity LoadAv Mem ActvtyTime
slot1@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 505 0+09:45:54
slot1@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.140 505 0+00:50:04
slot1@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 0+22:46:51
slot2@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 505 0+00:15:05
slot2@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 505 1+00:50:43
slot2@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 0+01:10:05
slot3@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:33
slot4@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:34
slot5@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:35
slot6@xxxxxxxxxx LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:36
slot1@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 0+23:29:41
slot2@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.040 493 0+01:10:05
slot3@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:11:00
slot4@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:11:01
slot5@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:11:02
slot6@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:11:03
slot7@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:11:04
slot8@xxxxxxxxxxxx LINUX X86_64 Unclaimed Idle 0.000 493 1+01:10:57
Total Owner Claimed Unclaimed Matched Preempting Backfill
INTEL/LINUX 10 0 0 10 0 0 0
X86_64/LINUX 8 0 0 8 0 0 0
Total 18 0 0 18 0 0 0
2. After installing Condor on some nodes, we started condor_master, but
nothing happened. We checked the MasterLog file, which contained the following
error:
12/27 10:48:41 ******************************************************
12/27 10:48:41 ** condor_master (CONDOR_MASTER) STARTING UP
12/27 10:48:41 ** /ddgrid/condor/sbin/condor_master
12/27 10:48:41 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
12/27 10:48:41 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
12/27 10:48:41 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
12/27 10:48:41 ** $CondorPlatform: I386-LINUX_RHEL3 $
12/27 10:48:41 ** PID = 7012
12/27 10:48:41 ** Log last touched 12/26 23:53:14
12/27 10:48:41 ******************************************************
12/27 10:48:41 Using config source: /ddgrid/condor/etc/condor_config
12/27 10:48:41 Using local config sources:
12/27 10:48:41 /ddgrid/condor/local.ddgrid/condor_config.local
12/27 10:48:41 ERROR "can't safe_open_wrapper(/tmp/condor-lock.ddgrid0.745993478763015/InstanceLock,O_WRONLY|O_CREAT|O_APPEND,S_IRUSR|S_IWUSR) - errno 2" at line 946 in file master.cpp
It seems there is a permission problem with the condor_config file, but we
cannot figure out which part is wrong.
[root@ddgrid local.ddgrid]# pwd
/ddgrid/condor/local.ddgrid
[root@ddgrid local.ddgrid]# ll
total 4
-rw-r--r-- 1 root root 2918 Dec 26 23:36 condor_config.local
drwxrwxrwt 2 condor root 6 Dec 26 23:36 execute
drwxr-xr-x 2 condor root 22 Dec 26 23:40 log
drwxr-xr-x 2 condor root 6 Dec 26 23:36 spool
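For the error in problem 2, note that errno 2 on Linux is ENOENT ("No such file or directory"), not EACCES (errno 13, "Permission denied"). That may mean some component of the lock file path is missing, rather than that a permission is wrong. The following Python sketch shows one way to check this. The helper names are hypothetical, and the path is the one quoted in the log above.

```python
import errno
import os


def explain_errno(code):
    """Return the symbolic name and the system message for an errno value."""
    return errno.errorcode.get(code, "UNKNOWN"), os.strerror(code)


def first_missing_component(path):
    """Return the shortest ancestor of `path` (or `path` itself) that does
    not exist, or None if the whole path exists."""
    current = os.path.abspath(path)
    chain = []
    while True:
        chain.append(current)
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    # Walk from the root downwards and report the first missing entry.
    for candidate in reversed(chain):
        if not os.path.exists(candidate):
            return candidate
    return None


if __name__ == "__main__":
    # Path copied from the MasterLog error message.
    lock_path = "/tmp/condor-lock.ddgrid0.745993478763015/InstanceLock"
    print(explain_errno(2))
    print("first missing component:", first_missing_component(lock_path))
```

If the reported component is the condor-lock directory under /tmp, the directory itself is absent, which is a different situation from wrong permissions on an existing file.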
Best wishes!
-Hailong
2009-12-27
Attachment:
cpuinfo
Description: Binary data