| 2009-12-29
***********************************************
* Hailong Yang, PhD. Candidate
* Sino-German Joint Software Institute,
* School of Computer Science & Engineering, Beihang University
* Phone: (86-010)82315908
* Email: hailong.yang1115@xxxxxxxxx
* Address: G413, New Main Building in Beihang University,
* No.37 XueYuan Road, HaiDian District,
* Beijing, P.R.China, 100191
***********************************************

From: hailong.yang1115
Sent: 2009-12-27 23:00:08
To: Alain Roy
Cc:
Subject: Problems about condor slots

Hi All,

Recently we installed the newest Condor release, version 7.4.1, in our
clusters. We encountered the following problems on some nodes during the
installation:

1. On some nodes in the Condor pool, the number of slots does not match the
number of logical CPU cores listed in /proc/cpuinfo. For node9,
condor_status shows 6 slots, while /proc/cpuinfo shows only 4 logical CPU
cores. Detailed information can be found in
the attachment.

[root@monitor ~]# condor_status
Name               OpSys      Arch   State     Activity LoadAv Mem   ActvtyTime

slot1@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   505  0+09:45:54
slot1@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.140   505  0+00:50:04
slot1@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  0+22:46:51
slot2@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   505  0+00:15:05
slot2@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   505  1+00:50:43
slot2@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  0+01:10:05
slot3@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  1+01:10:33
slot4@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  1+01:10:34
slot5@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  1+01:10:35
slot6@xxxxxxxxxx   LINUX      INTEL  Unclaimed Idle     0.000   168  1+01:10:36
slot1@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  0+23:29:41
slot2@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.040   493  0+01:10:05
slot3@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:11:00
slot4@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:11:01
slot5@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:11:02
slot6@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:11:03
slot7@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:11:04
slot8@xxxxxxxxxxxx LINUX      X86_64 Unclaimed Idle     0.000   493  1+01:10:57

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX    10     0       0        10       0          0        0
        X86_64/LINUX     8     0       0         8       0          0        0

               Total    18     0       0        18       0          0        0

2. After installing Condor on some nodes, we started condor_master but
nothing happened. We checked the MasterLog file, which showed the following
error:

12/27 10:48:41 ******************************************************
12/27 10:48:41 ** condor_master (CONDOR_MASTER) STARTING UP
12/27 10:48:41 ** /ddgrid/condor/sbin/condor_master
12/27 10:48:41 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
12/27 10:48:41 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
12/27 10:48:41 ** $CondorVersion: 7.4.1 Dec 17 2009 BuildID: 204351 $
12/27 10:48:41 ** $CondorPlatform: I386-LINUX_RHEL3 $
12/27 10:48:41 ** PID = 7012
12/27 10:48:41 ** Log last touched 12/26 23:53:14
12/27 10:48:41 ******************************************************
12/27 10:48:41 Using config source: /ddgrid/condor/etc/condor_config
12/27 10:48:41 Using local config sources:
12/27 10:48:41    /ddgrid/condor/local.ddgrid/condor_config.local
12/27 10:48:41 ERROR "can't safe_open_wrapper(/tmp/condor-lock.ddgrid0.745993478763015/InstanceLock,O_WRONLY|O_CREAT|O_APPEND ,S_IRUSR|S_IWUSR) - errno 2" at line 946 in file master.cpp

It seems there is a privilege problem with the condor_config file, but we
cannot figure out which part is wrong.

[root@ddgrid local.ddgrid]# pwd
/ddgrid/condor/local.ddgrid
[root@ddgrid local.ddgrid]# ll
total 4
-rw-r--r--  1 root   root 2918 Dec 26 23:36 condor_config.local
drwxrwxrwt  2 condor root    6 Dec 26 23:36 execute
drwxr-xr-x  2 condor root   22 Dec 26 23:40 log
drwxr-xr-x  2 condor root    6 Dec 26 23:36 spool

Best wishes!
-Hailong
2009-12-27
|
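The MasterLog error in the message reports "errno 2". As a minimal sketch for decoding that number, assuming a Linux host with Python available (neither is mentioned in the original report, and the helper name describe_errno is hypothetical): errno 2 is ENOENT, whereas a permission failure would normally be reported as EACCES. This may suggest that the lock directory under /tmp is missing rather than mis-permissioned, but that is only an inference from the error code.

```python
import errno
import os


def describe_errno(code: int) -> tuple[str, str]:
    """Return (symbolic name, system message) for a numeric errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return name, os.strerror(code)


# 2 is the value reported in the MasterLog line quoted in the message.
name, message = describe_errno(2)
print(f"errno 2 -> {name}: {message}")

# For comparison, the code usually associated with permission problems.
print(f"EACCES has the numeric value {errno.EACCES}")
```

If the decoded name is ENOENT, checking whether the directory named in the log line exists would be a natural first step before looking at file permissions.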
Attachment:
cpuinfo
Description: Binary data
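The comparison described in problem 1 (slots reported by condor_status versus logical CPUs in /proc/cpuinfo) can be sketched as follows. This is only an illustration under stated assumptions: the function names, host names, and sample text are hypothetical and are not taken from the attachment, and the parsing assumes the x86 Linux /proc/cpuinfo layout with one "processor : N" line per logical CPU.

```python
import re


def count_logical_cpus(cpuinfo_text: str) -> int:
    """Count 'processor : N' lines, one per logical CPU on x86 Linux."""
    pattern = re.compile(r"^processor[ \t]*:[ \t]*\d+[ \t]*$", re.MULTILINE)
    return len(pattern.findall(cpuinfo_text))


def count_slots(status_text: str, host: str) -> int:
    """Count condor_status rows whose Name column is slotN@host."""
    pattern = re.compile(r"^slot\d+@" + re.escape(host) + r"\s", re.MULTILINE)
    return len(pattern.findall(status_text))


# Hypothetical sample data, shortened for illustration.
sample_cpuinfo = (
    "processor\t: 0\nmodel name\t: Example CPU\n"
    "processor\t: 1\nmodel name\t: Example CPU\n"
)
sample_status = (
    "slot1@node9.example LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:33\n"
    "slot2@node9.example LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:34\n"
    "slot3@node9.example LINUX INTEL Unclaimed Idle 0.000 168 1+01:10:35\n"
    "slot1@other.example LINUX INTEL Unclaimed Idle 0.000 505 0+09:45:54\n"
)

cpus = count_logical_cpus(sample_cpuinfo)
slots = count_slots(sample_status, "node9.example")
print(f"logical CPUs: {cpus}, slots: {slots}, match: {cpus == slots}")
```

On a real node, the two inputs would be the contents of /proc/cpuinfo and the captured output of condor_status, with the host name as it appears after the "@" in the Name column.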