Mailing List Archives
Re: [Condor-users] CPU Core detection?
- Date: Mon, 01 Aug 2011 14:42:13 -0400
- From: Matthew Farrellee <matt@xxxxxxxxxx>
- Subject: Re: [Condor-users] CPU Core detection?
On 08/01/2011 02:28 PM, Michael Di Domenico wrote:
How does Condor determine how many cores are in a box?
I'm running into an issue on RHEL using Condor 7.4 and Magny-Cours 12-core CPUs.
You can see this kernel thread:
http://groups.google.com/group/linux.kernel/browse_thread/thread/3f3904516374e62c?pli=1
I'm looking into the kernel issue presently, but I'm curious what work
has been done in Condor around this.
I suspect, since my 48-core boxes are showing up as 24-slot machines
with COUNT_HYPERTHREADED_CPUS=false, that Condor is misreading the
/proc/cpuinfo file instead of looking in the /sys files, but I'd like
some confirmation.
thanks
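For comparison with /proc/cpuinfo, the /sys view mentioned in the question can be inspected directly. The following is a minimal Python sketch and not anything Condor ships: the helper names read_id and sysfs_core_count are made up for illustration, and it assumes the standard Linux sysfs layout with physical_package_id and core_id files under each cpuN/topology directory. Whether this view gives a different answer on the affected boxes is not something the sketch settles.

```python
import glob
import os


def read_id(path):
    """Return the stripped contents of a single-value sysfs file."""
    with open(path) as f:
        return f.read().strip()


def sysfs_core_count(root="/sys/devices/system/cpu"):
    """Return (logical_processors, distinct_cores) from sysfs topology files.

    A core is identified by its (physical_package_id, core_id) pair.
    The root parameter is an illustration-only hook so the function can
    be pointed at a test directory.
    """
    cores = set()
    logical = 0
    # "cpu[0-9]*" matches cpu0, cpu1, ... but not cpufreq or cpuidle.
    for topo in glob.glob(os.path.join(root, "cpu[0-9]*", "topology")):
        logical += 1
        package = read_id(os.path.join(topo, "physical_package_id"))
        core = read_id(os.path.join(topo, "core_id"))
        cores.add((package, core))
    return logical, len(cores)
```

Calling sysfs_core_count() with no arguments on a Linux machine returns the two counts for the local host, which can then be compared with what Condor reports.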
You could try reading this code (mind the #ifdefs 8o)...
http://condor-git.cs.wisc.edu/?p=condor.git;a=blob;f=src/condor_sysapi/ncpus.cpp;h=7d65306516f2cb7c9b50b4d9cd8825df60c39610;hb=master
Or watch it in action...
$ _CONDOR_TOOL_DEBUG=D_ALL condor_config_val -debug
08/01/11 14:37:49 (fd:2) (pid:17500) config: using subsystem 'TOOL', local ''
08/01/11 14:37:49 (fd:2) (pid:17500) Reading from /proc/cpuinfo
08/01/11 14:37:49 (fd:2) (pid:17500) Found: Physical-IDs:True; Core-IDs:True
08/01/11 14:37:49 (fd:2) (pid:17500) Analyzing 4 processors using IDs...
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #0 (PID:0, CID:0):
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#1 : pid:0==0 and cid:0==0 (match=2)
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#2 : pid:0!=0 or cid:0!=2 (match=No)
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#0 and P#3 : pid:0!=0 or cid:0!=2 (match=No)
08/01/11 14:37:49 (fd:2) (pid:17500) ncpus = 1
08/01/11 14:37:49 (fd:2) (pid:17500) P0: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) P1: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #1 (PID:0, CID:0):
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #2 (PID:0, CID:2):
08/01/11 14:37:49 (fd:2) (pid:17500) Comparing P#2 and P#3 : pid:0==0 and cid:2==2 (match=2)
08/01/11 14:37:49 (fd:2) (pid:17500) ncpus = 2
08/01/11 14:37:49 (fd:2) (pid:17500) P2: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) P3: match->2
08/01/11 14:37:49 (fd:2) (pid:17500) Looking at processor #3 (PID:0, CID:2):
08/01/11 14:37:49 (fd:2) (pid:17500) Using IDs: 4 processors, 2 CPUs, 2 HTs
08/01/11 14:37:49 (fd:2) (pid:17500) Reading condor configuration from '/etc/condor/condor_config'
08/01/11 14:37:49 (fd:2) (pid:17500) Finding local host information, calling gethostname()
08/01/11 14:37:49 (fd:2) (pid:17500) gethostname() returned fully qualified name "eeyore.local"
08/01/11 14:37:49 (fd:2) (pid:17500) Trying to initialize local IP address (config file not read)
08/01/11 14:37:49 (fd:2) (pid:17500) NETWORK_INTERFACE=* matches lo 127.0.0.1, wlan0 10.10.30.140, virbr0 192.168.122.1, tun0 10.3.227.184, choosing IP 10.10.30.140
08/01/11 14:37:49 (fd:2) (pid:17500) Trying to initialize local IP address (after reading config)
08/01/11 14:37:49 (fd:2) (pid:17500) Disabling ConvertDefaultIPToSocketIP() because NETWORK_INTERFACE does not match multiple IPs.
Usage: condor_config_val [options] variable [variable] ...
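The ID comparison in the debug output above can be approximated in a few lines. This is a minimal Python sketch, not Condor's actual code (that lives in the ncpus.cpp file linked earlier): the name count_cores and the SAMPLE text are made up for illustration, and it assumes each /proc/cpuinfo stanza carries "processor", "physical id" and "core id" fields. Logical processors that share the same (physical id, core id) pair are counted as one core, which is what the log shows for P0/P1 and P2/P3.

```python
import re


def count_cores(cpuinfo_text):
    """Return (logical_processors, distinct_cores) for cpuinfo-style text."""
    keys = set()
    logical = 0
    # Each logical processor is a stanza separated by a blank line.
    for stanza in re.split(r"\n\s*\n", cpuinfo_text.strip()):
        fields = {}
        for line in stanza.splitlines():
            name, sep, value = line.partition(":")
            if sep:
                fields[name.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        if "physical id" in fields and "core id" in fields:
            # Processors with equal IDs collapse into one core.
            keys.add((fields["physical id"], fields["core id"]))
        else:
            # Without IDs, treat the processor as its own core.
            keys.add(("logical", fields["processor"]))
    return logical, len(keys)


# Synthetic input shaped like the 4-processor machine in the log:
# physical id 0 throughout, core ids 0, 0, 2, 2.
SAMPLE = "\n\n".join(
    "processor\t: %d\nphysical id\t: 0\ncore id\t\t: %d" % (n, cid)
    for n, cid in enumerate([0, 0, 2, 2])
)

print(count_cores(SAMPLE))  # prints (4, 2)
```

On a Linux box the same function can be fed open("/proc/cpuinfo").read(). Any pairing scheme like this depends on the IDs the kernel reports: if two distinct cores were given the same (physical id, core id) pair, they would be merged and the count would come out low.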
Best,
matt