Re: [HTCondor-users] Slow Performance
- Date: Sun, 27 Apr 2014 12:52:26 -0500
- From: Todd Tannenbaum <tannenba@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Slow Performance
On 4/27/2014 10:22 AM, Dennis Zheleznyak wrote:
My question may not be connected directly to Condor but I'd like to know if
anyone encountered the same issue as me.
I bought a Dell 720xd server with two Intel E5-2660 v2 CPUs, 256GB DDR3
and 40TB of storage in a RAID6 array. With HyperThreading it has 40
cores. It runs Windows Server 2012.
My program isn't built with MPI capabilities, it calculates data from an
input file and outputs to a file once it is done - the program was compiled
with MATLAB.
Normally I have 150 sets of data to be calculated. When I send them to condor,
40 jobs start and that's great - the problem is that it takes forever to
finish even one simple little job! The CPU is constantly working at 100%,
the memory barely gets to 10% and there is no notable IO on the disks.
Before I bought the server, I had 4 computers with i7 4770K Haswell and
16GB of memory - the jobs literally flew when I sent them to my condor pool!
I don't know what to check or do - if anyone has any idea I would
appreciate it.
Thank you,
Dennis.
A few random first-thought suggestions:
1. Are you compiling and running with the -singleCompThread command-line
argument to MATLAB? From how you have things set up above, you will want
-singleCompThread so that MATLAB only uses a single core; otherwise MATLAB
will start up and each of your 40 jobs will try to use all 40 cores!
Even if this was happening with your old servers, the issue will become
much more pronounced on a machine with more cores. See
https://htcondor-wiki.cs.wisc.edu/index.cgi/wiki?p=HowToRunMatlab
for this and other tips.
2. I would suggest a quick experiment - try running without
hyperthreading and see if that improves things. Even if it doesn't, at
least you will have ruled out one possible cause. To do so, in the
condor_config.local for that machine set
COUNT_HYPERTHREAD_CPUS = False
and then restart HTCondor. Specifically, you just need to restart the
condor_startd, so you could do
condor_restart -startd <machine-name>
from your central manager. When HTCondor restarts, you will see fewer
slots as HTCondor will only count physical cores, not hyperthread cores.
Resubmit your jobs and see what happens.
3. Another suggestion - if it is easy, what happens if you start 40 runs
of your job simultaneously outside of HTCondor? We expect things will
be equally slow outside of HTCondor, but it would be a nice data point
to confirm this.
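
One way to run the experiment in suggestion 3 is a small timing harness like the sketch below. It is only an illustration, not something from HTCondor or MATLAB: the function names timed_run and run_concurrently are made up here, and the command is a stand-in workload that you would replace with your own compiled program and input file. It starts several copies of the same command at once and reports how long the runs take, so you can compare a single run against many simultaneous runs.

```python
import subprocess
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(cmd):
    """Run one command to completion and return its wall-clock seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, check=True)
    return time.perf_counter() - start


def run_concurrently(cmd, copies):
    """Start `copies` instances of cmd at once; return each run's duration."""
    # One thread per copy; each thread just waits on its child process.
    with ThreadPoolExecutor(max_workers=copies) as pool:
        return list(pool.map(timed_run, [cmd] * copies))


if __name__ == "__main__":
    # Stand-in workload (assumption). Replace with your own program, e.g.
    # ["myprog.exe", "input_001.dat"] (hypothetical names).
    cmd = [sys.executable, "-c", "sum(range(10**6))"]
    # Raise 4 to 40 to mirror the number of slots on the new server.
    for copies in (1, 4):
        durations = run_concurrently(cmd, copies)
        mean = sum(durations) / len(durations)
        print(f"{copies:>2} copies: slowest {max(durations):.2f}s, "
              f"mean {mean:.2f}s")
```

If a single run is fast but the per-run time climbs steeply as the number of copies grows, the slowdown comes from the jobs competing with each other on the machine rather than from HTCondor.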
regards,
Todd