Subject: Re: [HTCondor-users] control law questions
> From: "Krieger, Donald N." <kriegerd@xxxxxxxx> > Date: 12/04/2015 06:41 AM >
> Dear List: > > I have two questions about the algorithm which
is used to move jobs from
> the âIâ state in the run state. > > Does the algorithm take into account the amount
of time requested? > For instance do sites specify up front the maximum
amount of time that a
> job they will accept can request? > If so, can a user get a survey of those times
from all the sites currently
> accepting jobs? > > Below is a 24-hour plot showing the number of
jobs running through the xd-
> login submit host on the Open Science Grid. > During this period most of the opportunistic
cycles were shared relatively
> equally between 3 users, all running through xd-login. > The black tracing is the total. It is a
count of the number condor_shadowprocesses. > The blue tracing is the number of my running
jobs. It is obtained from a
> condor_q command. > There is an oscillation in the blue tracing with
a period of about 90
> minutes which is quite large. > I presume that the other users saw a comparable
oscillation and I have
> seen this behavior repeatedly. > Is there something out there which analyzes the
behavior of the control
> algorithm implemented in HTCondor? > I have reviewed the documentation on the algorithm
itself and admit that I
> do not understand it. > Any comments on this would be welcome.
Hey Don,
My experience may not be directly relevant to this
situation, but when I started adding opportunistic resources to one of my
pools in the form of desktop systems running Linux, I noticed a key difference
between them and the dedicated (i.e., "START=True") resources.
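The full always-run policy on those dedicated machines looks roughly like
this - a minimal sketch, not my exact config:

    # Always-run policy: accept jobs unconditionally and never
    # suspend, preempt, or kill them.
    START   = True
    SUSPEND = False
    PREEMPT = False
    KILL    = False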
Checking the OS statistics graphs, I saw the load
average of the opportunistic machines oscillating from 0 to 12 and
back again, with similar ripples in the network traffic, all night and all
weekend.
Since the desktops generally don't have the spiffy
network offload features of my dedicated machines, once things started
cranking in earnest on them the non-HTCondor load average promptly rose to
around 1, as a dozen jobs on a high-end desktop workstation went about
their business fetching inputs, consulting remote data, and writing
outputs. The kernel had to segment/fragment/checksum a much higher volume
of network traffic than usual, which led to longer delays in the kernel's
scheduler for non-HTCondor processes and a rising load average.
With the default configuration, a non-HTCondor load
average higher than 0.3 puts the machine back into the Owner state, causing
it to stop accepting additional jobs, and with the minimum-idle-time
constraint it stays there for 15 minutes or more.
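For reference, the relevant pieces of the sample desktop policy that ships
with HTCondor look roughly like this (the exact expressions vary by
version, so treat it as a sketch and check your own condor_config):

    # Load not attributable to HTCondor jobs.
    NonCondorLoadAvg = (LoadAvg - CondorLoadAvg)
    # Owner-protection thresholds from the sample policy.
    BackgroundLoad   = 0.3
    StartIdleTime    = 15 * $(MINUTE)
    CPUIdle          = ($(NonCondorLoadAvg) <= $(BackgroundLoad))
    # Only start jobs when the machine is quiet and has been idle a while.
    START            = $(CPUIdle) && (KeyboardIdle > $(StartIdleTime))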
As a result, we saw the same kind of oscillation in
the graphs as desktop workstations flipped in and out of the Owner/Idle
states. A desktop would accept 6, 8, or 12 jobs, go into Owner because a
non-HTCondor load average of 1.0 exceeds the 0.3 threshold, drain out most
or all of the jobs, and then finally go back into Unclaimed 15 or more
minutes after the load average dropped below 0.3, only to repeat the
cycle - instead of finishing one of the 12 jobs and starting another to
keep 12 jobs running all night and all weekend.
The problem was even more visible when the machines
were running short-duration jobs, where the time HTCondor must
wait before moving from Owner to Unclaimed is significantly longer than
the job duration.
First, I tried increasing the non-HTCondor load average
limit from 0.3 to 1.0, and that helped quite a bit, but it's not
a complete solution if you want to keep machine owners happy. It needs
to be coupled with expressions for more aggressive suspension and
eviction of jobs based on keyboard and mouse activity - and you need
to be sure that condor_kbdd is working correctly with respect to both
KeyboardIdle and ConsoleIdle.
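To give a concrete idea, here is the shape of what I mean - the thresholds
are my own choices rather than defaults, and the expressions are
simplified:

    # Tolerate more non-HTCondor load before going to Owner.
    BackgroundLoad = 1.0
    # Treat recent keyboard or console input as "owner is present".
    KeyboardBusy   = (KeyboardIdle < $(MINUTE)) || (ConsoleIdle < $(MINUTE))
    SUSPEND        = $(KeyboardBusy)
    CONTINUE       = (KeyboardIdle > 5 * $(MINUTE)) && \
                     (ConsoleIdle > 5 * $(MINUTE))
    # Evict jobs that have been suspended for more than 10 minutes.
    PREEMPT        = (Activity == "Suspended") && \
                     ((CurrentTime - EnteredCurrentActivity) > (10 * $(MINUTE)))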
In addition, I reduced the number of CPUs advertised
by the desktop systems by one, to leave some headroom for network
processing done in the kernel rather than on the interface. That could be
coupled with a probe of the interface's offload capabilities using
"ethtool --show-offload," come to think of it.
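Concretely, the CPU reduction is just a knob (recent HTCondor versions
evaluate the simple arithmetic), and the probe could be wired in as a
startd cron job - the script path and the attribute it would publish are
hypothetical here:

    # Advertise one fewer core than detected, reserving it for the
    # kernel's network stack on desktops without NIC offload.
    NUM_CPUS = $(DETECTED_CORES) - 1

    # Hypothetical startd cron probe that parses "ethtool --show-offload"
    # output and publishes something like HasNicOffload in the machine ad.
    STARTD_CRON_JOBLIST = $(STARTD_CRON_JOBLIST) OFFLOAD
    STARTD_CRON_OFFLOAD_EXECUTABLE = /usr/local/libexec/nic_offload_probe
    STARTD_CRON_OFFLOAD_PERIOD = 1h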