Re: [HTCondor-users] shared fs high latency slow down the schedd
- Date: Tue, 20 Jun 2017 10:26:44 -0500
- From: Greg Thain <gthain@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] shared fs high latency slow down the schedd
On 06/20/2017 08:19 AM, Alessandro Italiano wrote:
> Hi,
> we have an HTCondor cluster for local job submission which exploits a
> shared filesystem.
Fundamentally, the schedd needs to write information about the work it
is doing to the filesystem, and if the filesystem is slow, there isn't
much to do but wait.
However, I assume that not all of the data the schedd needs to store is
on the shared filesystem. If you can move more of that data onto a
local filesystem or, better yet, a local SSD, this may help. The schedd
writes to the SPOOL directory frequently, so please make sure that SPOOL
(i.e. the path printed by condor_config_val SPOOL) is on a local
filesystem. Another type of file that the schedd periodically writes to
is the job log. If the job logs are on a shared filesystem, the schedd
will get very slow. If you can move the job logs to a local filesystem,
things should improve.
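
As a rough way to check where SPOOL actually lives, here is a minimal
Python sketch, assuming a Linux host with /proc/mounts. The helper names
(parse_mounts, fs_type_for, is_shared) and the list of "shared"
filesystem types are made up for illustration and are not part of
HTCondor; only condor_config_val SPOOL comes from the advice above.

```python
import os
import subprocess

# Filesystem types treated as shared/network here. This set is an
# assumption for illustration; adjust it for your site.
SHARED_FS_TYPES = {"nfs", "nfs4", "cifs", "lustre", "gpfs", "ceph", "beegfs"}


def parse_mounts(mounts_text):
    """Return (mount_point, fs_type) pairs from /proc/mounts-style text."""
    entries = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            entries.append((fields[1], fields[2]))
    return entries


def fs_type_for(path, mounts):
    """Return the fs type of the longest mount point containing path.

    path must be absolute and already normalized.
    """
    best_point, best_type = "", None
    for mount_point, fs_type in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if (path + "/").startswith(prefix):
            # ">=" lets a later mount on the same point shadow an earlier one.
            if len(mount_point) >= len(best_point):
                best_point, best_type = mount_point, fs_type
    return best_type


def is_shared(fs_type):
    """True if fs_type is in the (assumed) set of shared filesystem types."""
    return fs_type in SHARED_FS_TYPES


if __name__ == "__main__":
    # Ask HTCondor where SPOOL is, then look it up in the mount table.
    spool = subprocess.run(
        ["condor_config_val", "SPOOL"],
        capture_output=True, text=True, check=True,
    ).stdout.strip()
    with open("/proc/mounts") as f:
        mounts = parse_mounts(f.read())
    fs_type = fs_type_for(os.path.realpath(spool), mounts)
    kind = "shared" if is_shared(fs_type) else "local (probably)"
    print(f"SPOOL={spool} fs={fs_type} -> {kind}")
```

The same fs_type_for lookup can be pointed at the directory holding the
job logs. Mount points containing spaces are escaped in /proc/mounts and
are not handled by this sketch.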
-greg