[HTCondor-users] shared fs high latency slow down the schedd

Date: Tue, 20 Jun 2017 15:19:08 +0200
From: Alessandro Italiano <alessandro.italiano@xxxxxxxxxx>
Subject: [HTCondor-users] shared fs high latency slow down the schedd

We have an HTCondor cluster for local job submission that relies on a shared filesystem.

***

[root@ettore ~]# condor_schedd -version

$CondorVersion: 8.4.6 Apr 20 2016 BuildID: 364106 $

$CondorPlatform: x86_64_RedHat6 $

[root@ettore ~]#

[root@ettore ~]# condor_config_val -dump FILESYSTEM_DOMAIN

FILESYSTEM_DOMAIN = GPFS

[italiano@ui02 ~]$ condor_config_val -dump FILESYSTEM_DOMAIN

FILESYSTEM_DOMAIN = GPFS

***

Every time the filesystem experiences high latency while accessing files, for example during a restripe operation, the schedd serving local job submission hangs. In this state, a condor_reconfig takes several minutes to be applied.

So it seems that slow filesystem performance degrades the schedd response time, and sometimes the schedd becomes unresponsive altogether:

***

[root@ettore ~]# condor_q

-- Failed to fetch ads from: <90.147.169.224:38705> : ettore.recas.ba.infn.it

SECMAN:2007:Failed to end classad message.

[root@ettore ~]#

***
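To correlate the hangs with filesystem latency, a small probe along these lines could be run on the submit host while the restripe is in progress. This is only a minimal sketch and not part of HTCondor: the function names, the 1-second threshold, and the directory passed on the command line are all assumptions chosen for illustration.

```python
import os
import sys
import tempfile
import time


def probe_latency(directory, samples=5):
    """Time a small create/write/fsync/stat/unlink cycle in `directory`.

    Returns a list with one elapsed time (in seconds) per sample.
    """
    latencies = []
    for _ in range(samples):
        start = time.monotonic()
        # Create a uniquely named scratch file on the filesystem under test.
        fd, path = tempfile.mkstemp(dir=directory, prefix="latency_probe_")
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the data out to the filesystem
        finally:
            os.close(fd)
        os.stat(path)
        os.unlink(path)
        latencies.append(time.monotonic() - start)
    return latencies


def is_slow(latencies, threshold_s=1.0):
    """Return True if any sample exceeded the (assumed) threshold."""
    return max(latencies) > threshold_s


if __name__ == "__main__":
    # Hypothetical usage: pass a directory on the shared filesystem.
    target = sys.argv[1] if len(sys.argv) > 1 else tempfile.gettempdir()
    results = probe_latency(target)
    print("max latency: %.3f s, slow: %s" % (max(results), is_slow(results)))
```

Running it periodically and comparing the timestamps with the moments when condor_q fails would show whether the two really coincide.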

Is there a way to preserve schedd functionality in such situations?

In the same cluster there are also other schedds serving grid jobs, and these are NOT affected by this behaviour.

Thanks in advance for any hints you would like to share.

Ale


Follow-Ups:
- Re: [HTCondor-users] shared fs high latency slow down the schedd
  - From: Greg Thain
