Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Job Runtime Problem

Date: Tue, 16 Jul 2013 13:48:49 -0400
From: Matthew Farrellee <matt@xxxxxxxxxx>
Subject: Re: [HTCondor-users] Job Runtime Problem

On 07/16/2013 01:23 PM, Vishal Shah wrote:

Hello,

When submitting N instances of a job, generally N/2 jobs run in the
expected time and the other N/2 jobs take longer to complete. The system
has 10 nodes each with 32 slots and uses a shared filesystem
(GlusterFS). All of the executables and data files are located on the
shared file system; however, the problem does not seem to be an I/O or
network bottleneck.

When submitting 2 instances, the two times are the following:

Instance 1
real    7m13.950s
user    5m36.766s
sys     0m14.436s

Instance 2
real    6m2.555s
user    5m35.747s
sys     0m13.170s

When submitting 22 instances, the difference in times is more drastic.
The two categories that the times fall into are the following:

Category 1:
real    18m28.193s
user    5m39.153s
sys     0m15.111s

Category 2:
real    6m12.578s
user    5m36.433s
sys     0m12.644s

Does anybody have insight into this issue?

Thanks,
Vishal


Share your goal, so we can tell what the issue may be.

FYI, condor_submit <-> condor_schedd communication is very chatty and
the condor_schedd is single-threaded. The schedd may have ignored your
submit for a period while doing some job maintenance, which resulted in
an 18min runtime for submit.
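
One way to read the quoted timings is to compare wall-clock time with CPU time. The following is a minimal sketch, assuming each measured process is single-threaded, so that real minus (user + sys) approximates time spent waiting rather than computing. The helper names to_seconds and wait_seconds are illustrative only, and the inputs are the 22-instance figures quoted above.

```python
import re


def to_seconds(stamp):
    """Convert a `time`-style stamp such as '18m28.193s' to seconds."""
    match = re.fullmatch(r"(\d+)m(\d+(?:\.\d+)?)s", stamp)
    if match is None:
        raise ValueError(f"unrecognized time stamp: {stamp!r}")
    return int(match.group(1)) * 60 + float(match.group(2))


def wait_seconds(real, user, sys):
    """Wall-clock time not accounted for by CPU time (real - user - sys).

    Assumption: the process is single-threaded, so user + sys cannot
    exceed real and the remainder is time spent off the CPU.
    """
    return to_seconds(real) - to_seconds(user) - to_seconds(sys)


# Figures from the 22-instance run quoted in the original message.
slow = wait_seconds("18m28.193s", "5m39.153s", "0m15.111s")
fast = wait_seconds("6m12.578s", "5m36.433s", "0m12.644s")

print(f"Category 1: {slow:.1f}s not spent on the CPU")
print(f"Category 2: {fast:.1f}s not spent on the CPU")
```

Under that assumption, CPU time is nearly identical in both categories (about 350s), and the slow category spends roughly 754s waiting versus roughly 24s for the fast one, so the extra time is spent blocked on something rather than on additional computation.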


Best,


matt
