Hi Greg,
Thanks for the quick responses. I think your explanations sound
reasonable. FYI, the command showed either 0 or 2 there.
Thank you again,
Zhuo
Greg Thain wrote on 10/3/2016 11:18 AM:
On 10/03/2016 10:05 AM, Zhuo Zhang wrote:
Hi,
We have a small test cluster with 5 machines. After
submitting jobs to the test cluster, I always see 5 slots
unclaimed even though there are jobs waiting in the queue. I am
not an administrator but a user, and I suspect there is
something wrong with our Condor configuration. See the
experiment I did below; can someone explain why there are jobs
waiting while resources are still available? Is this behavior
expected?
3. But condor_status showed that there were still 5 slots
unclaimed.
rhw1193:25} condor_status

Name                OpSys  Arch   State     Activity LoadAv Mem    ActvtyTime

slot1@xxxxxxxxxxxx  LINUX  X86_64 Unclaimed Idle     0.140  146991  4+19:01:06
slot1@xxxxxxxxxxxx  LINUX  X86_64 Unclaimed Idle     0.020  146991 32+22:31:04
slot1@xxxxxxxxxxxx  LINUX  X86_64 Unclaimed Idle     0.060  146990 32+22:22:23
slot1@xxxxxxxxxxxx  LINUX  X86_64 Unclaimed Idle     0.830  146991  4+18:58:47
slot1@xxxxxxxxxxxx  LINUX  X86_64 Unclaimed Idle     0.300  136879 18+17:51:51
If these five slots are the only Unclaimed ones, then your
configuration is probably OK. These slots are probably
partitionable slots, which means that they hold the "leftover"
resources (CPU, memory, disk, etc.) not currently in use by the
dynamic slots carved out of them (the slot1_xx slots). If you run

condor_status -af Name Cpus

condor will tell you how many CPUs remain in each slot, and I bet
that the partitionable slots have exhausted their CPUs and show
0 there.
-greg
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/