[HTCondor-users] Stumped by group_quota

Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab

Greetings all,

We have a QA Condor (8.4.6) system here with 2 slots, with more coming soon (yay!). Both run CentOS 6.7. All users are in the same uid_domain and all servers are in the same filesystem_domain.

We would like to set up groups where group a gets more resources than group b, which gets more than group c, etc. I would also like to set up group_accept_surplus rules so that unused slots can be reallocated, but would like to get the basic quotas sorted out first.

Here is how I have defined the groups.

GROUP_NAMES = group_a, group_b, group_c, group_d, group_e, group_f, group_g

# The following is based on 7 groups; the quotas must add up to 1.

# group_a gets 7x, down to group_g, which gets 1x, for a total of 28x,

# so x = 1/28, which comes out to 0.0357142857142857

GROUP_QUOTA_DYNAMIC_group_a = 0.250

GROUP_QUOTA_DYNAMIC_group_b = 0.214

GROUP_QUOTA_DYNAMIC_group_c = 0.179

GROUP_QUOTA_DYNAMIC_group_d = 0.143

GROUP_QUOTA_DYNAMIC_group_e = 0.107

GROUP_QUOTA_DYNAMIC_group_f = 0.071

GROUP_QUOTA_DYNAMIC_group_g = 0.036
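For anyone checking the arithmetic, the quota values above can be reproduced with a short Python sketch. It assumes the weights run linearly from 7 (group_a) down to 1 (group_g) and that each quota is rounded to three decimals, as in the configuration above; the variable names are only illustrative and are not part of HTCondor.

```python
from fractions import Fraction

# Group names as listed in GROUP_NAMES above.
groups = ["group_a", "group_b", "group_c", "group_d",
          "group_e", "group_f", "group_g"]

# Assumed weighting: group_a gets 7 shares, group_g gets 1 share.
weights = list(range(len(groups), 0, -1))   # [7, 6, 5, 4, 3, 2, 1]
total_shares = sum(weights)                 # 28

# Exact fractional quota for each group (share / total shares).
quotas = {g: Fraction(w, total_shares) for g, w in zip(groups, weights)}

# Print the quotas in configuration-file form, rounded to 3 decimals.
config_lines = [
    f"GROUP_QUOTA_DYNAMIC_{g} = {float(q):.3f}" for g, q in quotas.items()
]
for line in config_lines:
    print(line)
```

The exact fractions sum to 1, and with this particular weighting the three-decimal roundings also happen to sum to exactly 1.000.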

When I add accounting_group to the submit file, the job just hangs out in the Idle state. Here is the submit file:

# Unix submit description file

# sleep.sub -- simple sleep job

executable              = sleep.sh

log                     = sleep.log

output                  = outfile.txt

error                   = errors.txt

accounting_group        = group_a

should_transfer_files   = Yes

when_to_transfer_output = ON_EXIT

queue

condor_q -better-analyze shows this:

[cyang@centos sleep]$ condor_q -better-analyze 148.0

-- Schedd: centos.example.com : <10.2.7.151:9618?...

User priority for cyang@xxxxxxxxxxx is not available, attempting to analyze without it.

---

148.000:  Run analysis summary.  Of 2 machines,

      0 are rejected by your job's requirements

      0 reject your job because of their own requirements

      0 match and are already running your jobs

      0 match but are serving other users

      0 are available to run your job

        No successful match recorded.

        Last failed match: Wed May  4 09:48:46 2016

        Reason for last match failure: no match found

The Requirements expression for your job is:

    ( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&

    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&

    ( TARGET.HasFileTransfer )

Your job defines the following attributes:

    DiskUsage = 1

    ImageSize = 1

    RequestDisk = 1

    RequestMemory = 1

The Requirements expression for your job reduces to these conditions:

         Slots

Step    Matched  Condition

-----  --------  ---------

[0]           2  TARGET.Arch == "X86_64"

[1]           2  TARGET.OpSys == "LINUX"

[3]           2  TARGET.Disk >= RequestDisk

[5]           2  TARGET.Memory >= RequestMemory

[7]           2  TARGET.HasFileTransfer

Suggestions:

    Condition                         Machines Matched    Suggestion

    ---------                         ----------------    ----------

1   ( TARGET.Arch == "X86_64" )       2

2   ( TARGET.OpSys == "LINUX" )       2

3   ( TARGET.Disk >= 1 )              2

4   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) )

                                      2

5   ( TARGET.HasFileTransfer )        2

So, it looks like the machines match, and yet the job won't run. When I use the +AccountingGroup = group_a directive instead, it runs without any problem.

Additionally, condor_userprio shows just cyang@xxxxxxxxxxx with no associated groups.

[cyang@rhw1160 sleepjob]$ condor_userprio

Last Priority Update:  5/4  09:51

                              Effective   Priority   Res   Total Usage  Time Since

User Name                      Priority    Factor   In Use (wghted-hrs) Last Usage

--------------------------- ------------ --------- ------ ------------ ----------

cyang@xxxxxxxxxxx                 507.02   1000.00      0         0.30    0+00:05

--------------------------- ------------ --------- ------ ------------ ----------

Number of users: 1                                      0         0.30    0+23:59

Any thoughts as to why the jobs stay idle? Or am I doing something obviously wrong?

Thanks.

Charles Yang

Senior Research Engineer

NOAA NESDIS/STAR
