[HTCondor-users] Matchmaking priority issue
- Date: Thu, 31 Jul 2014 21:06:55 +0200
- From: Szabolcs Horvátth <szabolcs@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] Matchmaking priority issue
Hi,
Sometimes I run into a matchmaking situation that I don't understand:
- A single user submits multiple jobs into the queue.
- Later, I raise the priority of some of the earlier-submitted jobs to force them to run first. Their priority is definitely higher than that of any other job in the queue.
- Both the machine rank and the job priority are higher for these jobs, and there are free slots within the concurrency limits, but the other, later-submitted jobs still get executed first. For hours.
- The only way I can get the older jobs to run is to set attributes that force a higher machine rank.
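To make the expected behaviour concrete, here is a toy model of the ordering described above. It assumes that one user's jobs are ordered by JobPrio (highest first), then by submission order (cluster, proc), and that each started job consumes one unit of a concurrency limit. The job IDs and the `expected_start_order` helper are hypothetical, and this is a sketch of the expectation, not HTCondor's actual negotiator algorithm.

```python
from dataclasses import dataclass


@dataclass
class Job:
    job_id: str  # hypothetical "cluster.proc" label
    prio: int    # JobPrio, e.g. raised afterwards with condor_prio
    cluster: int
    proc: int


def expected_start_order(jobs, free_limit_tokens):
    """Toy model (assumption): sort by JobPrio descending, then by
    submission order, and start only as many jobs as there are free
    units of the concurrency limit. Not HTCondor's real negotiator."""
    ordered = sorted(jobs, key=lambda j: (-j.prio, j.cluster, j.proc))
    return [j.job_id for j in ordered[:free_limit_tokens]]


jobs = [
    Job("100.0", 10, 100, 0),  # older job whose priority was raised
    Job("100.1", 10, 100, 1),  # older job whose priority was raised
    Job("101.0", 0, 101, 0),   # newer job, default priority
    Job("101.1", 0, 101, 1),
    Job("102.0", 0, 102, 0),
]

print(expected_start_order(jobs, 2))  # ['100.0', '100.1']
```

Under this model the raised-priority older jobs should take the free limit units first; the report above is that the newer jobs run instead.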
Which attribute (or attributes) might cause this behaviour? What can I do to solve it, or where should I look to debug what's going wrong?
(Using Condor 8.1.4)
By the way, I suspect it only happens when the jobs are limited by concurrency limits.
Cheers,
Szabolcs
P.S. Here is the output of condor_q -better-analyze for one of the jobs:
---
20858347.000: Run analysis summary. Of 214 machines,
    105 are rejected by your job's requirements
     82 reject your job because of their own requirements
      0 match and are already running your jobs
      7 match but are serving other users
     20 are available to run your job
The Requirements expression for your job is:
    ( ( HAS_ABCD is true ) && ( OpSys == "LINUX" && Arch == "X86_64" &&
        Memory > 1024 ) && ( Name isnt LastRemoteHost ) ) &&
    ( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
    ( TARGET.HasFileTransfer )
Your job defines the following attributes:
    DiskUsage = 7
    ImageSize = 7
    RequestDisk = 7
    RequestMemory = 1
The Requirements expression for your job reduces to these conditions:
         Slots
Step    Matched  Condition
-----  --------  ---------
[0]         109  HAS_ABCD is true
[1]         214  OpSys == "LINUX"
[2]         214  Arch == "X86_64"
[4]         214  Memory > 1024
Suggestions:
    Condition                         Machines Matched    Suggestion
    ---------                         ----------------    ----------
1   ( target.HAS_ABCD is true )       109
2   ( target.OpSys == "LINUX" && target.Arch == "X86_64" && target.Memory > 1024 ) 214
3   ( target.Name isnt target.LastRemoteHost ) 214
4   ( TARGET.Disk >= 7 )              214
5   ( TARGET.Memory >= ifthenelse(MemoryUsage isnt undefined,MemoryUsage,1) ) 214
6   ( TARGET.HasFileTransfer )        214
The following attributes are missing from the job ClassAd:
CheckpointPlatform
---
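As a quick consistency check on the run analysis summary above, the counts can be reconciled in a few lines. This assumes that condor_q assigns each machine to exactly one summary category, and that HAS_ABCD is the only selective clause (every other condition matched all 214 slots). Under those assumptions, the job's own Requirements reject exactly the 105 slots without HAS_ABCD, and 20 slots really are free for the job.

```python
# Counts copied from the "Run analysis summary" of the condor_q -better-analyze output.
summary = {
    "rejected by job requirements": 105,
    "reject job (own requirements)": 82,
    "match, already running your jobs": 0,
    "match, serving other users": 7,
    "available to run your job": 20,
}
total_machines = 214
has_abcd = 109  # slots matching step [0], "HAS_ABCD is true"

# Assumption: the five categories partition the machine count.
assert sum(summary.values()) == total_machines

# Every other clause matched all 214 slots, so only HAS_ABCD filters.
assert total_machines - has_abcd == summary["rejected by job requirements"]

# Slots the job accepts that also accept the job.
mutual_matches = has_abcd - summary["reject job (own requirements)"]
assert mutual_matches == (summary["match, already running your jobs"]
                          + summary["match, serving other users"]
                          + summary["available to run your job"])
print(mutual_matches)  # 27
```

So matchmaking itself finds 20 idle slots that accept this job, which points away from the Requirements expression and toward something else, such as the concurrency limit, holding it back.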