[Condor-users] How to solve/debug no matching?
- Date: Tue, 30 Dec 2008 14:15:06 +0100
- From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
- Subject: [Condor-users] How to solve/debug no matching?
Hi all,
the current status of our pool here is pretty much idle:
condor_status |tail -n 6
             Total Owner Claimed Unclaimed Matched Preempting Backfill
X86_64/LINUX  3813     0      66         0       0          0     3747
       Total  3813     0      66         0       0          0     3747
However, analyzing a job that is idle and, according to the MatchLog,
rejected shows the following (the machine counts are larger now because I
just restarted the submit machines and they are still rediscovering all
compute nodes):
condor_q -bet 8487997.0
-- Quill: atlasquill : <10.20.30.1:5432> : atlasquill
---
8487997.000: Run analysis summary. Of 4563 machines,
366 are rejected by your job's requirements
0 reject your job because of their own requirements
38 match but are serving users with a better priority in the pool
4112 match but reject the job for unknown reasons
47 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Mon Dec 22 14:42:11 2008
The Requirements expression for your job is:
( target.Arch == "X86_64" ) && ( target.OpSys == "LINUX" ) &&
( ( CkptArch == target.Arch ) || ( CkptArch is undefined ) ) &&
( ( CkptOpSys == target.OpSys ) || ( CkptOpSys is undefined ) ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize )
    Condition                                 Machines Matched  Suggestion
    ---------                                 ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 300000 )    4197
2   ( target.Arch == "X86_64" )               4563
3   ( target.OpSys == "LINUX" )               4563
4   ( ( "X86_64" == target.Arch ) )           4563
5   ( ( "LINUX" == target.OpSys ) )           4563
6   ( target.Disk >= 7500 )                   4563
According to this list, the job should be sent to the cluster right
away. However, it stayed idle over Xmas :(
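As a quick sanity check on the numbers in the analysis output, here is a
minimal sketch (the variable names and dictionary labels are made up for
illustration; only the counts are copied from the condor_q output). It
confirms that the summary categories account for every machine, that the
366 machines rejected by the job's requirements are exactly the ones
failing the memory condition, and that the "unknown reasons" bucket is
what holds the job back:

```python
# Counts copied from the "Run analysis summary" in the condor_q output.
total_machines = 4563
summary = {
    "rejected by job requirements": 366,
    "rejected by machine requirements": 0,
    "match, serving better-priority users": 38,
    "match, rejected for unknown reasons": 4112,
    "match, will not preempt": 47,
    "available": 0,
}
memory_matched = 4197  # condition 1: ( 1024 * target.Memory ) >= 300000

# Every machine should fall into exactly one summary category.
accounted = sum(summary.values())

# Machines failing the memory condition; all other conditions match 4563.
memory_failed = total_machines - memory_matched

# Fraction of the pool that matches but rejects for unknown reasons.
unknown_share = summary["match, rejected for unknown reasons"] / total_machines

print("all machines accounted for:", accounted == total_machines)
print("machines failing memory condition:", memory_failed)
print(f"share rejecting for unknown reasons: {unknown_share:.1%}")
```

So the Requirements expression itself only explains 366 of the
rejections; the remaining question is the 4112 "unknown reasons" matches.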
The current system setup is: two submit machines in an HA setup, running
Quill on PostgreSQL; the Condor version is 7.0.5 with 7.1.4 DAGMan binaries.
Any hints on how I can debug this further to narrow down why it does not work?
Cheers
Carsten