Re: [Condor-devel] -better-analyze and RequestMemory
- Date: Mon, 04 Oct 2010 19:45:05 +0100
- From: David McBride <dwm@xxxxxxxxxxxx>
- Subject: Re: [Condor-devel] -better-analyze and RequestMemory
On 04/10/10 17:41, Matthew Farrellee wrote:
>> I can refute that; I've seen exactly this behaviour on our 7.4.2
>> Linux-only cluster.
>
> Great. Please include info about what you're seeing in the ticket.
Appending remarks to the ticket apparently requires an account, which I don't
have. So I'll just write out what I would have put in that ticket here, and
hope someone local copies the relevant bits as required.
So, this originated from a case in August when an end-user was attempting to
run a large number of jobs, but none of them were executing. Here was the
output of -better-analyze on one of them:
-- Schedd: midnight.doc.ic.ac.uk : <146.169.5.91:43500>
---
600.000: Run analysis summary. Of 438 machines,
    438 are rejected by your job's requirements
      0 reject your job because of their own requirements
      0 match but are serving users with a better priority in the pool
      0 match but reject the job for unknown reasons
      0 match but will not currently preempt their existing job
      0 match but are currently offline
      0 are available to run your job
    No successful match recorded.
    Last failed match: Sun Aug 22 18:27:55 2010
    Reason for last match failure: no match found
WARNING: Be advised:
    No resources matched request's constraints
The Requirements expression for your job is:
( ( target.DoC_OS_Distribution == target.Ubuntu ) ) &&
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) &&
( ( ( target.Memory * 1024 ) >= ImageSize ) &&
( ( RequestMemory * 1024 ) >= ImageSize ) ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                            Machines Matched  Suggestion
    ---------                                            ----------------  ----------
1   ( ( target.DoC_OS_Distribution == target.Ubuntu ) )  0                 REMOVE
2   ( ( ( 1024 * target.Memory ) >= 0 ) && ( ( 1024 *
    ceiling(ifThenElse(JobVMMemory isnt undefined,JobVMMemory,0.0)) ) >= 0 ) )
                                                         0                 REMOVE
3   ( target.Arch == "INTEL" )                           260
4   ( target.OpSys == "LINUX" )                          438
5   ( target.Disk >= 0 )                                 438
6   ( TARGET.FileSystemDomain == "doc.ic.ac.uk" )        438
(Note that DoC_OS_Distribution is a site-local machine value populated via a
Hawkeye script, and contains the output of `lsb_release -i -s`.)
I'll copy what I wrote in the ticket to the user:
As the analysis suggests, the first two conditions on the list are
preventing your jobs from being matched.
In the case of the first rule, you have missed off the quotation marks
around the term, 'Ubuntu'. As a result, rather than trying to check the
'DoC_OS_Distribution' string against that constant string, it's instead
trying to match it against the value of the 'Ubuntu' field on a
candidate machine -- which isn't what you meant. Alter the Requirements
field to put double-quotes around the literal string, "Ubuntu".
I _believe_ the suggestion being produced for the second rule is
erroneous, and no further changes are required -- simply fixing the
quoting of "Ubuntu" *should* be sufficient.
And indeed, once the user modified their 'Requirements' line to check against
the literal string, "Ubuntu", rather than the machine ClassAd value,
target.Ubuntu, everything worked for the user as expected, and their jobs ran.
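For anyone puzzled by why the unquoted form matches nothing, here is a rough Python sketch of the ClassAd behaviour involved. It is not Condor code: the UNDEFINED sentinel, the helper names and the one-attribute machine ad are all made up for illustration, and it ignores that ClassAd string == is case-insensitive. It only models two rules: a reference to an attribute the ad does not define evaluates to UNDEFINED, and a job matches only when Requirements evaluates to TRUE.

```python
# Sentinel standing in for the ClassAd UNDEFINED value (illustrative only).
UNDEFINED = object()


def lookup(ad, name):
    """Return an attribute's value, or UNDEFINED if the ad does not define it."""
    return ad.get(name, UNDEFINED)


def eq(a, b):
    """Rough model of ClassAd ==: UNDEFINED if either operand is UNDEFINED."""
    if a is UNDEFINED or b is UNDEFINED:
        return UNDEFINED
    return a == b


def matches(requirements_value):
    """A job only matches when Requirements evaluates to exactly True."""
    return requirements_value is True


# Hypothetical machine ad; only the attribute name comes from this thread.
machine = {"DoC_OS_Distribution": "Ubuntu"}

# target.DoC_OS_Distribution == target.Ubuntu  (no machine defines 'Ubuntu')
unquoted = eq(lookup(machine, "DoC_OS_Distribution"), lookup(machine, "Ubuntu"))

# target.DoC_OS_Distribution == "Ubuntu"  (compared against a string literal)
quoted = eq(lookup(machine, "DoC_OS_Distribution"), "Ubuntu")

print(matches(unquoted), matches(quoted))
```

Under these assumptions the unquoted comparison comes out UNDEFINED rather than TRUE on every machine, so nothing matches, while the quoted comparison matches any machine reporting Ubuntu.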
Our Condor pool contains only Linux hosts; Windows is not involved.
`condor_version` on a common machine reports:
$CondorVersion: 7.4.2 Mar 30 2010 BuildID: 227044 $
$CondorPlatform: X86_64-LINUX_DEBIAN50 $
From my perspective, this is purely a presentation issue -- the actual
matching behaviour of the various Condor services appears to be operating as
designed; it's just the analysis code which isn't evaluating the ClassAd
expressions completely or correctly. The notion that this issue is preventing
jobs from matching is, at least in my case, a red herring.
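As a sanity check on condition 2 as the analyser printed it, the expression can be evaluated by hand. The Python sketch below does that under two assumptions that are not stated in the output above: JobVMMemory is undefined in the job ad (modelled here as None), and the machine advertises a non-negative Memory value. The function name and the sample value 2048 are purely illustrative.

```python
import math


def condition_2(machine_memory_mb, job_vm_memory=None):
    """Evaluate condition 2 as printed by the analysis.

    job_vm_memory=None stands in for an undefined JobVMMemory (an assumption).
    """
    # ceiling(ifThenElse(JobVMMemory isnt undefined, JobVMMemory, 0.0))
    request_memory = math.ceil(job_vm_memory if job_vm_memory is not None else 0.0)
    # ( 1024 * target.Memory >= 0 ) && ( 1024 * <request_memory> >= 0 )
    return (1024 * machine_memory_mb >= 0) and (1024 * request_memory >= 0)


print(condition_2(2048))
```

With those assumptions both halves reduce to a comparison of a non-negative number against 0, so the condition holds for any such machine, which is at odds with the "0 machines matched, REMOVE" line in the analysis.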
Cheers,
David
--
David McBride <dwm@xxxxxxxxxxxx>
Department of Computing, Imperial College, London