RE: [Condor-users] Flocking - jobs matched but not started
- Date: Fri, 19 Aug 2005 17:20:28 +0100
- From: John Horne <john.horne@xxxxxxxxxxxxxx>
- Subject: RE: [Condor-users] Flocking - jobs matched but not started
On Fri, 2005-08-19 at 10:57 -0500, Michael Rusch wrote:
> For what it's worth, I have had what sounds like a similar problem for quite
> awhile, though it has been much harder for me to debug, since I don't have
> access to logs on the flocked-to pool. Out of curiosity, when your jobs
> "match" but don't run, are they still listed as idle in the queue?
>
Yes, as a snippet of condor_q shows:
-- Submitter: ws-60-56.dhcp.plymouth.ac.uk : <141.163.60.56:44957> : ws-60-56.dhcp.plymouth.ac.uk
 ID      OWNER   SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
 48.0    john    8/18 18:05    0+00:17:26  I   0    1.6   loop.remote 200
>
> When you condor_q -analyze, are they shown as having machines that are available to
> run the job? I'm trying to figure out if this is the same problem, in which
> case I may have 2 cents to put in...
>
No, I don't see that. condor_q -analyze shows:
==============================================================
[root@ws-60-56 log]# condor_q -analyze 48.0
-- Submitter: ws-60-56.dhcp.plymouth.ac.uk : <141.163.60.56:44957> : ws-60-56.dhcp.plymouth.ac.uk
 ID      OWNER   SUBMITTED     RUN_TIME    ST  PRI  SIZE  CMD
---
048.000: Run analysis summary. Of 0 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match, but are serving users with a better priority in the pool
0 match, match, but reject the job for unknown reasons
0 match, but will not currently preempt their existing job
0 are available to run your job
Last successful match: Fri Aug 19 17:13:00 2005
Last failed match: Fri Aug 19 17:15:00 2005
Reason for last match failure: no match found
WARNING: Be advised:
No resources matched request's constraints
Check the Requirements expression below:
Requirements = (Arch == "INTEL") && (OpSys == "LINUX") && ((CkptArch ==
Arch) || (CkptArch =?= UNDEFINED)) && ((CkptOpSys == OpSys) ||
(CkptOpSys =?= UNDEFINED)) && (Disk >= DiskUsage) && ((Memory * 1024) >=
ImageSize)
WARNING: Be advised: Request 48.0 did not match any resource's
constraints
==============================================================
However, this job will be picked up by the remote server and matched
with a client in its pool. So I think the 'condor_q -analyze' output is a
bit misleading here, as it seems to show a job which is having a problem
running. Having said that, the condor_q command is looking at the job and
seeing if it can run locally (which it can't). In my case I have stopped
startd on this machine, so the job cannot run locally and must be flocked.
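
To illustrate the point, here is a rough sketch in Python of why the analysis reports "Of 0 machines". It is a toy model only, not Condor's actual ClassAd evaluator: the dictionaries, the function names and the remote machine's values are all made up for illustration, and the job's DiskUsage and ImageSize figures are guesses. The only thing taken from the output above is the shape of the Requirements expression. The assumption is that the analysis only sees machine ads from the local pool, which is empty here because startd is stopped.

```python
# Toy model of matchmaking. This is NOT Condor's ClassAd evaluator;
# all names and values below are hypothetical.

def requirements(machine, job):
    """Mirror the Requirements expression printed by condor_q -analyze."""
    ckpt_arch = job.get("CkptArch")    # None stands in for UNDEFINED
    ckpt_opsys = job.get("CkptOpSys")  # None stands in for UNDEFINED
    return (
        machine["Arch"] == "INTEL"
        and machine["OpSys"] == "LINUX"
        and (ckpt_arch is None or ckpt_arch == machine["Arch"])
        and (ckpt_opsys is None or ckpt_opsys == machine["OpSys"])
        and machine["Disk"] >= job["DiskUsage"]
        and machine["Memory"] * 1024 >= job["ImageSize"]
    )


def count_matches(machines, job):
    """Count the machine ads that satisfy the job's requirements."""
    return sum(1 for machine in machines if requirements(machine, job))


# Illustrative job values (assumed, not taken from the real job ad).
job = {"DiskUsage": 10, "ImageSize": 1600}

# Local pool: startd is stopped, so there are no machine ads at all.
local_pool = []

# Flocked-to pool: one hypothetical machine that satisfies the expression.
remote_pool = [
    {"Arch": "INTEL", "OpSys": "LINUX", "Disk": 100000, "Memory": 512},
]

print("local matches:", count_matches(local_pool, job))
print("remote matches:", count_matches(remote_pool, job))
```

With an empty local pool nothing can match, whatever the Requirements expression says, while the same job can still match a machine in the flocked-to pool.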
John.
--
---------------------------------------------------------------
John Horne, University of Plymouth, UK Tel: +44 (0)1752 233914
E-mail: John.Horne@xxxxxxxxxxxxxx Fax: +44 (0)1752 233839