Re: [Condor-users] idling jobs
- Date: Fri, 13 Apr 2007 11:39:18 -0500
- From: Daniel Goldin <daniel.goldin@xxxxxxx>
- Subject: Re: [Condor-users] idling jobs
Thanks, Nick, for your reply.
I've run 'condor_q -better' on the farm as you suggested. I'm attaching
the output. I'm not expert enough to understand it, though. Can you
glean something from it?
Thanks in advance,
Daniel
-----Original Message-----
From: condor-users-bounces@xxxxxxxxxxx on behalf of Nick LeRoy
Sent: Mon 4/9/2007 10:47 PM
To: Condor-Users Mail List
Subject: Re: [Condor-users] idling jobs
On Wed April 4 2007 11:19 am, Daniel Goldin wrote:
> Hi,
Hello,
> I have submitted 30 jobs to run on a farm with 30 nodes. The "submit"
> file looks like this:
<snip>
>
> I am the only user on the farm, but what I see is only 5-6 jobs are
> running simultaneously and the rest are idling. Can I reconfigure
> something so that all the jobs run simultaneously? Could it be a
> priority issue? (If it can be done, I'd like to do it non-intrusively,
> i.e. keep the running jobs running...)
There's not a lot of information here, and there could be quite a lot of
things going wrong.
First, have you waited at least one negotiation cycle (typically 5
minutes)?
I'm assuming that these are all long-running jobs (from your description
above). Condor doesn't do particularly well when users submit a lot of
short-running jobs. If that's not the case, then let's try a couple of
debugging exercises:
1. Have you looked at the output of 'condor_status' to verify that all
of the execute machines are reporting to the pool correctly, and that
they're all in the unclaimed / idle state?
2. Have you tried running 'condor_q -analyze' or (even better)
'condor_q -better-analyze' and looked through its output?
I'd start with the above two exercises... If they don't help, give us a
little more information to go on (like the output of condor_status and
condor_q or 'condor_q -ana').
Hope this helps
-Nick
--
<<< Follow the white rabbit. >>>
/`-_ Nicholas R. LeRoy The Condor Project
{ }/ http://www.cs.wisc.edu/~nleroy http://www.cs.wisc.edu/condor
\ / nleroy@xxxxxxxxxxx The University of Wisconsin
|_*_| 608-265-5761 Department of Computer Sciences
-- Submitter: smufarm.physics.smu.edu : <192.168.1.1:32780> : smufarm.physics.smu.edu
---
3731.000: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Mon Apr 9 16:00:38 2007
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 658636 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.005: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Mon Apr 9 16:00:38 2007
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 649132 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.006: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Mon Apr 9 16:00:38 2007
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 644568 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.007: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Mon Apr 9 16:00:38 2007
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 652028 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.008: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Wed Mar 28 13:12:50 2007
Last failed match: Mon Apr 9 16:00:39 2007
Reason for last match failure: no match found
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 661720 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.009: Run analysis summary. Of 60 machines,
56 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
Last successful match: Wed Mar 28 15:07:04 2007
Last failed match: Fri Mar 30 17:34:08 2007
Reason for last match failure: no match found
The Requirements expression for your job is:
( target.Arch == "INTEL" ) && ( target.OpSys == "LINUX" ) &&
( target.Disk >= DiskUsage ) && ( ( target.Memory * 1024 ) >= ImageSize ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                                           Machines Matched  Suggestion
    ---------                                           ----------------  ----------
1   ( ( 1024 * target.Memory ) >= 648188 )              4
2   ( target.Arch == "INTEL" )                          60
3   ( target.OpSys == "LINUX" )                         60
4   ( target.Disk >= 1 )                                60
5   ( TARGET.FileSystemDomain == "physics.smu.edu" )    60
---
3731.014: Request is being serviced
---
3812.000: Request is being serviced
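
A note on reading the analysis above: in every idle job, the only
condition that fewer than 60 machines satisfy is the memory test,
( 1024 * target.Memory ) >= ImageSize. The small Python sketch below
works out what that test asks of a machine. It is not part of the
original thread; the thresholds are copied from condition 1 of each job,
and it assumes (as the factor of 1024 in the Requirements expression
suggests) that Memory is advertised in MB and ImageSize is recorded in
KB. The names image_size_kb and min_memory_mb are made up for this
illustration.

```python
import math

# ImageSize thresholds in KB, copied from condition 1 of each analysed job.
image_size_kb = {
    "3731.000": 658636,
    "3731.005": 649132,
    "3731.006": 644568,
    "3731.007": 652028,
    "3731.008": 661720,
    "3731.009": 648188,
}


def min_memory_mb(image_kb):
    """Smallest whole-MB Memory value with Memory * 1024 >= image_kb."""
    return math.ceil(image_kb / 1024)


# Minimum Memory (MB) a machine must advertise for each idle job.
required = {job: min_memory_mb(kb) for job, kb in image_size_kb.items()}

for job, mb in required.items():
    print(f"{job}: needs a machine advertising Memory >= {mb} MB")
```

Under those assumptions each idle job needs a machine advertising
roughly 630 to 647 MB of Memory, and the analysis reports that only 4 of
the 60 machines do. That would point to the memory requirement, rather
than user priority, as the likely reason most of the jobs stay idle.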