
Re: [Condor-users] Swap space estimate reached! No more jobs can be run! Solution: get more swap space, or set RESERVED_SWAP = 0



Many thanks for your reply.

The cluster machines have 64-bit processors. Condor reports their architecture as x86_64.

condor_config_val reports RESERVED_SWAP as 0 on all my nodes including the central manager.

I will try upgrading to 6.7 now.

Many thanks,

Chris
----- Original Message -----
From: "Erik Paulson" <epaulson@xxxxxxxxxxx>
To: "Condor-Users Mail List" <condor-users@xxxxxxxxxxx>
Sent: Tuesday, September 27, 2005 6:58 PM
Subject: Re: [Condor-users] Swap space estimate reached! No more jobs can be run! Solution: get more swap space, or set RESERVED_SWAP = 0



On Tue, Sep 27, 2005 at 05:32:57PM +0100, Chris Miles wrote:
My ScheddLog is reporting:

9/26 15:13:49 Sent ad to central manager for chris@xxxxxxxxxxx
9/26 15:13:49 Called reschedule_negotiator()
9/26 15:13:55 Activity on stashed negotiator socket
9/26 15:13:55 Negotiating for owner: chris@xxxxxxxxxxx
9/26 15:13:55 Checking consistency running and runnable jobs
9/26 15:13:55 Tables are consistent
9/26 15:13:55 Swap space estimate reached! No more jobs can be run!
9/26 15:13:55     Solution: get more swap space, or set RESERVED_SWAP = 0
9/26 15:13:55     0 jobs matched, 5 jobs idle

RESERVED_SWAP has been set to 0 in the global condor_config file,
but that has not made any difference.  The daemons have all been reinstalled
since then.

I am using a Linux cluster with SUSE 9 installed. The cluster nodes have no swap space.

I cannot find any information or help on this.

Any help is greatly appreciated.


Upgrade to 6.7. It works much better with the 2.6 kernel. Condor 6.6 can't detect the memory or the swap space on a 2.6 kernel machine, and you can only define MEMORY in the config file, not SWAP.

However, on Linux/x86 that hasn't been a problem, because the bogus answer
we get back from the kernel is thankfully close enough that it has just
worked. Where we've seen problems with SWAP and the 2.6 kernel is on
other Linux platforms, like Linux/PPC, and there the problem is usually
that the Shadow won't start. If RESERVED_SWAP is set to 0, the
schedd shouldn't complain. Since you say that it's set in the global config
file, my guess is that it's being overridden somewhere. Check with
condor_config_val and ask the schedd directly what it's set to:


cobalt(2)% condor_config_val -pool condor.cs.wisc.edu -schedd -name south.cs.wisc.edu reserved_swap -verbose
reserved_swap: 35
Defined in '/unsup/condor/etc/condor_config.global', line 82.


Alternatively, set SCHEDD_DEBUG to include D_FULLDEBUG and restart the
schedd. It will then print out all of the swap space values it's using.
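
For reference, the two settings discussed above might look like this in a Condor config file. This is only a sketch: which file they belong in is an assumption (the thread does not say), so put them in whichever file condor_config_val -verbose reports as the one actually defining the value on the submit machine.

```
# Sketch only: the file these lines go in is an assumption; use the
# file that condor_config_val -verbose reports for the schedd.

# Disable the schedd's swap space check
RESERVED_SWAP = 0

# Have the schedd log full debugging output, including the swap
# space values it is using
SCHEDD_DEBUG = D_FULLDEBUG
```

After the schedd is restarted, the swap space values should show up in the ScheddLog.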

-Erik


thanks

Chris

_______________________________________________
Condor-users mailing list
Condor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/condor-users