Subject: [Condor-users] Problem getting condor up and running
Hi all,
I am having some trouble getting Condor up and running on a couple of
machines running RHEL 4 with kernel 2.6.9-22.0.1.ELsmp. They have 8 CPUs
and 16 GB of memory each. Due to a limitation with 2.6 kernels, I had to
define the total amount of memory in my local config files. I also added
RESERVED_SWAP = 0, as suggested by a log message when I first attempted
a run, and a LOCK statement to give each machine a local lock file to
work with, since all the other files are on NFS:
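A minimal sketch of what such local config entries might look like. The values are assumptions, not the settings actually in use: MEMORY is given in megabytes (16 GB = 16384 MB), and the lock directory path is hypothetical.

```
# Hypothetical local config fragment (illustrative values only)
# Total physical memory in MB, overriding what Condor detects (assumed: 16 GB)
MEMORY = 16384
# Do not reserve any swap space
RESERVED_SWAP = 0
# Lock files on local disk instead of NFS (path is an assumption)
LOCK = /var/lock/condor
```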
Successfully matched with vm2@xxxxxxxxxxxxxxxxxxxxxxxxxx
The Negotiation Cycle:
6/26 09:46:32 ---------- Started Negotiation Cycle ----------
6/26 09:46:32 Phase 1: Obtaining ads from collector ...
6/26 09:46:32 Getting all public ads ...
6/26 09:46:32 Sorting 24 ads ...
6/26 09:46:32 Getting startd private ads ...
6/26 09:46:32 Got ads: 24 public and 19 private
6/26 09:46:32 Public ads include 1 submitter, 19 startd
6/26 09:46:32 Phase 2: Performing accounting ...
6/26 09:46:32 Phase 3: Sorting submitter ads by priority ...
6/26 09:46:32 Phase 4.1: Negotiating with schedds ...
6/26 09:46:32 Negotiating with asgdev@mydomainname at <x.x.x.x:40628>
6/26 09:46:32 Request 00008.00000:
6/26 09:46:32 Matched 8.0 asgdev@xxxxxxxxxxxxxxxxx <x.x.x.x:40628> preempting none <x.x.x.x:36098>
6/26 09:46:32 Successfully matched with vm3@xxxxxxxxxxxxxxxxxxxxxxxxxx
6/26 09:46:32 Got NO_MORE_JOBS; done negotiating
6/26 09:46:32 ---------- Finished Negotiation Cycle ----------
But the job goes from running to idle in a fraction of a second and then
just sits there. Below is some of the relevant output. I have put
'mydomainname' in place of my real domain and x'ed out my IPs. Has anyone
had a similar issue, or have any ideas to try? -Ali
In ShadowLog:
6/26 09:41:34 Using local config files: /public/murex/home/asgdev/condor/etc/snycmfnedad24.local
6/26 09:41:34 DaemonCore: Command Socket at <x.x.x.x:40656>
6/26 09:41:35 Initializing a VANILLA shadow
6/26 09:41:35 (8.0) (2880): Not enough reserved swap space
6/26 09:41:35 (8.0) (2880): **** condor_shadow (condor_SHADOW) EXITING WITH STATUS 105
condor_q -analyze:
-- Submitter: snycmfnedad24.mydomainname : <x.x.x.x:40628> : snycmfnedad24.mydomainname
 ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
---
008.000:  Run analysis summary.  Of 19 machines,
      3 are rejected by your job's requirements
      8 reject your job because of their own requirements
      2 match, but are serving users with a better priority in the pool
      6 match, but reject the job for unknown reasons
      0 match, but will not currently preempt their existing job
      0 are available to run your job
        Last successful match: Mon Jun 26 09:41:32 2006