[Condor-users] Negotiation Cycle between Linux master and WindowsXP pool
- Date: Thu, 12 Feb 2009 21:32:25 -0800 (PST)
- From: Rob <spamrefuse@xxxxxxxxx>
- Subject: [Condor-users] Negotiation Cycle between Linux master and WindowsXP pool
Hello,
In order to learn the Condor configuration, I have set up a mini Condor pool in my office
with an Intel/Linux PC as the central master, and a single Intel/WindowsXP PC in the pool.
The Linux and Windows PCs have IPs 125.125.120.72 and 125.125.120.71, respectively.
I have installed Condor on Windows XP, in the recommended "UWCS" configuration.
On Linux, Condor comes from the precompiled rpm package provided by the yum repository.
In the local config file, I have configured the Windows pool PC to always run condor jobs.
When I submit a job, I expect it to run right away, but it stays idle. See details below.
I'm not sure why the job is not run; in the NegotiatorLog file, 127.0.0.1 appears
as the IP address of the submitter. Is that causing the trouble?
Do I have to add 127.0.0.1 somewhere in the local config files?
I hope someone can point out where I should look to solve this problem!
Thanks!
The local configuration files on the two machines are:
# Linux master:
CONDOR_DEVELOPERS = NONE
COLLECTOR_NAME = Library Pool
COLLECTOR_HOST = $(FULL_HOSTNAME)
DAEMON_LIST = COLLECTOR, MASTER, NEGOTIATOR, SCHEDD
NEGOTIATOR_INTERVAL = 20
TRUST_UID_DOMAIN = TRUE
HOSTALLOW_WRITE = *
HOSTALLOW_READ = *
LOWPORT = 9600
HIGHPORT = 9700
# Windows pool PC
COLLECTOR_NAME = Library Pool
HOSTALLOW_WRITE = *
HOSTALLOW_READ = *
DAEMON_LIST = MASTER STARTD
HOSTALLOW_ADMINISTRATOR = 125.125.120.72
CONSOLE_DEVICES = mouse, console
LOWPORT = 9600
HIGHPORT = 9700
WANT_SUSPEND = TRUE
WANT_VACATE = FALSE
START = TRUE
SUSPEND = FALSE
PREEMPT = FALSE
On the master I have 5 condor daemons:
condor_master
condor_collector
condor_negotiator
condor_schedd
condor_procd
On the Windows pool PC, there are two condor daemons:
condor_master.exe
condor_startd.exe
On the master, condor_status reports:
$ condor_status
Name    OpSys    Arch   State      Activity  LoadAv  Mem  ActvtyTime
Office  WINNT51  INTEL  Unclaimed  Idle      0.020   767  0+00:17:53
I wrote this submit description file:
#########
Requirements = (Arch == "INTEL") && (OpSys == "WINNT51") && (HasFileTransfer)
Universe = vanilla
Executable = helloworld.exe
output = helloworld.out
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
Queue
#########
$ condor_q
-- Submitter: localhost.localdomain : <127.0.0.1:9623> : localhost.localdomain
 ID   OWNER  SUBMITTED   RUN_TIME    ST  PRI  SIZE  CMD
 2.0  greg   2/13 11:11  0+00:00:00  I   0    0.0   helloworld.exe
1 jobs; 1 idle, 0 running, 0 held
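
The submitter line above shows the schedd at 127.0.0.1, a loopback address that other machines cannot connect to. This may mean the master's own host name resolves to loopback. A minimal sketch for checking that, assuming Python 3 is available on the Linux master; the helper names `is_loopback` and `resolve_local_names` are hypothetical and not part of Condor:

```python
import ipaddress
import socket


def is_loopback(address: str) -> bool:
    """Return True if the given IP address string is a loopback address."""
    return ipaddress.ip_address(address).is_loopback


def resolve_local_names() -> dict:
    """Map this machine's host name and FQDN to the IPv4 address each resolves to."""
    results = {}
    for name in {socket.gethostname(), socket.getfqdn()}:
        try:
            results[name] = socket.gethostbyname(name)
        except OSError:
            results[name] = None  # the name does not resolve at all
    return results


if __name__ == "__main__":
    for name, addr in resolve_local_names().items():
        if addr is None:
            print(f"{name}: does not resolve")
        elif is_loopback(addr):
            print(f"{name}: {addr} (loopback, not reachable from other machines)")
        else:
            print(f"{name}: {addr}")
```

If the host name maps to 127.0.0.1 here, the name resolution on the master (for example its hosts file) is a reasonable first place to look. Whether that is the actual cause in this pool is not confirmed by the output above.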
$ condor_q -analyze 2.0
---
002.000: Run analysis summary. Of 4 machines,
0 are rejected by your job's requirements
0 reject your job because of their own requirements
0 match but are serving users with a better priority in the pool
4 match but reject the job for unknown reasons
0 match but will not currently preempt their existing job
0 are available to run your job
In the NegotiatorLog file, every negotiation cycle reports a read error (errno = 104)
when negotiating with the schedd at 127.0.0.1:
2/13 14:11:09 ---------- Started Negotiation Cycle ----------
2/13 14:11:09 Phase 1: Obtaining ads from collector ...
2/13 14:11:09 Getting all public ads ...
2/13 14:11:09 Sorting 10 ads ...
2/13 14:11:09 Getting startd private ads ...
2/13 14:11:09 Got ads: 10 public and 4 private
2/13 14:11:09 Public ads include 1 submitter, 4 startd
2/13 14:11:09 Phase 2: Performing accounting ...
2/13 14:11:09 Phase 3: Sorting submitter ads by priority ...
2/13 14:11:09 Phase 4.1: Negotiating with schedds ...
2/13 14:11:09 Negotiating with lahaye@xxxxxxxxxxxxxxxxxxxxx at <127.0.0.1:9623>
2/13 14:11:09 0 seconds so far
2/13 14:11:09 condor_read(): recv() returned -1, errno = 104, assuming failure reading 5 bytes from unknown source.
2/13 14:11:09 IO: Failed to read packet header
2/13 14:11:09 Failed to get reply from schedd
2/13 14:11:09 Error: Ignoring schedd for this cycle
2/13 14:11:09 ---------- Finished Negotiation Cycle ----------
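
To see how often this happens across many cycles, the log can be scanned for negotiations with a schedd that advertised a loopback address. A rough sketch, assuming the "Negotiating with ... at <ip:port>" line format shown above; the function name `loopback_schedds` is hypothetical and the script is not a Condor tool:

```python
import ipaddress
import re

# Matches lines such as:
#   2/13 14:11:09 Negotiating with user@domain at <127.0.0.1:9623>
NEGOTIATING_RE = re.compile(r"Negotiating with (\S+) at <([0-9.]+):(\d+)>")


def loopback_schedds(log_lines):
    """Return (submitter, ip, port) for each negotiation with a loopback schedd address."""
    hits = []
    for line in log_lines:
        match = NEGOTIATING_RE.search(line)
        if match is None:
            continue
        submitter, ip, port = match.group(1), match.group(2), int(match.group(3))
        try:
            loopback = ipaddress.ip_address(ip).is_loopback
        except ValueError:
            continue  # not a valid IP address; skip the line
        if loopback:
            hits.append((submitter, ip, port))
    return hits


# Sample lines copied from the NegotiatorLog excerpt above.
SAMPLE = [
    "2/13 14:11:09 Phase 4.1: Negotiating with schedds ...",
    "2/13 14:11:09 Negotiating with lahaye@xxxxxxxxxxxxxxxxxxxxx at <127.0.0.1:9623>",
    "2/13 14:11:09 0 seconds so far",
]

for submitter, ip, port in loopback_schedds(SAMPLE):
    print(f"{submitter} advertised loopback address {ip}:{port}")
```

To run it against the real log, pass an open file object in place of `SAMPLE`; the function accepts any iterable of lines.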
---
Rob.