I'm new to installing and configuring Condor, so I don't know if there is
something else I need to do to avoid this behaviour:
I installed Condor 6.8.1 on a cluster built from 9 P4@xxxxxx, with 1 master
host and 9 worker nodes (basically a Beowulf system) running Fedora Core 4
Linux. All nodes share /home and /opt. I followed the install procedure,
choosing a full installation on the master node and configuring it as the
Condor central manager. All daemons on the central manager start up correctly.
In the CONTROL_CONFIG file I set:
LOCAL_DIR = /home/condor/hosts/$(HOSTNAME)
LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = FALSE
HOSTALLOW_WRITE = *
Then I configured each worker node by setting
CONDOR_HOME=/opt/condor-6.8.1 and
CONDOR_CONFIG=/opt/condor-6.8.1/etc/condor_config. Condor starts up correctly
on each worker node, and the condor_status command shows all machines in the
pool:
Name           OpSys  Arch   State      Activity  LoadAv  Mem   ActvtyTime
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   1012  0+20:51:56
vm2@xxxxxxxxx  LINUX  INTEL  Owner      Idle      0.070   1012  0+00:10:05
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+03:05:04
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+20:50:50
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+03:05:04
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+20:50:53
vm1@xxxxxxxxx  LINUX  INTEL  Owner      Idle      1.000   250   0+22:20:50
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+02:50:08
vm1@xxxxxxxxx  LINUX  INTEL  Owner      Idle      1.000   250   0+22:20:42
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+02:55:05
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+03:05:06
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+20:50:54
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   504   0+03:05:04
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   504   0+20:50:55
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+03:05:05
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+20:50:53
vm1@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.010   250   0+03:05:05
vm2@xxxxxxxxx  LINUX  INTEL  Unclaimed  Idle      0.000   250   0+20:50:54

             Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
INTEL/LINUX     18      3        0         15        0           0         0
      Total     18      3        0         15        0           0         0
It appears to be working correctly, but when I submit jobs with the command

condor_submit -a "log = out.log" -a "error = error.log" ex02.submit

using the following submit file, ex02.submit:
Executable = /bin/hostname
Universe = vanilla
Requirements = OpSys == "LINUX" && Arch =="INTEL"
Error = err.$(Process)
Output = out.$(Process)
Log = foo.log
Queue 50
The jobs are queued but run only on the submitting machine. I tried with more
jobs (for example 500) while all machines were unclaimed, but nothing changed.
If I submit from the master node, all jobs run on the master node; if I submit
from node01, all jobs run on node01; and so on.
What is wrong?