Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [Condor-users] Jobs are executed only on the submitting machines

Date: Mon, 2 Oct 2006 11:14:24 -0500
From: Jaime Frey <jfrey@xxxxxxxxxxx>
Subject: Re: [Condor-users] Jobs are executed only on the submitting machines

On Oct 2, 2006, at 3:10 AM, Dr. Raffaele Montella wrote:

I'm new to installing and configuring Condor, so I don't know if there is something else I need to do to avoid this behaviour:

I installed Condor 6.8.1 on a cluster built from 9 P4@xxxxxx with 1 master host and 9 working nodes (basically a Beowulf system) running Fedora Core 4 Linux. All nodes share /home and /opt. I followed the install procedure, choosing a full installation on the master node and configuring it as the Condor central manager. All daemons on the controller start up correctly.
In the CONDOR_CONFIG file I chose:
LOCAL_DIR               = /home/condor/hosts/$(HOSTNAME)
LOCAL_CONFIG_FILE       = $(RELEASE_DIR)/etc/$(HOSTNAME).local
REQUIRE_LOCAL_CONFIG_FILE = FALSE
HOSTALLOW_WRITE = *

Then I configured each working node, defining CONDOR_HOME=/opt/condor-6.8.1 and CONDOR_CONFIG=/opt/condor-6.8.1/etc/condor_config. Condor starts up correctly on each working node. The condor_status command shows all machines in the pool:

Name          OpSys       Arch   State      Activity   LoadAv Mem   ActvtyTime

vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000  1012  0+20:51:56
vm2@xxxxxxxxx LINUX       INTEL  Owner      Idle       0.070  1012  0+00:10:05
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:50
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:53
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       1.000   250  0+22:20:50
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+02:50:08
vm1@xxxxxxxxx LINUX       INTEL  Owner      Idle       1.000   250  0+22:20:42
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+02:55:05
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:06
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:54
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   504  0+03:05:04
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   504  0+20:50:55
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+03:05:05
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:53
vm1@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.010   250  0+03:05:05
vm2@xxxxxxxxx LINUX       INTEL  Unclaimed  Idle       0.000   250  0+20:50:54

                     Total Owner Claimed Unclaimed Matched Preempting Backfill

         INTEL/LINUX    18     3       0        15       0          0        0

               Total    18     3       0        15       0          0        0

It appears to be working correctly, but if I submit a job using the following script with the command condor_submit -a "log = out.log" -a "error = error.log" ex02.submit:

Executable     = /bin/hostname
Universe       = vanilla
Requirements   = OpSys == "LINUX" && Arch == "INTEL"
Error          = err.$(Process)
Output         = out.$(Process)
Log            = foo.log

Queue 50

The jobs are queued but executed only on the submitting machine.

I tried with more jobs, for example 500, with all machines unclaimed, but nothing! If I submit from the master node, all jobs are executed on the master node; if I submit from node01, all jobs are executed on node01, and so on.

What is wrong?

This is probably a shared filesystem problem. On Unix, unless you say otherwise, Condor assumes your job's data files are on a shared filesystem. If your machines don't have a shared filesystem, then Condor will only run the job on the submit machine.
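Since the original post says all nodes share /home and /opt, another route may be to tell Condor that the machines really do share a filesystem. The fragment below is only a sketch of that idea for the Condor configuration file, using the FILESYSTEM_DOMAIN and UID_DOMAIN settings. The domain name cluster.example.org is a made-up placeholder, and whether this matches the poster's actual setup is an assumption.

```
# Sketch only: declare that all machines in the pool share files and user IDs.
# "cluster.example.org" is a hypothetical name; use the same value on every
# machine in the pool (submit and execute nodes alike).
UID_DOMAIN        = cluster.example.org
FILESYSTEM_DOMAIN = cluster.example.org
```

The daemons have to re-read their configuration before the change takes effect, for example with condor_reconfig or a restart. Jobs should then be submitted from a directory on the shared filesystem so that the execute nodes can reach the same paths.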

You can tell Condor not to rely on a shared filesystem and to transfer the job's files itself by including the following in your submit file:

should_transfer_files = YES
when_to_transfer_output = ON_EXIT

Then you can also use transfer_input_files to say which input files need to be transferred in addition to the executable.
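Put together, the submit file from the question might look roughly like the sketch below once file transfer is enabled. Everything except the two transfer settings comes from the original ex02.submit. The transfer_input_files line is commented out, and its file names are hypothetical placeholders, since /bin/hostname needs no input files.

```
# Sketch only: the original ex02.submit with Condor's file transfer turned on.
Executable              = /bin/hostname
Universe                = vanilla
Requirements            = OpSys == "LINUX" && Arch == "INTEL"
Error                   = err.$(Process)
Output                  = out.$(Process)
Log                     = foo.log

# Do not rely on a shared filesystem; let Condor move the job's files itself.
should_transfer_files   = YES
when_to_transfer_output = ON_EXIT

# Hypothetical input files, listed only to show the syntax.
# transfer_input_files  = input1.dat, input2.dat

Queue 50
```

With these settings, each out.$(Process) file should contain the name of the machine that ran the job, which gives a quick check of whether jobs are now leaving the submit machine.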


+--------------------------------+-----------------------------------+
|           Jaime Frey           | I used to be a heavy gambler.     |
|       jfrey@xxxxxxxxxxx        | But now I just make mental bets.  |
| http://www.cs.wisc.edu/~jfrey/ | That's how I lost my mind.        |
+--------------------------------+-----------------------------------+
