Hi Matt (and all),
Thanks for the response; it totally pointed me in the right
direction, which was the filesystem. Since it's shared, I had to
change the UID_DOMAIN and FILESYSTEM_DOMAIN configuration
parameters, and it all worked.
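For anyone who hits the same thing: these two parameters need the same value on every machine that shares the filesystem (Dan doesn't say which values he used). Below is a minimal Python sketch that checks this across machines, given the text of each machine's config file. It is a simplification, not a Condor tool: it only sees the text you pass in, it does not follow LOCAL_CONFIG_FILE or evaluate macros, so something like $(FULL_HOSTNAME) compares equal as text even though it expands differently on each machine.

```python
import re

# KEY = value, allowing leading whitespace; comments (#) never match.
ASSIGNMENT = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=\s*(.*?)\s*$")


def read_settings(config_text: str, keys) -> dict:
    """Return the last value assigned to each of *keys* in config_text.

    Names are compared case-insensitively and returned upper-cased.
    """
    wanted = {k.upper() for k in keys}
    values = {}
    for line in config_text.splitlines():
        match = ASSIGNMENT.match(line)
        if match and match.group(1).upper() in wanted:
            # A later assignment overrides an earlier one.
            values[match.group(1).upper()] = match.group(2)
    return values


def domains_agree(configs: dict) -> bool:
    """True if every machine sets the same UID_DOMAIN and FILESYSTEM_DOMAIN.

    *configs* maps a machine name to the text of its config file.
    """
    keys = ("UID_DOMAIN", "FILESYSTEM_DOMAIN")
    found = [read_settings(text, keys) for text in configs.values()]
    # Every machine must set both keys explicitly...
    if not found or any(len(f) != len(keys) for f in found):
        return False
    # ...and all machines must agree on both values.
    return len({tuple(f[k] for k in keys) for f in found}) == 1


if __name__ == "__main__":
    # example.com is a placeholder, not a value from this thread.
    shared = "UID_DOMAIN = example.com\nFILESYSTEM_DOMAIN = example.com\n"
    print(domains_agree({"host": shared, "machine1": shared}))
```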
Well, almost. I have three computers in my pool now: one host and two
submit/execute machines. If I submit jobs from either of the
non-host computers, they get farmed out across all three, and all is
dandy.
However, when I submit jobs from the host, they get farmed out, and
only those on the NON-host machines actually run. The others get
held with this message:
user@HOST:~/condor_test$ condor_q -analyze
-- Submitter: HOST :<127.0.1.1:35783> : HOST
ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
---
012.003: Request is held.
Hold reason: Error from starter on slot1@HOST: Failed to open
'/net/home/user/condor_test/simple.3.out' as standard output:
Permission denied (errno 13)
I can resolve this by making those files world-writable, but that
doesn't seem correct. Thoughts?
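One way to narrow this down is to compare the owner and permission bits of the output file (or, if it doesn't exist yet, its directory) with the Unix account the job actually runs as on HOST. Below is a minimal, hypothetical Python helper for the first half of that; it only reads metadata and changes nothing. It assumes the held jobs run under a different account than the file's owner, which would be consistent with the world-writable workaround helping, but the output above doesn't confirm it.

```python
import os
import stat
import sys


def describe_write_access(path: str) -> dict:
    """Report ownership and write bits for *path* (a file or a directory).

    Whatever account the job runs as needs write access to the output
    file, or to its directory if the file has to be created.
    """
    st = os.stat(path)
    mode = st.st_mode
    return {
        "path": path,
        "mode": stat.filemode(mode),  # e.g. "drwxr-xr-x"
        "uid": st.st_uid,
        "gid": st.st_gid,
        "owner_writable": bool(mode & stat.S_IWUSR),
        "group_writable": bool(mode & stat.S_IWGRP),
        "other_writable": bool(mode & stat.S_IWOTH),
    }


if __name__ == "__main__":
    # Pass the job's output file or directory (e.g. ~/condor_test);
    # defaults to the current directory.
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    for key, value in describe_write_access(target).items():
        print(f"{key}: {value}")
```

If "other_writable" is the only bit that makes the job succeed, the next thing to check is which uid the job runs as on HOST versus the other two machines.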
Also, I'm using 7.2.4 because it's what came down via apt-get. I'll
look into upgrading.
Thanks again,
Dan
On 11/08/2011 10:21 PM, Matthew Farrellee wrote:
On 11/08/2011 06:27 PM, Daniel Grollman wrote:
Hello Condor-users,
Is there a quick start guide for getting Condor up and running on a
small Ubuntu 10.04 pool? I just want to run processes on other machines'
idle processors (vanilla universe).
Here's where I'm at if anyone can help:
2 identical (virtual) machines with fresh installs of Ubuntu 10.04 with
Condor 7.2.4 installed via 'apt-get install condor'
At this point both machines have their own local condors, and I can
queue and run jobs, no problem.
I edited the /etc/condor/condor_config files thusly:
On machine 1:
CONDOR_HOST = [IP address of machine 2]
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
On machine 2:
HOSTALLOW_READ = *
HOSTALLOW_WRITE = *
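To make the intent of those edits concrete: in a pool, every machine, including the central manager itself, is normally given the same CONDOR_HOST. Below is a minimal Python sketch that renders the shared lines for each machine. The address 192.168.1.2 is made up to stand in for machine 2, and the `*` wildcards are copied from the settings above (they admit any host, which is only sensible on an isolated test network).

```python
def pool_config(central_manager: str, allow: str = "*") -> str:
    """Render the shared part of condor_config for one pool member.

    Every machine gets the same CONDOR_HOST, so all of them report to,
    and are matched by, the same central manager.
    """
    return "\n".join([
        f"CONDOR_HOST = {central_manager}",
        f"HOSTALLOW_READ = {allow}",
        f"HOSTALLOW_WRITE = {allow}",
    ]) + "\n"


if __name__ == "__main__":
    # Hypothetical address standing in for "machine 2".
    manager = "192.168.1.2"
    for machine in ("machine1", "machine2"):
        print(f"# {machine}")
        print(pool_config(manager), end="")
```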
After a reboot (?) condor_status on either machine shows me the slots on
both machines and whether they're busy/idle/etc. (yay!). However, they
still seem to have separate queues, i.e., when I submit from machine 1, I
only see the job in condor_q on machine 1, and it only runs on machine
1's CPU (but I see the usage in condor_status on machine 2).
I imagine there's a configuration parameter I need to set somewhere, but
I don't know what. Help please?
Thanks,
Dan
You probably want

ShouldTransferFiles = IF_NEEDED
WhenToTransferOutput = ON_EXIT

in your submit file.
https://condor-wiki.cs.wisc.edu/index.cgi/tktview?tn=2281
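In a submit description file these two settings are usually written as should_transfer_files and when_to_transfer_output (ShouldTransferFiles and WhenToTransferOutput are the corresponding job ClassAd attribute names). Below is a minimal Python sketch that prints a vanilla-universe submit file using them. The job name `simple` and the count of 4 are placeholders chosen to match simple.3.out in the hold message; they are not taken from Dan's actual submit file.

```python
def submit_description(name: str, count: int) -> str:
    """Render a vanilla-universe submit file that transfers files if needed.

    *name* is used both as the executable and as the prefix for the
    per-job output/error files.
    """
    if count < 1:
        raise ValueError("count must be at least 1")
    return "\n".join([
        "universe = vanilla",
        f"executable = {name}",
        # $(Process) expands to 0, 1, 2, ... so each job gets its own file.
        f"output = {name}.$(Process).out",
        f"error = {name}.$(Process).err",
        f"log = {name}.log",
        "should_transfer_files = IF_NEEDED",
        "when_to_transfer_output = ON_EXIT",
        f"queue {count}",
    ]) + "\n"


if __name__ == "__main__":
    print(submit_description("simple", 4), end="")
```

Save the output as, e.g., simple.sub and pass it to condor_submit.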
7.2.4 is very old at this point, can you upgrade?
Here are some instructions you can follow. They're written for Fedora,
but if you pretend apt is yum and, with 7.2.4, put everything in
~condor/condor_config.local instead of /etc/condor/config.d, everything
should work.
http://spinningmatt.wordpress.com/2011/06/12/getting-started-creating-a-multiple-node-condor-pool/
http://spinningmatt.wordpress.com/2011/06/21/getting-started-multiple-node-condor-pool-with-firewalls/
http://spinningmatt.wordpress.com/2011/07/04/getting-started-submitting-jobs-to-condor/
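With 7.2.4, following that advice means all the edits land in one file. Below is a minimal Python sketch (the helper name merge_local_config is hypothetical, not part of Condor) that sets KEY = VALUE pairs in a condor_config.local-style file: existing assignments to the same key are replaced in place, new keys are appended, and everything else is left alone. It does not handle backslash continuation lines.

```python
import os
import re

# KEY = value at the start of a line; comments (#) never match.
KEY_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=")


def merge_local_config(path: str, settings: dict) -> None:
    """Set each KEY = VALUE from *settings* in the config file at *path*.

    Keys are matched case-insensitively, as Condor does for parameter
    names; every existing assignment to a key is rewritten so no stale
    later line overrides the new value.
    """
    lines = []
    if os.path.exists(path):
        with open(path) as fh:
            lines = fh.read().splitlines()
    wanted = {key.upper(): key for key in settings}
    found = set()
    for i, line in enumerate(lines):
        match = KEY_RE.match(line)
        if match and match.group(1).upper() in wanted:
            key = wanted[match.group(1).upper()]
            lines[i] = f"{key} = {settings[key]}"
            found.add(key)
    # Append any settings that were not already present.
    lines.extend(f"{k} = {v}" for k, v in settings.items() if k not in found)
    with open(path, "w") as fh:
        fh.write("\n".join(lines) + "\n")
```

For example, merge_local_config(os.path.expanduser("~condor/condor_config.local"), {"UID_DOMAIN": "example.com", "FILESYSTEM_DOMAIN": "example.com"}), where example.com is again a placeholder. Run it as a user allowed to write that file, then run condor_reconfig (or restart Condor) so the daemons pick up the change.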
Best,
matt
--
Dan Grollman
Robot Doctor
daniel.grollman@xxxxxxxxx
http://www.vecna.com/robotics
Cambridge Research Laboratory
Vecna Technologies, Inc.
36 Cambridge Park Drive
Cambridge, MA 02140
Phone: (617) 864-0636
Fax: (617) 864-0638