
Re: [Condor-users] mixed pool: NFS and non-NFS directories




Alain Roy wrote:

On Mar 21, 2008, at 2:59 PM, Ian Stokes-Rees wrote:
I am running OSG on a small (20-node) Condor pool with NFS and shared home directories for VOs. I am interested in finding out whether it is possible and practical to add into the pool other execute nodes that don't have the shared NFS or user home directories. Can anyone offer any tips or suggestions regarding this?
Condor is happy to run without a shared filesystem. If a user doesn't
request file transfers for a job, it is assumed that the files are on
a shared filesystem. Condor detects whether you share a filesystem by
looking at FILESYSTEM_DOMAIN. Details are in the manual; I can point
you to specifics if you need them. If a user specifies that files
should be transferred "if needed", then FILESYSTEM_DOMAIN is used to
decide whether you are on the shared filesystem or whether the files
need to be transferred.
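
As a rough sketch of how this is expressed (the domain name and file
names below are placeholders, not values from this thread), the
configuration and submit-file side look roughly like this:

    # condor_config: machines that mount the shared NFS filesystem
    # advertise the same FILESYSTEM_DOMAIN; machines without it get a
    # different value (by default, their own hostname).
    FILESYSTEM_DOMAIN = nfs.example.edu

    # Submit description file: let Condor decide per match whether the
    # job landed in the same filesystem domain, and transfer otherwise.
    executable              = my_job
    should_transfer_files   = IF_NEEDED
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = input.dat
    queue
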
Out of the box, the OSG Globus installation assumes that Condor is  
using a shared filesystem, and when it submits jobs to Condor it  
doesn't tell Condor to transfer files. There is an alternate Condor  
job manager that does tell Condor to transfer files. Some VDT-specific  
documentation is at:
http://vdt.cs.wisc.edu/releases/1.8.1/notes/Globus-CondorNFSLite-Setup.html

I don't think it is a documented solution in OSG, but several sites are using it with success. You'll need to install it from the VDT cache instead of the OSG cache.
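
For illustration only (this is not the literal submit file the NFSLite
job manager generates, just the kind of directives it arranges for), a
job that must use Condor file transfer instead of a shared filesystem
is submitted with something like:

    # Explicit file transfer; no shared filesystem assumed.
    # File names here are placeholders.
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    transfer_input_files    = job_input.tar.gz
    transfer_output_files   = results.out
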
Note that the NFSLite configuration does not remove all dependence on a 
shared filesystem in OSG.  Typical OSG jobs rely on software 
pre-installed by the grid user in $OSG_APP, which is assumed to be 
readable from all of the worker nodes.  However, if you only wish to 
support specific OSG jobs that you know do not depend on $OSG_APP, then 
you can get by without it.  I suppose you could also rsync it to a
local disk on the worker nodes, at the risk of a temporary window in
which new software has been installed but is not yet available from
some nodes (a rough sketch of this follows below).  The shared
writable $OSG_DATA cannot so easily be faked without
a shared filesystem, but it is rarely used by OSG jobs, in my 
experience.  Also note that the worker node client $OSG_GRID is assumed 
to be accessible from all worker nodes.  Since this is just a one-time 
installation, it can simply be installed locally on the worker nodes 
without any problem.
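
As a rough sketch of the rsync idea above (the hostname, path, and
schedule are assumptions, not details from this thread), a cron entry
on each worker node could periodically pull $OSG_APP to local disk:

    # Pull the $OSG_APP area from the head node to local disk every
    # 15 minutes; assumes passwordless ssh from worker to head node.
    # head-node.example.edu and /opt/osg-app are placeholders.
    */15 * * * *  rsync -a --delete head-node.example.edu:/opt/osg-app/ /opt/osg-app/

Jobs then see a local copy of $OSG_APP, with the caveat mentioned
above: software installed since the last sync will be missing on some
nodes until they catch up.
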
--Dan