
Re: [HTCondor-users] Local storage on a condor computer node.



> Is there a way for condor to take advantage of that? Mostly people are
> using nfs for their data files so far.

Shared filesystems can very quickly become bottlenecks for clusters, so past a certain size, it's frequently necessary to copy jobs' input files to local storage (and their output files from local storage).
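
As a rough illustration of what copying input and output files through HTCondor looks like from the user's side, here is a minimal sketch that renders a submit description with file transfer enabled. The submit commands are standard HTCondor ones; the helper name `render_submit_description` and the file names are hypothetical, chosen only for the example.

```python
def render_submit_description(executable, input_files, output_files):
    """Return the text of an HTCondor submit description that uses file transfer.

    The helper and its arguments are hypothetical; only the submit
    commands written into the text are HTCondor's own.
    """
    lines = [
        f"executable = {executable}",
        # Ask HTCondor to copy files to and from the job's scratch directory
        # instead of relying on a shared filesystem.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        f"transfer_input_files = {', '.join(input_files)}",
        f"transfer_output_files = {', '.join(output_files)}",
        "queue",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: one script, one input file, one output file.
    print(render_submit_description("analyze.sh", ["input.dat"], ["result.dat"]))
```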

> But is there a practical way to transfer very large data files through
> condor and use them from local storage on the remote condor node?

Unfortunately, the answer really is "it depends." While Condor's file transfer is rather efficient (as a protocol), whether it makes sense for a job to access its input and output files via the local filesystem depends on the properties of the job.

If the job only accesses (reads or writes) a small fraction of the data in its (large) files, it will be more efficient to access those files via NFS (or another shared filesystem that does block-level transfers).

If a job reads a substantial fraction of the data in its (large) input files -- and doesn't modify those files -- it will frequently be more efficient to access those files via some sort of horizontally-scaling caching system; we use squid as an HTTP cache here. This obviously works better if the (large) input files are used by multiple jobs.

For other (small) files, we generally recommend using HTCondor file transfer, as this is efficient enough and keeps load off of the shared filesystem that would be better used for files in the first case.
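
The three cases above can be summarized as a small decision sketch. This is only an illustration: the thresholds `LARGE_FILE_BYTES` and `SMALL_FRACTION` are hypothetical placeholders (the right values depend on the site and the job), and the function and label names are invented for the example.

```python
from dataclasses import dataclass

# Hypothetical thresholds; what counts as "large" or "a small fraction"
# is site- and job-specific.
LARGE_FILE_BYTES = 1024**3   # treat files of 1 GiB or more as "large"
SMALL_FRACTION = 0.1         # "a small fraction" of the file's data


@dataclass
class FileAccess:
    size_bytes: int           # size of the file
    fraction_accessed: float  # share of the data the job reads or writes (0..1)
    read_only: bool           # True if the job never modifies the file


def suggest_access_method(f: FileAccess) -> str:
    """Map a file's access pattern to one of the approaches described above."""
    if f.size_bytes < LARGE_FILE_BYTES:
        # Small files: HTCondor file transfer keeps load off the shared filesystem.
        return "htcondor-file-transfer"
    if f.fraction_accessed <= SMALL_FRACTION:
        # Large file, sparse access: block-level shared filesystem such as NFS.
        return "shared-filesystem"
    if f.read_only:
        # Large, mostly-read, unmodified input: horizontally scaling cache.
        return "http-cache"
    # Large files that are heavily written are not covered by the cases above.
    return "depends-on-job"


if __name__ == "__main__":
    big_sparse = FileAccess(50 * 1024**3, 0.01, True)
    print(suggest_access_method(big_sparse))
```

The fall-through label is deliberate: the cases above do not say what to do with large files that a job modifies heavily, so the sketch does not guess.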

For a user-facing explanation, see

https://chtc.cs.wisc.edu/uw-research-computing/file-availability.html

- ToddM

PS: in the preceding, I mentioned efficiency in a few different places, which may be a little misleading; what we care about is not absolute efficiency, but job throughput. For instance, it's usually easier to add more squid caches than it is to add more NFS servers; the caches may be less efficient in an absolute sense, but since you can add more caches, the overall throughput of the system goes up.
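
To make the throughput point concrete, here is a back-of-the-envelope sketch. All numbers are made up for illustration, and the model assumes aggregate bandwidth scales linearly with the number of servers, which real systems only approximate.

```python
def sustained_jobs(servers, mb_per_s_per_server, mb_per_s_per_job):
    """Number of jobs that can be fed concurrently, assuming linear scaling."""
    return (servers * mb_per_s_per_server) // mb_per_s_per_job


if __name__ == "__main__":
    # Hypothetical figures: one fast NFS server versus four slower caches,
    # with each job needing 50 MB/s of input.
    nfs_jobs = sustained_jobs(servers=1, mb_per_s_per_server=1000, mb_per_s_per_job=50)
    cache_jobs = sustained_jobs(servers=4, mb_per_s_per_server=400, mb_per_s_per_job=50)
    print(nfs_jobs, cache_jobs)
```

With these made-up figures, each cache delivers less than the single NFS server, yet four of them together keep more jobs fed, which is the sense in which throughput matters more than per-server efficiency.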