Re: [HTCondor-users] HTCondor with Fscache
On 2025/11/21 5:02 am, gagan tiwari wrote:
Hi Guys,
Does anyone have any ideas / advice on this? Please let me know.
Thanks,
Gagan
Hi Gagan,
Just my random thoughts ...
At ESAT we've been using fscache (NFS client-side caching) for many years now,
and indeed, the data 'transfer' (not in the HTCondor sense) for the first job to
access a larger data set takes significantly longer, as it primes the cache. So it
would make sense to improve the RANK of the compute node to attract similar jobs.
However, I've always found this impractical, for several reasons:
- HTCondor has zero knowledge of what data a job using NFS has really
read. One of the big disadvantages of using NFS...
- One could somehow collect mounts and try to make some sense out of it,
but that doesn't really work. A job that crashes immediately might have
mounted, but never primed the cache. The cache has its own eviction policy,
and it doesn't (easily) reveal exactly what files are in there... Some jobs
mount stuff from all over the place for just a small config file. Some jobs
access only a fraction of a bigger data set, which means most of it doesn't
end up in the cache at all...
- So the only way to infer this knowledge is from information in the job
description; that could be done relatively easily I think, but would
again be hit and miss...
In that case, I'd have the user specify some parameter in the JDF, like
Needs_NFS_XYZZY = True. The machine would then run such a job and set
Has_seen_NFS_XYZZY = <seconds ago>. And then modify the RANK expression
so that machines with a low value for Has_seen_NFS_XYZZY attract jobs
with Needs_NFS_XYZZY.
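Very roughly, and completely untested, that could look something like this
(the attribute names are just the made-up ones from above, and how the
machine actually publishes Has_seen_NFS_XYZZY, e.g. via a STARTD_CRON hook
that checks when such a job last ran, is left as an exercise):

    # in the user's submit file: advertise that the job wants the XYZZY data set
    +Needs_NFS_XYZZY = True

    # in condor_config on the execute node: prefer such jobs while the local
    # fscache is likely still warm (Has_seen_NFS_XYZZY = seconds since this
    # machine last ran such a job; publishing that attribute is up to you)
    RANK = ifThenElse( (TARGET.Needs_NFS_XYZZY =?= True) && \
                       (MY.Has_seen_NFS_XYZZY =!= undefined) && \
                       (MY.Has_seen_NFS_XYZZY < 3600), \
                       100, 0 )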
But again, at least in our situation, it's not worth the effort. Especially
as near-identical jobs would be 'pipelined' to certain machines anyhow.
But if you only deal with a very well known set of jobs, it might be different...
You could even make a dummy job just to prime the cache, if nothing else can interfere.
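Something like this, pinned to the node you want to warm up (path and
hostname made up, of course):

    # prime_cache.sub -- read the data set once on a chosen node so that
    # later jobs find it in that node's fscache
    executable            = /bin/cat
    arguments             = /nfs/xyzzy/dataset.bin
    output                = /dev/null
    error                 = prime_cache.err
    log                   = prime_cache.log
    should_transfer_files = NO
    requirements          = (Machine == "node17.example.org")
    queue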
Cheers, B.