Dear List,
We are using the Open Science Grid extensively and would like to improve the cache hit rate with squid.
We typically spawn 3000 jobs at a time, i.e., over a 15–30 minute period, with a new job entering the run state roughly every 1–3 seconds.
We are trying to use Condor’s squid file caching capability by sending a common data staging file to the execute nodes via http.
The file is typically 12–35 MB and is the same for all 3000 jobs.
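For concreteness, the relevant part of our submit description looks roughly like the sketch below; the executable name, URL, and file name are illustrative rather than our actual ones.

    universe                = vanilla
    executable              = run_analysis.sh
    # The common staging file is fetched over http so that the site's
    # squid proxy has the chance to cache it for subsequent jobs.
    transfer_input_files    = http://stage.example.edu/common_input.dat
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue 3000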
The Apache log on the machine from which the files are staged typically shows 1400–2000 fetches of the file, suggesting that our cache hit rate is only about 30–50%. This is in spite of the fact that our jobs typically execute on only 10–25 grid facilities, presumably with only one http proxy per facility.
We are wondering whether our file is being flushed from the cache quickly, so that refetches are required, and whether there is any way to control that. A related question concerns how the flushing mechanism works: is there something in place that takes account of repeated uses of the same file to improve its persistence in the cache, or does the flushing depend only on the size of the file and the time since it was fetched?
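In case it helps frame the question, the squid directives I imagine govern object caching and eviction are along the following lines; the values are purely illustrative and not what our sites actually run.

    # squid.conf excerpt -- illustrative values only
    cache_mem 256 MB                              # memory cache for recently requested objects
    maximum_object_size 100 MB                    # objects larger than this are never cached
    cache_dir ufs /var/spool/squid 10000 16 256   # 10 GB disk cache
    cache_replacement_policy lru                  # eviction policy; lru is squid's default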
Thanks for any insight and/or suggestions you can provide. If this list is the wrong place to pose this, I would be happy to be redirected; it struck me in thinking about it, though, that there may be subscribers to the list who are in a good position to tinker with this and who know it well.
Thanks,
Don
Don Krieger, Ph.D.
Department of Neurological Surgery
University of Pittsburgh