Hi there!
For our farm we do something similar for floating licenses (e.g. flexLM, or sesi for Houdini) in that we have an external process polling the license servers. In our studio, licenses can be used both on the farm and off the farm (where HTCondor can't track them), so it's a little more involved than just parsing out the total available licenses. Basically, we come up with a number for how many licenses are either in use on or available to the farm and write that to a condor config file on the negotiator host.
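The poller is nothing fancy. Here's a rough sketch of the idea (not our actual script; the feature names, paths, license server address, and the lmstat parsing are just illustrative assumptions):

import re
import subprocess

# Where the negotiator reads its limits from (illustrative path).
LIMITS_FILE = "/etc/condor/config.d/99_license_limits"
# Map condor limit names to FlexLM feature names (made-up features).
FEATURES = {"nuke": "nuke_i", "maya_fluid_sim": "maya_fluid"}

def flexlm_usage(feature, license_server="27000@licserver"):
    """Return (total, in_use) for one feature, parsed from 'lmutil lmstat' output."""
    out = subprocess.run(
        ["lmutil", "lmstat", "-c", license_server, "-f", feature],
        capture_output=True, text=True, check=True).stdout
    # Typical lmstat line:
    #   Users of nuke_i:  (Total of 1000 licenses issued;  Total of 37 licenses in use)
    pattern = (r"Users of %s:\s+\(Total of (\d+) licenses? issued;\s+"
               r"Total of (\d+) licenses? in use\)") % re.escape(feature)
    m = re.search(pattern, out)
    if not m:
        raise RuntimeError("could not parse lmstat output for " + feature)
    return int(m.group(1)), int(m.group(2))

def off_farm_usage(feature):
    """Seats checked out by hosts that aren't execute nodes.
    Site-specific: we match lmstat's per-user lines against our farm host list."""
    return 0  # placeholder

def main():
    lines = []
    for limit_name, feature in FEATURES.items():
        total, _in_use = flexlm_usage(feature)
        farm_share = max(total - off_farm_usage(feature), 0)
        lines.append("%s_LIMIT = %d" % (limit_name, farm_share))
    with open(LIMITS_FILE, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    main()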
The entries in the file (named something like 99_license_limits) look something like:
nuke_LIMIT = 1000
maya_fluid_sim_LIMIT = 200
These set up concurrency limits for our licenses. Jobs that need a particular license name it in their submit description file with a line like:
concurrency_limits = nuke
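A complete (made-up) submit description would look something like the following; the executable and arguments are placeholders, and a job that needs more than one kind of license can list several limits comma-separated:

# toy example -- only the concurrency_limits line matters here
universe           = vanilla
executable         = run_nuke_render.sh
arguments          = shot_010_comp.nk
concurrency_limits = nuke
queue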
When licenses get used outside of the farm, we adjust the values written to the 99_license_limits file. For example, if we know that 20 of our maya_fluid_sim licenses are being used outside of HTCondor, we update the config file with:
maya_fluid_sim_LIMIT = 180
There's a configuration parameter called NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE that makes the negotiator reread the configuration files before each negotiation cycle, so it will have the latest (for some definition of "latest") license limit values before doing any match-making.
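In other words, the negotiator host's config gets a line like:

NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE = True

With that in place, edits to 99_license_limits should get picked up on the next cycle without an explicit condor_reconfig of the negotiator.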
This may be overkill for your license situation, but it seems like it could also work for your file server throttling; we did something similar to throttle our NFS filers.
Create a limit for each filer, something like:
volume_1_LIMIT = 99999
volume_2_LIMIT = 99999
Under normal circumstances, the value is set higher than the total number of job slots on your farm. When your external script detects that a filer is at capacity or otherwise overloaded, drop its value to 1 (I don't remember whether 0 is a valid value or not). That prevents any new jobs that require that filer's limit from starting.
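The update side can be as simple as something like this sketch (the health check is a stub and the paths/names are made up; with NEGOTIATOR_READ_CONFIG_BEFORE_CYCLE set, the negotiator picks the new values up on its next cycle):

import os

LIMITS_FILE = "/etc/condor/config.d/98_filer_limits"  # illustrative path
FILERS = ["volume_1", "volume_2"]
WIDE_OPEN = 99999   # anything larger than the pool's total slot count
CLOSED = 1          # effectively stops new matches that need this limit

def filer_overloaded(filer):
    """Stub: ask your monitoring system whether this filer is at capacity."""
    return False

def main():
    tmp = LIMITS_FILE + ".tmp"
    with open(tmp, "w") as f:
        for filer in FILERS:
            limit = CLOSED if filer_overloaded(filer) else WIDE_OPEN
            f.write("%s_LIMIT = %d\n" % (filer, limit))
    os.rename(tmp, LIMITS_FILE)  # atomic swap so the negotiator never reads a half-written file

if __name__ == "__main__":
    main()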
Full disclosure, however, we didn't use this for very long because most users had no idea what filers their jobs would access at run time, but maybe you'll have better luck.
In any case, it sounds like you've already got an alternate solution, but just wanted to share what we did for a similar problem.
Cheers!