Hi all,

We are using the job_router to hook into the scheduling of jobs [1,2]. Basically, we have a database of file placements and pick the best hosts for locality. In principle, this works flawlessly.

However, many jobs do not actually benefit from this: some input files aren't present in our system, and some users just provide inadequate input lists. The ClassAd expressions for route `requirements` aren't sufficient to detect this. Only our hooks find out, when they try the query.

While this works, it means we get hundreds of routed jobs, each regularly calling update hooks, with no benefit at all. Either we let all those bogus routes persist, or we restrict routing so severely that jobs that would profit are cut off as well. Either way, we've seen some massive load spikes and degraded router or service performance.

Is there a way for *hooks* to end a route prematurely once it has been established? Our `translate` hooks can already detect whether routing a job is useful, but we haven't found a way to tell the router. A hook failure is just logged as an error and retried. I've pondered having a hook remove its own routed job via `condor_rm`, but that seems rather hacky.

Cheers,
Max

[1] HTCondor Hooks
    http://research.cs.wisc.edu/htcondor/manual/current/3_3Configuration.html#SECTION004333000000000000000
[2] Example configuration
    https://bitbucket.org/kitcmscomputing/hpda/src/61b129feaa9e94aab80fd4a989446783d762f0b2/docs/examples/htcondor/HPDA_Hook.cfg?at=master&fileviewer=file-view-default
[3] Router and Router Hooks
    http://research.cs.wisc.edu/htcondor/manual/current/5_4HTCondor_Job.html#sec:JobRouter
    http://research.cs.wisc.edu/htcondor/manual/current/4_4Hooks.html#SECTION00542000000000000000