Ian Stokes-Rees wrote:
FWIW, the jobs are generally managed through DAGMan, and have POST scripts associated with them. I would have thought that a queued job that gets "rm'ed" isn't going to go through the POST script, but I could be wrong.

This is a case where you *don't* want to manually remove the node jobs. If you remove the node jobs, and DAGMan starts noticing those events before it gets removed itself, it *will* run POST scripts for the removed jobs, which might be part of your load problem.

We just removed the DAGMan job, but we still got a load of 15000 (yes, 3 zeros). We're running 7.4.
How many instances of DAGMan were in the queue? O(10)? O(1000)? ...

--Dan