We're running Condor 8.2.8 here and are experiencing a lack of responsiveness when submitting jobs (which is mostly unusual) and when running ‘condor_q’ or ‘condor_submit -debug’. In some cases ‘condor_q’ does return after several minutes; in others it throws an error:

-- failed to fetch ads from: <our_scheduler_node_IP_address:51430> : <fqdn_of_same_scheduler_node>

The issue presumably started over the weekend, when someone submitted a set of jobs roughly an order of magnitude (10x) larger than “usual.” When ‘condor_q’ does finish, the summary at the end shows the following:

31823 jobs; 0 completed, 31786 removed, 19 idle, 12 running, 24 held, 0 suspended

I’m posting to see if anyone has insight into how to diagnose why the jobs aren’t running. I believe the volume (>33k jobs submitted over three days) isn’t unprecedented. Obviously I’m not a Condor subject-matter expert, but I’m trying to grow into something close, by hook or by crook.

Thanks for any and all insights!

Eric