I'm getting a lot of timeouts from our schedd machines when I query them
directly with condor_q, and I have a sneaking suspicion that it's due to
a fair number of our users calling condor_q -global in scripts that
parse the output to display overall system state.
Is it possible for condor_q -global to stress schedulers to the point
where calls for queue status start getting dropped? I'm even seeing a
plain "condor_q" on the larger schedd machines (machines with 1000+ jobs
queued) return "failed to fetch ads" messages. This is all with 6.7.3 on
a mix of Windows and Linux machines.
Is there a recommended (non-stressful) approach I can guide my users
towards so they can see who has what running, queued and held in the
system, and what the JobPrio of those jobs is?
- Ian
--
Ian R. Chesal <ichesal@xxxxxxxxxx>
Senior Software Engineer
Altera Corporation
Toronto Technology Center
Tel: (416) 926-8300