Hello all,

we have about 2000 VM worker nodes (~8000 cores) which are behind a
NAT. We start up to 10 VMs every 30 seconds. Sometimes we get problems
with the CCB:

CCBClient: Failed to read response from CCB server collector... Failed to reverse connect to startd workernode via CCB.

Also, the Collector, Negotiator and Scheduler reach a daemon load of up
to 100%, and condor_q / condor_status become slow. However, the
machines still have free memory and CPU resources. The Collector,
Negotiator and Scheduler run Condor version 8.4.8/9, and the worker
nodes run version 8.5.7. The network between the VMs and the Collector
looks stable.

Our plan is to start additional Collectors with CCBs. Would that help?
How many Collectors do we need, and how should we configure our
system?

Thanks and best regards,
Matthias
Thank you for any help you can provide.