Dear Colleagues,

Thank you for your responses. The goal Ivan and I are trying to achieve is the following; perhaps you can help us find a proper HTCondor-based solution.

We run a parallel universe job to be sure that all processes are running at the same time (to guarantee there are no deadlocks when many such jobs run). We then need to establish connections between all worker processes and the main one. The difficulty is that all running processes (both main and workers) bind to arbitrary ports, so we need some discovery mechanism to let them find each other. Our idea was to publish the main process's endpoint somewhere (this is where we thought of using HTCondor as a key-value store) and have the workers look up this endpoint and check in with the main process.

Maybe you can advise something here.

Thanks in advance.
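P.S. To make the idea concrete, here is a minimal Python sketch of the publish/look-up/check-in pattern described above. It is only an illustration under stated assumptions: a JSON file on a path visible to all processes stands in for the key-value store, and the names `publish_endpoint`, `lookup_endpoint`, `run_main`, and `run_worker` are hypothetical, not part of HTCondor. Whatever store HTCondor can offer would replace the two file helpers.

```python
import json
import os
import socket
import tempfile
import threading
import time


def publish_endpoint(path, host, port):
    """Stand-in for the key-value store: write the endpoint atomically."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        json.dump({"host": host, "port": port}, f)
    os.replace(tmp, path)  # readers never see a half-written file


def lookup_endpoint(path, timeout=10.0, interval=0.1):
    """Poll the store until the main process has published its endpoint."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as f:
                info = json.load(f)
            return info["host"], info["port"]
        except FileNotFoundError:
            time.sleep(interval)
    raise TimeoutError("main process endpoint was not published in time")


def run_main(path, n_workers, host="127.0.0.1"):
    """Bind to an arbitrary port, publish it, and wait for all check-ins."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.bind((host, 0))        # port 0: the OS picks a free port
        srv.listen(n_workers)      # listen before publishing the endpoint
        srv.settimeout(10.0)
        bound_host, bound_port = srv.getsockname()
        publish_endpoint(path, bound_host, bound_port)
        checked_in = []
        for _ in range(n_workers):
            conn, _addr = srv.accept()
            with conn:
                checked_in.append(conn.recv(1024).decode())
                conn.sendall(b"ok")
        return sorted(checked_in)


def run_worker(path, name):
    """Look up the main endpoint, connect, and check in under `name`."""
    host, port = lookup_endpoint(path)
    with socket.create_connection((host, port), timeout=10.0) as s:
        s.sendall(name.encode())
        return s.recv(1024).decode()


def demo(n_workers=3):
    """Run main and workers in one process, just to exercise the pattern."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "endpoint.json")
        result = {}
        t = threading.Thread(
            target=lambda: result.update(main=run_main(path, n_workers))
        )
        t.start()
        replies = [run_worker(path, f"worker-{i}") for i in range(n_workers)]
        t.join()
        return result["main"], replies


if __name__ == "__main__":
    print(demo())
```

In a real job the main process and the workers would run on different nodes, so the main process would have to publish an address that the workers can reach rather than 127.0.0.1, and the store would need to be visible from every node.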