Hello,
We’re seeing the following error in the SchedLog for about 1 in 10 jobs:
GET_JOB_CONNECT_INFO failed: Failed to get address of starter for this job
When this error occurs, the slot for the job is created successfully, but
the user is never connected via ssh because there is no address to connect to.
About our pool:
All of the jobs on this particular pool are interactive. The hostnames are
not fully qualified, but DEFAULT_DOMAIN_NAME is set properly. The failures
are not tied to particular machines; they seem to occur on hosts across the
whole pool. Other interactive jobs submitted just before or after a failing
job (from the same user and matched against the same host) work perfectly.
We turned on full debugging, but nothing useful was logged.
What are we missing?
Any help is much appreciated.
-Michelle