The aforementioned VMs are running Ubuntu (Precise64). On each VM I installed HTCondor from the Debian repositories. Both VMs are able to run their own HTCondor jobs.
I submitted a job from master01 and, checking the NegotiatorLog on master02, I found this:
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CONNECT bound to <10.0.2.15:39406> fd=8 peer=<192.168.253.2:33532>
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: command 416 NEGOTIATE to schedd vagrant@xxxxxxxxxxxxxxxxxxx from TCP port 39406 (blocking).
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: using session master01:994:1407860162:6 for {<192.168.253.2:33532>,<416>}.
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=625,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_SECURITY) SECMAN: startCommand succeeded.
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Destroying Daemon object:
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) Type: 3 (schedd), Name: vagrant@xxxxxxxxxxxxxxxxxxx, Addr: <192.168.253.2:33532>
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) FullHost: master01.demo01.org, Host: master01, Pool: (null), Port: -1
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) IsLocal: N, IdStr: schedd vagrant@xxxxxxxxxxxxxxxxxxx, Error: (null)
08/12/14 16:21:02 (fd:9) (pid:877) (D_HOSTNAME) --- End of Daemon object info ---
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=155,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_write(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=13,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) condor_read(fd=8 schedd vagrant@xxxxxxxxxxxxxxxxxxx,,size=5,timeout=30,flags=0,non_blocking=0)
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) Stream::get(int) failed to read padding
08/12/14 16:21:02 (fd:9) (pid:877) (D_ALWAYS) Failed to get reply from schedd
08/12/14 16:21:02 (fd:9) (pid:877) (D_NETWORK) CLOSE <10.0.2.15:39406> fd=8
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) Error: Ignoring submitter for this cycle
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) negotiateWithGroup resources used scheddAds length 0
08/12/14 16:21:02 (fd:8) (pid:877) (D_ALWAYS) ---------- Finished Negotiation Cycle ----------
As you can see, the negotiator bound its outgoing connection to the NAT interface (10.0.2.15) rather than to the interface on the 192.168.253.x network, and I believe that is what is causing the communication problems.
How can I fix that? How can I force HTCondor to use eth1 and ignore the eth0 interface in this particular case?
Thank you very, very much for your help.
PS: A group of us is working on a national initiative (in Colombia) that aims to share computational clusters between different institutions in our country using HTCondor.