Hi John,

I have managed to get it all working now and will document what I've done shortly, but you are correct: our /etc/hosts files were incorrect for condor's heuristic method of determining hostname and domain names. That was the root cause of both problems. I found this article, which pointed me along your line of thinking and corrected our error: https://spinningmatt.wordpress.com/2010/07/28/how-condor-determines-a-nodes-ip-and-hostname/

To try to answer your question, it appears condor recognizes both the loopback and external IP, and that was why it was able to communicate (see below).

Cheers, and thanks again!
Lyle

--- output of condor heuristic, note this is using a good /etc/hosts
lyle@tuna:~$ env _CONDOR_TOOL_DEBUG=D_HOSTNAME condor_config_val -debug FULL_HOSTNAME
08/03/21 05:37:52 NETWORK_INTERFACE=* matches lo 127.0.0.1, enp0s31f6 192.168.7.144, docker_gwbridge 172.18.0.1, docker0 172.17.0.1, lo ::1, enp0s31f6 fe80::c8f7:3616:850e:e934, docker_gwbridge fe80::42:51ff:fea1:e70, docker0 fe80::42:9ff:fec0:5dff, veth208f15f fe80::68b3:40ff:fefe:dd88, vetha4a8c59 fe80::e:80ff:fef7:62bd, vethcf3513d fe80::7c8f:7bff:fe23:4af2, choosing IP 192.168.7.144
08/03/21 05:37:52 DNS returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 We returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 hostname: tuna.ocwen.com
08/03/21 05:37:52 I am: hostname: tuna, fully qualified doman name: tuna.ocwen.com, IP: 192.168.7.144, IPv4: 192.168.7.144, IPv6:
08/03/21 05:37:52 Trying to getting network interface information after reading config
08/03/21 05:37:52 NETWORK_INTERFACE=* matches lo 127.0.0.1, enp0s31f6 192.168.7.144, docker_gwbridge 172.18.0.1, docker0 172.17.0.1, lo ::1, enp0s31f6 fe80::c8f7:3616:850e:e934, docker_gwbridge fe80::42:51ff:fea1:e70, docker0 fe80::42:9ff:fec0:5dff, veth208f15f fe80::68b3:40ff:fefe:dd88, vetha4a8c59 fe80::e:80ff:fef7:62bd, vethcf3513d fe80::7c8f:7bff:fe23:4af2, choosing IP 192.168.7.144
08/03/21 05:37:52 NETWORK_INTERFACE=* matches lo 127.0.0.1, enp0s31f6 192.168.7.144, docker_gwbridge 172.18.0.1, docker0 172.17.0.1, lo ::1, enp0s31f6 fe80::c8f7:3616:850e:e934, docker_gwbridge fe80::42:51ff:fea1:e70, docker0 fe80::42:9ff:fec0:5dff, veth208f15f fe80::68b3:40ff:fefe:dd88, vetha4a8c59 fe80::e:80ff:fef7:62bd, vethcf3513d fe80::7c8f:7bff:fe23:4af2, choosing IP 192.168.7.144
08/03/21 05:37:52 DNS returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 We returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 hostname: tuna.ocwen.com
08/03/21 05:37:52 I am: hostname: tuna, fully qualified doman name: tuna.ocwen.com, IP: 192.168.7.144, IPv4: 192.168.7.144, IPv6:
08/03/21 05:37:52 NETWORK_INTERFACE=* matches lo 127.0.0.1, enp0s31f6 192.168.7.144, docker_gwbridge 172.18.0.1, docker0 172.17.0.1, lo ::1, enp0s31f6 fe80::c8f7:3616:850e:e934, docker_gwbridge fe80::42:51ff:fea1:e70, docker0 fe80::42:9ff:fec0:5dff, veth208f15f fe80::68b3:40ff:fefe:dd88, vetha4a8c59 fe80::e:80ff:fef7:62bd, vethcf3513d fe80::7c8f:7bff:fe23:4af2, choosing IP 192.168.7.144
08/03/21 05:37:52 DNS returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 We returned:
08/03/21 05:37:52 127.0.1.1
08/03/21 05:37:52 192.168.7.144
08/03/21 05:37:52 hostname: tuna.ocwen.com
08/03/21 05:37:52 I am: hostname: tuna, fully qualified doman name: tuna.ocwen.com, IP: 192.168.7.144, IPv4: 192.168.7.144, IPv6:

On Tue, Aug 3, 2021 at 1:56 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:

I think the fundamental problem is a combination of your hosts file and the fact that you seem to be forcing HTCondor to use 127.0.0.1 as the preferred IP address.
We look up tuna and get 127.0.0.1, and then we look up 127.0.0.1 and the first answer in the hosts file is localhost, so that becomes the hostname.
I think you either need to remove tuna from the hosts file, give it a different IP address (like the public IP address), or make it the first entry in the hosts file for 127.0.0.1.
But I'm confused how you can have a 3 node pool that is working at all if you are telling HTCondor to use 127.0.0.1 for communication. The nodes should be unable to talk to each other.
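The lookup chain described above can be sketched roughly as follows. This is only a simplified model of a hosts-file lookup, not HTCondor's actual code: the function names are made up for illustration, and the second hosts file is a hypothetical example of the "remove tuna from the 127.0.0.1 line" option, not the file that was eventually used.

```python
def parse_hosts(text):
    """Return a list of (ip, [names]) entries in file order."""
    entries = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        ip, *names = line.split()
        if names:
            entries.append((ip, names))
    return entries


def forward(entries, name):
    """First IP whose line lists this name."""
    for ip, names in entries:
        if name in names:
            return ip
    return None


def reverse(entries, ip):
    """First name on the first line for this IP."""
    for entry_ip, names in entries:
        if entry_ip == ip:
            return names[0]
    return None


def detected_hostname(entries, short_name):
    """Name -> IP -> name, as in the explanation above (simplified)."""
    ip = forward(entries, short_name)
    return reverse(entries, ip) if ip else None


# The hosts file as posted in this thread.
ORIGINAL = """\
127.0.0.1    localhost        tuna
127.0.1.1    tuna.ocwen.com   tuna
"""

# Hypothetical variant: tuna removed from the 127.0.0.1 line.
WITHOUT_TUNA_ON_LOOPBACK = """\
127.0.0.1    localhost
127.0.1.1    tuna.ocwen.com   tuna
"""

print(detected_hostname(parse_hosts(ORIGINAL), "tuna"))                  # localhost
print(detected_hostname(parse_hosts(WITHOUT_TUNA_ON_LOOPBACK), "tuna"))  # tuna.ocwen.com
```

With the posted file, tuna resolves to 127.0.0.1 and the first name for 127.0.0.1 is localhost, which matches the slot names seen in condor_status.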
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lyle Pakula <Lyle@xxxxxxxxxxxxxxxx>
Sent: Sunday, August 1, 2021 9:33 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor 9.1

Hi John,
Thanks for the help.
1/ NETWORK_INTERFACE is the same on all machines
lyle@tuna$ condor_config_val -v NETWORK_INTERFACE
NETWORK_INTERFACE = *
 # at: <Default>
 # raw: NETWORK_INTERFACE = *
FYI my /etc/hosts on all machines follows a standard layout, i.e. for tuna:
lyle@tuna$ cat /etc/hosts
127.0.0.1    localhost        tuna
127.0.1.1    tuna.ocwen.com   tuna
All machines have an /etc/hostname file containing their "hostname", but domainname is blank.
2/ UID_DOMAIN is also similar on all machines, that is, the default of
lyle@grenadier:$ condor_config_val -v UID_DOMAIN
UID_DOMAIN = localhost
 # at: <Default>
 # raw: UID_DOMAIN = $(FULL_HOSTNAME)
... What I tried
It looked to me that condor is not picking up the actual hostname, and perhaps this is because we have no domainname configured.
lyle@grenadier:/etc/condor/config.d$ hostname
grenadier
lyle@grenadier:/etc/condor/config.d$ condor_config_val -v HOSTNAME
HOSTNAME = localhost
 # at: <Detected>
 # raw: HOSTNAME = localhost
lyle@grenadier:/etc/condor/config.d$ condor_config_val -v FULL_HOSTNAME
FULL_HOSTNAME = localhost
 # at: <Detected>
 # raw: FULL_HOSTNAME = localhost
* I tried pointing NETWORK_INTERFACE to 127.0.1.1 on all machines and also to the CENTRAL MANAGER IP (something I read), but this did not change what condor picks up as the hostname.
* I tried setting UID_DOMAIN=ocwen.com on all machines, but this did not work (everything still runs as nobody), and I suspect this is because the hostname is not picked up correctly as well.
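For anyone checking the same thing, a rough way to see what the operating system's resolver reports for a machine (independently of HTCondor) is sketched below. This is an assumption-laden aid rather than HTCondor's own heuristic: it uses only Python's standard socket module, the function name resolver_view is made up for illustration, and its results may differ from what condor_config_val detects.

```python
import socket


def resolver_view():
    """Collect what the system resolver reports for this machine."""
    short = socket.gethostname()
    info = {
        "hostname": short,
        "fqdn": socket.getfqdn(),
        "addresses": [],
        "reverse": {},
    }
    try:
        # Forward lookup: every IPv4 address the resolver returns for the hostname.
        _, _, addrs = socket.gethostbyname_ex(short)
    except OSError:
        addrs = []
    info["addresses"] = addrs
    for addr in addrs:
        try:
            # Reverse lookup: the primary name the resolver returns for each address.
            info["reverse"][addr] = socket.gethostbyaddr(addr)[0]
        except OSError:
            info["reverse"][addr] = None
    return info


if __name__ == "__main__":
    for key, value in resolver_view().items():
        print(f"{key}: {value}")
```

If a reverse lookup comes back as localhost for the address the hostname resolves to, that would be consistent with the HOSTNAME = localhost detection shown above.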
Thanks, Lyle
On Wed, Jul 28, 2021 at 1:59 AM John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
I think slots are appearing as localhost because your condor_config is telling condor to use localhost as the primary network interface.
What does the condor_config have set for NETWORK_INTERFACE?
Try running
  condor_config_val -v NETWORK_INTERFACE
By the way, you can see all of your configuration that differs from the default HTCondor configuration by running
  condor_config_val -summary
When a job runs, files will be written as nobody if the job runs as nobody, which happens when HTCondor does not think that the submit node and the execute node have the same set of user ids. It decides this by comparing the value of UID_DOMAIN on both of these machines.
Try running
  condor_config_val -v UID_DOMAIN
on both the submit machine and the execute machine, what is the value?
Now, having files written as nobody on the execute node is not a problem when HTCondor is doing file transfer, because it will change ownership of the files as it transfers the results back. But if you are using a shared file system, you may need to do some additional configuration.
Instructions for setting up HTCondor to use a shared file system are here
-tj
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Lyle Pakula <Lyle@xxxxxxxxxxxxxxxx>
Sent: Monday, July 26, 2021 7:14 PM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] HTCondor 9.1

Hi Everyone and thanks for everyone's help in advance!
We have recently upgraded from a very old install of 7.6 to 9.1 on Ubuntu 18.04 by basically blowing away everything old (uninstall, remove systemctl, delete "condor user" from all machines) and then following https://htcondor.readthedocs.io/en/latest/getting-htcondor/admin-quick-start.html.
* Starting with a basic setup (3 Machines, 3 roles) + NAS mounted on all machines.
* Vanilla universe jobs read/write to and from the NAS
Question 1 - Why are slots appearing as "localhost" and not the machine name they are actually on?
lyle@tuna:~$ condor_status
Name            OpSys  Arch   State     Activity LoadAv Mem  ActvtyTime
slot1@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:39
slot2@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:36
slot3@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:33
slot4@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:32
slot5@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:31
slot6@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:42
slot7@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:41
slot8@localhost LINUX  X86_64 Unclaimed Idle      0.000 1990 0+00:30:41
Question 2 - Files are written as nobody:nouser, how can we change this?
The problem here is that the written files are unreadable/unwriteable to the submitter.
Tried this, but it did not work.
Thanks, Lyle
--
AE CAPITAL
w http://www.aecapital.com.au
Ground Floor, 555 Bourke Street, Melbourne Australia 3000
p +61 3 9020 7801
m +61 (0)434 872 054
AE Capital Pty Limited (ACN 153 242 865) is regulated by the Australian Securities & Investments Commission and is a Corporate Authorised Representative of JFM Pty Limited (ACN 125 150 656), holder of an Australian Financial Services Licence (AFSL 314585). AE Capital Pty Limited is a member of the National Futures Association (ID 0498660).
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/