Thanks Benedikt.
I added/changed the configs on each machine so that condor_config_val UID_DOMAIN and condor_config_val FILESYSTEM_DOMAIN return the same thing:
timehole.org. Now the jobs can't even write the output and error files and are held immediately with HOLD_REASON "Failed to open [output file] as standard output: Permission denied (errno 13)".
In the /etc/condor/condor_config file on the central manager (CM) and on each execution point (EP), I added:
ALLOW_WRITE = *.timehole.org
UID_DOMAIN = timehole.org
FILESYSTEM_DOMAIN = timehole.org
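As a sanity check on settings like the ones above, a small script along these lines can compare the two domain values in a config fragment. This is only a sketch: the function names parse_condor_config and domains_match are made up for illustration, and it handles only plain KEY = value lines (no macros, includes, or conditionals). The authoritative check is still running condor_config_val on each machine.

```python
def parse_condor_config(text):
    """Parse simple 'KEY = value' lines into a dict.

    Hypothetical helper for illustration only. Later assignments
    override earlier ones; blank lines, comment lines, and lines
    without '=' are skipped.
    """
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Keys are upper-cased so lookups do not depend on how they were typed.
        settings[key.strip().upper()] = value.strip()
    return settings


def domains_match(settings):
    """Return True only if both domains are set and exactly equal."""
    uid = settings.get("UID_DOMAIN")
    fs = settings.get("FILESYSTEM_DOMAIN")
    return uid is not None and uid == fs


if __name__ == "__main__":
    sample = """
    ALLOW_WRITE = *.timehole.org
    UID_DOMAIN = timehole.org
    FILESYSTEM_DOMAIN = timehole.org
    """
    print(domains_match(parse_condor_config(sample)))
```

A single-character difference between the two values makes domains_match return False, which is the kind of slip that is easy to miss when editing the file by hand on several machines.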
The CM is also the access point/submit machine.
Both EPs have CONDOR_HOST = bench9.timehole.org in the /etc/condor/config.d/01-execute.config file, and both can resolve
bench9.timehole.org via the /etc/hosts file. The CM also has CONDOR_HOST = bench9.timehole.org in the /etc/condor/config.d/01-central-manager.config file.
The NFS/autofs configuration seems correct to me: I can log in to each machine and read/write the shared home directory.
Does it matter that the domain name, timehole.org, isn't real? I manually edited the /etc/hosts file so the CM can resolve the fake FQDN of each EP.
JK