Hello again,
So I've made a little bit of progress, but am still having issues. I believe I resolved some authentication issues by adding an ALLOW_WRITE to the common condor_config file.
Now, when I submit jobs, they are assigned to the node and begin executing. However, they crash immediately when they attempt to read a file from the submit machine's file directory. I was under the impression that when a condor job is submitted and executed on another node, condor will spoof things such that the job still sees the submit machine's file directory. Do I need to configure something else to make this work?
The other issue I'm seeing is that when I run condor_reconfig -all or condor_restart -all, I get this:
ERROR
AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate. Globus is reporting error (851968:50). There is probably a problem with your credentials. (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS
Can't send Reconfig command to master s01-012
********************************
Logs
********************************
On the central node:
SchedLog - Previous problems are resolved, but I still see this error which I forgot to include before:
my_popenv: Failed to exec in child, errno=2 (No such file or directory)
Failed to execute /usr/sbin/condor_shadow.std, ignoring
CollectorLog - No more errors
NegotiatorLog - No more errors
On execute node:
MasterLog - Says authentication is failing via GSI, KERBEROS, FS. For some reason, nothing is reported for password, the authentication method I have set up:
02/04/20 20:08:39 authenticate_self_gss: acquiring self credentials failed. Please check your Condor configuration file if this is a server process. Or the user environment variable if this is a user process.
...
02/04/20 20:08:39 DC_AUTHENTICATE: required authentication of 141.212.115.83 failed: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5003:Failed to authenticate. Globus is reporting error (851968:662). There is probably a problem with your credentials. (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1004:Unable to lstat(/tmp/FS_XXXNxznHI)
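To double-check which methods the daemon actually tried, here is a small Python sketch that splits the pipe-separated error string from the MasterLog entry above and pulls out the method names. The function name attempted_methods is just something made up for this illustration, and the parsing assumes the "Failed to authenticate using X" wording seen in this log; it is not an HTCondor tool.

```python
import re

# Error string copied from the MasterLog entry above; fields are separated by '|'.
LOG_ERROR = (
    "AUTHENTICATE:1003:Failed to authenticate with any method|"
    "AUTHENTICATE:1004:Failed to authenticate using GSI|"
    "GSI:5003:Failed to authenticate. Globus is reporting error (851968:662). "
    "There is probably a problem with your credentials. "
    "(Did you run grid-proxy-init?)|"
    "AUTHENTICATE:1004:Failed to authenticate using KERBEROS|"
    "AUTHENTICATE:1004:Failed to authenticate using FS|"
    "FS:1004:Unable to lstat(/tmp/FS_XXXNxznHI)"
)


def attempted_methods(error_text):
    """Return the methods named in 'Failed to authenticate using X' fields, in order."""
    methods = []
    for field in error_text.split("|"):
        match = re.search(r"Failed to authenticate using (\w+)", field)
        if match and match.group(1) not in methods:
            methods.append(match.group(1))
    return methods


if __name__ == "__main__":
    tried = attempted_methods(LOG_ERROR)
    print("Methods attempted:", ", ".join(tried))
    print("PASSWORD attempted:", "PASSWORD" in tried)
```

Run against this log line, only GSI, KERBEROS and FS show up, which matches what I described: PASSWORD is never attempted.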
********************************
Configuration
********************************
The common condor_config file now contains:
##  Where have you installed the bin, sbin and lib condor directories?
RELEASE_DIR = /usr
##  Where is the local condor directory for each host? This is where the local config file(s), logs and
##  spool/execute directories are located. this is the default for Linux and Unix systems.
LOCAL_DIR = /var
##  Where is the machine-specific local config file for each host?
LOCAL_CONFIG_FILE = /etc/condor/condor_config.local
##  If your configuration is on a shared file system, then this might be a better default
#LOCAL_CONFIG_FILE = $(RELEASE_DIR)/etc/$(HOSTNAME).local
##  If the local config file is not present, is it an error? (WARNING: This is a potential security issue.)
REQUIRE_LOCAL_CONFIG_FILE = false
##  The normal way to do configuration with RPMs is to read all of the
##  files in a given directory that don't match a regex as configuration files.
##  Config files are read in lexicographic order.
LOCAL_CONFIG_DIR = /etc/condor/config.d
#LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$
##  Use a host-based security policy. By default CONDOR_HOST and the local machine will be allowed
use SECURITY : HOST_BASED
##  To expand your condor pool beyond a single host, set ALLOW_WRITE to match all of the hosts
ALLOW_WRITE = */*.eecs.umich.edu
##  FLOCK_FROM defines the machines that grant access to your pool via flocking. (i.e. these machines can join your pool).
#FLOCK_FROM =
##  FLOCK_TO defines the central managers that your schedd will advertise itself to (i.e. these pools will give matches to your schedd).
#FLOCK_TO = condor.cs.wisc.edu, cm.example.edu
##--------------------------------------------------------------------
## Values set by the debian patch script:
##--------------------------------------------------------------------
## For Unix machines, the path and file name of the file containing
## the pool password for password authentication.
#SEC_PASSWORD_FILE = $(LOCAL_DIR)/lib/condor/pool_password
##  Pathnames
RUN     = $(LOCAL_DIR)/run/condor
LOG     = $(LOCAL_DIR)/log/condor
LOCK    = $(LOCAL_DIR)/lock/condor
SPOOL   = $(LOCAL_DIR)/spool/condor
EXECUTE = $(LOCAL_DIR)/lib/condor/execute
CRED_STORE_DIR = $(LOCAL_DIR)/lib/condor/cred_dir
ETC     = /etc/condor
BIN     = $(RELEASE_DIR)/bin
LIB     = $(RELEASE_DIR)/lib/condor
INCLUDE = $(RELEASE_DIR)/include/condor
SBIN    = $(RELEASE_DIR)/sbin
LIBEXEC = $(RELEASE_DIR)/lib/condor/libexec
SHARE   = $(RELEASE_DIR)/share/condor
MAIL    = /usr/bin/mail
GANGLIA_LIB64_PATH = /lib,/usr/lib,/usr/local/lib
PROCD_ADDRESS = $(RUN)/procd_pipe
##  Install the minihtcondor package to run HTCondor on a single node
The security config file contains:
SEC_PASSWORD_FILE = /etc/condor/password.d/POOL
SEC_DAEMON_AUTHENTICATION = REQUIRED
SEC_DAEMON_INTEGRITY = REQUIRED
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_NEGOTIATOR_AUTHENTICATION = REQUIRED
SEC_NEGOTIATOR_INTEGRITY = REQUIRED
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = PASSWORD
SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD, KERBEROS, GSI
ALLOW_DAEMON = */*.<<< rest of hostname >>>, \
  */$(IP_ADDRESS)
ALLOW_NEGOTIATOR = */<<< hostname >>>
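As a sanity check on the method lists above, this Python sketch reads the three *_AUTHENTICATION_METHODS settings and reports which methods appear on both the client side and the daemon side. It assumes a method is only usable when both lists name it, and it is a simplified "NAME = a, b, c" reader with made-up helper names (parse_method_lists, shared_methods), not HTCondor's own config parser: it ignores macros, continuation lines and any defaults for settings that are not listed.

```python
# Settings copied from the security config file above.
SECURITY_CONFIG = """
SEC_DAEMON_AUTHENTICATION_METHODS = PASSWORD
SEC_NEGOTIATOR_AUTHENTICATION_METHODS = PASSWORD
SEC_CLIENT_AUTHENTICATION_METHODS = FS, PASSWORD, KERBEROS, GSI
"""


def parse_method_lists(text):
    """Parse simple 'NAME = a, b, c' lines into {NAME: [a, b, c]}."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = [
            item.strip().upper() for item in value.split(",") if item.strip()
        ]
    return settings


def shared_methods(settings, server_key,
                   client_key="SEC_CLIENT_AUTHENTICATION_METHODS"):
    """Return the client's methods (in client order) that the server list also names."""
    server = settings.get(server_key, [])
    return [m for m in settings.get(client_key, []) if m in server]


if __name__ == "__main__":
    settings = parse_method_lists(SECURITY_CONFIG)
    for key in ("SEC_DAEMON_AUTHENTICATION_METHODS",
                "SEC_NEGOTIATOR_AUTHENTICATION_METHODS"):
        print(key, "shared with client:", shared_methods(settings, key))
```

For the settings shown, PASSWORD is the only method the lists have in common, which is why I expected to see it in the MasterLog.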
Thank you, any help would be appreciated
Jonathan Bailey