
[Condor-users] condor_rm failing for one user because of credential problem



I have one user who can submit and run jobs without any trouble, but who cannot remove his own jobs from the queue. I can remove them with my queue superuser account. My first thought was that the filesystem Condor writes to for filesystem (FS) authentication was full, but that was not the case. Other users can both submit and remove jobs. Any ideas what might be causing this problem?
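
For reference, the kind of check I mean can be sketched as below, run as the affected user. This is only a rough sketch: it assumes that FS authentication depends on the user being able to create an entry under /tmp that the schedd can then inspect for ownership and attributes, which I have not confirmed against the Condor source. The function name `check_tmp` and the `FS_probe_` prefix are made up for illustration.

```python
import os
import shutil
import stat
import tempfile


def check_tmp(path="/tmp"):
    """Report free space under `path` and the attributes of a
    directory freshly created there by the calling user."""
    usage = shutil.disk_usage(path)
    # Hypothetical probe name; Condor picks its own FS_XXX... name.
    probe = tempfile.mkdtemp(prefix="FS_probe_", dir=path)
    try:
        st = os.lstat(probe)
        return {
            "free_bytes": usage.free,
            "owner_uid": st.st_uid,
            "is_dir": stat.S_ISDIR(st.st_mode),
            "mode": oct(stat.S_IMODE(st.st_mode)),
            "link_count": st.st_nlink,
            # Assumption: the owner of the new entry should be the caller.
            "owned_by_caller": st.st_uid == os.geteuid(),
        }
    finally:
        os.rmdir(probe)


if __name__ == "__main__":
    for key, value in check_tmp().items():
        print(key, value)
```

If the owner or mode reported for this user differs from what another (working) user sees, that would at least narrow the problem down to /tmp rather than to Condor itself.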

________________________________

User's error message:

AUTHENTICATE:1003:Failed to authenticate with any method
AUTHENTICATE:1004:Failed to authenticate using GSI
GSI:5003:Failed to authenticate. Globus is reporting error (851968:45). There is probably a problem with your credentials. (Did you run grid-proxy-init?)
AUTHENTICATE:1004:Failed to authenticate using KERBEROS
AUTHENTICATE:1004:Failed to authenticate using FS
Couldn't find/remove all of user all's job(s).


_________________________________

$CondorVersion: 6.8.2 Oct 12 2006 $
$CondorPlatform: I386-LINUX_RHEL3 $
Linux [hostname] 2.6.17.4 #1 SMP Wed Jul 12 14:41:00 CDT 2006 i686 GNU/Linux
running on Ubuntu Dapper

_____________________________

related SchedLog entry:

10/30 10:18:58 (pid:4817) DaemonCore: Command received via TCP from host <128.83.120.62:35604>
10/30 10:18:58 (pid:4817) DaemonCore: received command 478 (ACT_ON_JOBS), calling handler (actOnJobs)
10/30 10:18:58 (pid:4817) authenticate_self_gss: acquiring self credentials failed. Please check your Condor configuration file if this is a server process. Or the user environment variable if this is a user process.

GSS Major Status: General failure
GSS Minor Status Error Chain:
globus_gsi_gssapi: Error with GSI credential
globus_gsi_gssapi: Error with gss credential handle
globus_credential: Valid credentials could not be found in any of the possible locations specified by the credential search order. Valid credentials could not be found in any of the possible locations specified by the credential search order.

Attempt 1

globus_credential: Error reading host credential
globus_sysconfig: Could not find a valid certificate file: The host cert could not be found in:
1) env. var. X509_USER_CERT
2) /etc/grid-security/hostcert.pem
3) $GLOBUS_LOCATION/etc/hostcert.pem
4) $HOME/.globus/hostcert.pem

The host key could not be found in:
1) env. var. X509_USER_KEY
2) /etc/grid-security/hostkey.pem
3) $GLOBUS_LOCATION/etc/hostkey.pem
4) $HOME/.globus/hostkey.pem



Attempt 2

globus_credential: Error reading proxy credential
globus_sysconfig: Could not find a valid proxy certificate file location
globus_sysconfig: Error with key filename
globus_sysconfig: File does not exist: /tmp/x509up_u0 is not a valid file

Attempt 3

globus_credential: Error reading user credential
globus_sysconfig: Error with certificate filename: The user cert could not be found in:
1) env. var. X509_USER_CERT
2) $HOME/.globus/usercert.pem
3) $HOME/.globus/usercred.p12




10/30 10:18:58 (pid:4817) AUTHENTICATE: no available authentication methods succeeded, failing!
10/30 10:18:58 (pid:4817) actOnJobs() aborting: SCHEDD:4001:Failed to act on jobs - Authentication failed|AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using GSI|GSI:5003:Failed to authenticate. Globus is reporting error (851968:45). There is probably a problem with your credentials. (Did you run grid-proxy-init?)|AUTHENTICATE:1004:Failed to authenticate using KERBEROS|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1005:Bad attributes on (/tmp/FS_XXXSAH3Uf)
10/30 10:18:58 (pid:4817) condor_write(): Socket closed when trying to write 13 bytes to <[IP]:35604>, fd is 11
10/30 10:18:58 (pid:4817) Buf::write(): condor_write() failed
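
The "actOnJobs() aborting" line packs the whole error stack into one "|"-separated string, which is hard to read. A small sketch for splitting it into its entries follows; it assumes every entry has the form SUBSYSTEM:CODE:message, as the entries in this log do, and the helper name `parse_error_stack` is made up for illustration.

```python
def parse_error_stack(text):
    """Split a '|'-separated error stack into
    (subsystem, code, message) tuples."""
    entries = []
    for part in text.split("|"):
        # Split on the first two colons only; the message itself
        # may contain colons, e.g. "(851968:45)".
        subsystem, code, message = part.strip().split(":", 2)
        entries.append((subsystem, int(code), message.strip()))
    return entries


if __name__ == "__main__":
    stack = (
        "SCHEDD:4001:Failed to act on jobs - Authentication failed"
        "|AUTHENTICATE:1004:Failed to authenticate using FS"
        "|FS:1005:Bad attributes on (/tmp/FS_XXXSAH3Uf)"
    )
    for subsystem, code, message in parse_error_stack(stack):
        print(subsystem, code, message)
```

Read this way, the last entry (FS:1005, "Bad attributes") stands out: it appears only in the SchedLog and not in the error message the user sees.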