
Re: [HTCondor-users] HTCondor-users Digest, Vol 54, Issue 19



Hi,

Thank you for the email regarding the new HTCondor release. Great news!
Is the new release available as packages for Ubuntu 16.04 and Ubuntu 18.04?

Thank you
_____________________________________________________________________________________________________

Eric F. Alemany
System Administrator for Research

Division of Radiation & Cancer Biology
Department of Radiation Oncology

Stanford University School of Medicine
Stanford, California 94305

Tel: 1-650-498-7969 (no texting)



On May 11, 2018, at 12:05 PM, htcondor-users-request@xxxxxxxxxxx wrote:

Send HTCondor-users mailing list submissions to
htcondor-users@xxxxxxxxxxx

To subscribe or unsubscribe via the World Wide Web, visit
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
or, via email, send a message with subject or body 'help' to
htcondor-users-request@xxxxxxxxxxx

You can reach the person managing the list at
htcondor-users-owner@xxxxxxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of HTCondor-users digest..."


Today's Topics:

  1. Re: (no subject) (Zach Miller)
  2. HTCondor 8.6.11 Released (Tim Theisen)
  3. HTCondor 8.7.8 Released (Tim Theisen)


----------------------------------------------------------------------

Message: 1
Date: Fri, 11 May 2018 19:03:05 +0000
From: Zach Miller <zmiller@xxxxxxxxxxx>
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] (no subject)
Message-ID:
<BN6PR06MB2689135A9E9DDAC16210B8AAF99F0@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>

Content-Type: text/plain; charset="utf-8"

Are you saying that condor_submit fails when you run it? Or what symptoms are you seeing as a result of the FS authentication failure?


It appears from the included SchedLog that the submit process is unable to create the file required in /tmp.

You can get a detailed log from the client side by setting the environment variable _condor_TOOL_DEBUG to D_ALL:2.
Then, as the user having trouble submitting, run:
 condor_ping -debug WRITE

This will essentially simulate a job submission, and you can capture the stderr and look at where it is doing FS authentication. Perhaps there is a clue there. Otherwise, please forward me the captured stderr (off-list) and I will see if I can diagnose the problem. Thanks!
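The two steps above can be wrapped in a small shell sketch. It assumes condor_ping is in PATH and that the function is run as the affected user; the function name capture_fs_debug and the default output file name are hypothetical, chosen only for illustration.

```shell
# capture_fs_debug (hypothetical helper): run condor_ping with full tool
# debugging enabled and save its stderr to a file for later inspection.
capture_fs_debug() {
  out="${1:-condor_ping_debug.txt}"
  if ! command -v condor_ping >/dev/null 2>&1; then
    echo "condor_ping not found in PATH" >&2
    return 127
  fi
  # Setting the variable on the command line applies it to this run only.
  _condor_TOOL_DEBUG="D_ALL:2" condor_ping -debug WRITE 2> "$out"
  status=$?
  # Print the captured lines that mention FS authentication, with line numbers.
  grep -n "FS" "$out"
  return "$status"
}
```

For example, `capture_fs_debug /tmp/fs_debug.txt` leaves the full stderr in /tmp/fs_debug.txt, which is the file to send off-list.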

Cheers,
-zach


-----Original Message-----
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of
Weiming Shi
Sent: Friday, May 11, 2018 11:06 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] (no subject)


Hi HTCondor Community,

We use sssd for authentication. Previously, the nscd service was also running.
Recently we disabled the nscd service and found that FS authentication
fails frequently for some users on some of our submit machines. To make job
submission work again, we frequently have to remove any running jobs on the
affected submit machines and restart the condor service on those machines.

Any advice on how to troubleshoot and debug this kind of issue is
appreciated.

Thanks

Here are the related HTCondor settings we use:
# Parameters with names that match sec:
DCSTATISTICS_WINDOW_SECONDS =
ENCRYPT_SECRETS = true
IGNORE_ATTEMPTS_TO_SET_SECURE_JOB_ATTRS = true
SEC_CLAIMTOBE_INCLUDE_DOMAIN = false
SEC_CLAIMTOBE_USER =
SEC_DEBUG_PRINT_KEYS = false
SEC_DEFAULT_AUTHENTICATION_METHODS = FS
SEC_DEFAULT_AUTHENTICATION_TIMEOUT = 10
SEC_ENABLE_MATCH_PASSWORD_AUTHENTICATION = true
SEC_INVALIDATE_SESSIONS_VIA_TCP = true
SEC_PASSWORD_DOMAIN =
SEC_PASSWORD_FILE =
SEC_SESSION_DURATION_SLOP = 20
SEC_TCP_SESSION_TIMEOUT = 20
SECURE_JOB_ATTRS =
STATISTICS_WINDOW_SECONDS = 1200
SYSTEM_SECURE_JOB_ATTRS = x509userProxySubject x509UserProxyEmail x509UserProxyVOName x509UserProxyFirstFQAN x509UserProxyFQAN
SCHEDD_DEBUG = D_PID D_FULLDEBUG D_SECURITY


Here are the corresponding error messages that we saw in SchedLog:

05/11/18 11:35:51 (pid:1512632) ============ Begin clean_shadow_recs =============
05/11/18 11:35:51 (pid:1512632) ============ End clean_shadow_recs =============
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received DC_AUTHENTICATE from <10.40.243.245:49415>
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: received following ClassAd:
NewSession = "YES"
Subsystem = "TOOL"
AuthMethods = "FS"
CryptoMethods = "3DES,BLOWFISH"
Authentication = "OPTIONAL"
Integrity = "OPTIONAL"
Command = 519
Encryption = "OPTIONAL"
ServerPid = 1586331
SessionDuration = "60"
OutgoingNegotiation = "PREFERRED"
Enact = "NO"
SessionLease = 3600
RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: our_policy:
SessionDuration = "86400"
AuthMethods = "FS"
Authentication = "REQUIRED"
Subsystem = "SCHEDD"
Enact = "NO"
ParentUniqueID = "htdsubmit1:1512588:1525992838"
Integrity = "OPTIONAL"
CryptoMethods = "3DES,BLOWFISH"
OutgoingNegotiation = "REQUIRED"
Encryption = "OPTIONAL"
SessionLease = 3600
ServerPid = 1512632
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: the_policy:
Authentication = "YES"
Integrity = "NO"
SessionDuration = "60"
AuthMethodsList = "FS"
Encryption = "NO"
SessionLease = 3600
CryptoMethods = "3DES,BLOWFISH"
Enact = "YES"
AuthMethods = "FS"
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: generating 3DES key for session htdsubmit1:1512632:1526052955:1047...
05/11/18 11:35:55 (pid:1512632) SECMAN: Sending following response ClassAd:
Authentication = "YES"
Integrity = "NO"
SessionDuration = "60"
AuthMethodsList = "FS"
Encryption = "NO"
RemoteVersion = "$CondorVersion: 8.5.8 Dec 13 2016 BuildID: 390781 $"
SessionLease = 3600
CryptoMethods = "3DES,BLOWFISH"
Enact = "YES"
AuthMethods = "FS"
05/11/18 11:35:55 (pid:1512632) SECMAN: new session, doing initial authentication.
05/11/18 11:35:55 (pid:1512632) Returning to DC while we wait for socket to authenticate.
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authenticating RIGHT NOW.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: setting timeout for (unknown) to 10.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: in authenticate( addr == '(unknown)', methods == 'FS')
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 4)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 4)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 4)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: will try to use 4 (FS)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 1.
05/11/18 11:35:55 (pid:1512632) FS: client template is /tmp/FS_XXXXXXXXX
05/11/18 11:35:55 (pid:1512632) FS: client filename is /tmp/FS_XXXZFbeht
05/11/18 11:35:55 (pid:1512632) Will return to DC because authentication is incomplete.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE_FS: used dir /tmp/FS_XXXZFbeht, status: 0
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: do_authenticate is 0.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: method -1 (FS) failed.
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: can still try these methods: FS
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: in handshake(my_methods = 'FS')
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: handshake() - i am the server
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client sent (methods == 0)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: i picked (method == 0)
05/11/18 11:35:55 (pid:1512632) HANDSHAKE: client received (method == 0)
05/11/18 11:35:55 (pid:1512632) AUTHENTICATE: no available authentication methods succeeded!
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: authentication of <10.40.243.245:49415> did not result in a valid mapped user name, which is required for this command (519 QUERY_JOB_ADS_WITH_AUTH), so aborting.
05/11/18 11:35:55 (pid:1512632) DC_AUTHENTICATE: reason for authentication failure: AUTHENTICATE:1003:Failed to authenticate with any method|AUTHENTICATE:1004:Failed to authenticate using FS|FS:1006:Unable to lookup uid 1262
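The last log line names the immediate cause: the schedd could not resolve uid 1262 to a user name. A quick way to check the same kind of lookup on an affected submit machine is to ask NSS (which consults sssd when it is listed in /etc/nsswitch.conf) to resolve that uid. The sketch below assumes `getent` is installed; the function name check_uid_lookup is hypothetical.

```shell
# check_uid_lookup (hypothetical helper): resolve a numeric uid through NSS,
# roughly the lookup FS authentication needs on the schedd side.
check_uid_lookup() {
  uid="$1"
  # getent exits non-zero when the key cannot be found.
  if entry=$(getent passwd "$uid"); then
    # The user name is the first colon-separated field of the passwd entry.
    echo "uid $uid resolves to user ${entry%%:*}"
  else
    echo "uid $uid does not resolve; FS authentication for it would fail" >&2
    return 1
  fi
}
```

Running `check_uid_lookup 1262` repeatedly (for example in a loop while the problem is occurring) may show whether the lookup fails intermittently now that nscd is no longer caching results.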







------------------------------

Message: 2
Date: Fri, 11 May 2018 14:04:17 -0500
From: Tim Theisen <tim@xxxxxxxxxxx>
To: htcondor-world@xxxxxxxxxxx, HTCondor-Users Mail List
<htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] HTCondor 8.6.11 Released
Message-ID: <5ef20c75-1e64-a893-79f4-784eb3449726@xxxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

The HTCondor team is pleased to announce the release of HTCondor 8.6.11.
This stable series release contains significant bug fixes.

Highlights of this release are:
- Can now do an interactive submit of a Singularity job
- Shared port daemon is more resilient when starved for TCP ports
- The Windows installer configures the environment for the Python bindings
- Fixed several other minor problems

More details about the fixes can be found in the Version History:
http://htcondor.org/manual/v8.6.11/10_3Stable_Release.html

Downloads Page:
http://htcondor.org/downloads/

Thank you for your interest in HTCondor!

- The HTCondor Team

--
Tim Theisen
Release Manager
HTCondor & Open Science Grid
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin - Madison
4261 Computer Sciences and Statistics
1210 W Dayton St
Madison, WI 53706-1685
+1 608 265 5736



------------------------------

Message: 3
Date: Fri, 11 May 2018 14:05:22 -0500
From: Tim Theisen <tim@xxxxxxxxxxx>
To: htcondor-world@xxxxxxxxxxx, HTCondor-Users Mail List
<htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] HTCondor 8.7.8 Released
Message-ID: <7a750944-e683-a4ca-76bf-36935f0a3249@xxxxxxxxxxx>
Content-Type: text/plain; charset=utf-8

The HTCondor team is pleased to announce the release of HTCondor 8.7.8.
This development series release contains new features that are under
development. This release contains all of the bug fixes from the 8.6.11
stable release.

Enhancements in the release include:
- The condor annex can easily use multiple regions simultaneously
- HTCondor now uses CUDA_VISIBLE_DEVICES to tell which GPU devices to manage
- HTCondor now reports GPU memory utilization

A complete list of new features and fixed bugs can be found in the version
history.

Version Histories:
http://htcondor.org/manual/v8.7.8/DevelopmentReleaseSeries87.html
http://htcondor.org/manual/v8.7.8/StableReleaseSeries86.html

Downloads Page:
http://www.cs.wisc.edu/htcondor/downloads/

Thank you for your interest in HTCondor!

- The HTCondor Team




------------------------------

Subject: Digest Footer

_______________________________________________
HTCondor-users mailing list
HTCondor-users@xxxxxxxxxxx
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

------------------------------

End of HTCondor-users Digest, Vol 54, Issue 19
**********************************************