I have some Vanilla universe jobs that are having problems when flocking to the Condor cluster at HEP. Following is an example from the ShadowLog:
2/3 10:57:59 ******************************************************
2/3 10:57:59 ** condor_shadow (CONDOR_SHADOW) STARTING UP
2/3 10:57:59 ** $CondorVersion: 6.4.7 Jan 26 2003 $
2/3 10:57:59 ** $CondorPlatform: INTEL-LINUX-GLIBC22 $
2/3 10:57:59 ** PID = 27608
2/3 10:57:59 ******************************************************
2/3 10:57:59 DaemonCore: Command Socket at <144.92.101.149:57227>
2/3 10:58:00 Initializing a VANILLA shadow
2/3 10:58:00 (1506.440) (27608): Request to run on <128.104.28.10:32769> was ACCEPTED
2/3 12:18:10 (1506.440) (27608): DC_AUTHENTICATE: attempt to open invalid session condor:27608:1075827490:1, failing.
2/3 12:37:05 (1506.440) (27608): DC_AUTHENTICATE: attempt to open invalid session condor:27608:1075827480:0, failing.
2/3 12:37:06 (1506.440) (27608): ERROR "Can no longer communicate with condor_starter on execute machine" at line 138 in file NTreceivers.C
Apparently, HEP has some kind of timeout after one hour and twenty minutes of activity. What parameter controls this?
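For reference, the one hour and twenty minute figure can be checked against the log excerpt above. The following is a minimal sketch, assuming the ShadowLog timestamp format shown (month/day, then a 24-hour time); the helper name `parse_stamp` is hypothetical, and the year is an assumption because the log does not record it.

```python
from datetime import datetime

def parse_stamp(line, year=2004):
    """Parse the leading 'M/D HH:MM:SS' stamp of a ShadowLog line.

    The log omits the year, so `year` is an assumed value; it does not
    affect the difference between two stamps taken on the same day.
    """
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{year}/{date_part} {time_part}",
                             "%Y/%m/%d %H:%M:%S")

# Two lines copied from the ShadowLog excerpt above.
accepted = ("2/3 10:58:00 (1506.440) (27608): Request to run on "
            "<128.104.28.10:32769> was ACCEPTED")
first_failure = ("2/3 12:18:10 (1506.440) (27608): DC_AUTHENTICATE: attempt "
                 "to open invalid session condor:27608:1075827490:1, failing.")

# Time between the job being accepted and the first session failure.
elapsed = parse_stamp(first_failure) - parse_stamp(accepted)
print(elapsed)
```

This only measures the gap between the two log entries; it says nothing about which setting, if any, causes the failure.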
Is there some parameter I can set using condor_qedit so that these jobs don't try flocking to HEP, or do I need to disable flocking to HEP globally in the config file and do a condor_reconfig?
In general, the Condor project works to make certain that all versions of Condor within the same stable series are compatible with each other over the wire. Thus you can have a mixed pool of v6.6.0, v6.6.1, ... v6.6.x without any trouble. We do not make guarantees across *series* releases, so mixing v6.4.x and v6.6.x is not promised to work.
However, in the specific case of v6.4.7 and v6.6.0, you can likely make the two of them happy together by making a small addition to the config file. The issue is that version 6.4.7 disables security session negotiation by default and 6.6 enables it by default, so mixing the two is a bad idea unless you disable it on both via:
SEC_DEFAULT_NEGOTIATION = NEVER
So just place the above line into both your v6.4.7 and v6.6.0 condor_config file(s) and do condor_restart for all machines (maybe condor_reconfig would do the trick, but I am not certain).
Note that if you are using strong secure channels, the above workaround will not help you.
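If it helps to confirm that the line actually landed in both config files, here is a minimal sketch of a check in Python. It is not part of Condor: the function name `effective_setting` and the sample file contents are hypothetical, and it assumes simple `NAME = value` lines where names are case-insensitive and a later definition overrides an earlier one. It does not follow local config files or line continuations.

```python
def effective_setting(config_text, name):
    """Return the last value assigned to `name`, or None if it is never set.

    Assumes plain 'NAME = value' lines, '#' comments, case-insensitive
    names, and that a later assignment overrides an earlier one.
    """
    value = None
    for raw in config_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, rhs = line.partition("=")
        if sep and key.strip().upper() == name.upper():
            value = rhs.strip()
    return value

# Hypothetical excerpts standing in for the two condor_config files.
sample_64 = """
# v6.4.7 machine, workaround applied
SEC_DEFAULT_NEGOTIATION = NEVER
"""
sample_66 = """
# v6.6.0 machine, workaround not applied yet
sec_default_negotiation = OPTIONAL
"""

for label, text in (("v6.4.7", sample_64), ("v6.6.0", sample_66)):
    value = effective_setting(text, "SEC_DEFAULT_NEGOTIATION")
    ok = value is not None and value.upper() == "NEVER"
    print(f"{label}: SEC_DEFAULT_NEGOTIATION = {value} ({'ok' if ok else 'needs the workaround'})")
```

To use it on a real machine, read the contents of the condor_config file into a string and pass it in place of the sample text.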
Hope this helps,
Todd

Todd Tannenbaum
University of Wisconsin-Madison
Condor Project Research
Department of Computer Sciences