Duncan,
Are the DAGs completing correctly in spite of the error messages?
DAGMan is built on DaemonCore, so it may be trying to find the shared port daemon even though it doesn't actually need to (DAGMan doesn't open a command port).
I'll see if I can reproduce this...
Kent

--
R. Kent Wenger (wenger@xxxxxxxxxxx, 608-262-6627, http://www.cs.wisc.edu/~wenger/)
Computer Sciences Department
University of Wisconsin-Madison

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Duncan Brown <dabrown@xxxxxxx>
Sent: Friday, February 10, 2017 12:39 PM
To: htcondor-users@xxxxxxxxxxx
Subject: [HTCondor-users] SharedPortEndpoint error in dag.dagman.out

Hi all,

Since upgrading to 8.6, users are reporting the following error in their dag.dagman.out files:

02/10/17 13:20:41 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/10/17 13:20:41 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
02/10/17 13:21:41 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/10/17 13:21:41 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

I see some discussion about this in the archives for the regular daemons, but not for dagman. The first occurrence is at the top of the log, then it repeats:

02/08/17 11:34:32 ******************************************************
02/08/17 11:34:32 ** condor_scheduniv_exec.5452054.0 (CONDOR_DAGMAN) STARTING UP
02/08/17 11:34:32 ** /usr/bin/condor_dagman
02/08/17 11:34:32 ** SubsystemInfo: name=DAGMAN type=DAGMAN(10) class=DAEMON(1)
02/08/17 11:34:32 ** Configuration: subsystem:DAGMAN local:<NONE> class:DAEMON
02/08/17 11:34:32 ** $CondorVersion: 8.6.0 Jan 26 2017 BuildID: 395190 $
02/08/17 11:34:32 ** $CondorPlatform: x86_64_RedHat7 $
02/08/17 11:34:32 ** PID = 1344275
02/08/17 11:34:32 ** Log last touched 2/8 11:03:46
02/08/17 11:34:32 ******************************************************
02/08/17 11:34:32 Using config source: /etc/condor/condor_config
02/08/17 11:34:32 Using local config sources:
02/08/17 11:34:32 /etc/condor/config.d/00_gwms_general.config
02/08/17 11:34:32 /etc/condor/config.d/02_gwms_schedds.config
02/08/17 11:34:32 /etc/condor/config.d/03_gwms_local.config
02/08/17 11:34:32 /etc/condor/config.d/90_gwms_dns.config
02/08/17 11:34:32 /etc/condor/config.d/92_flocking_osg_ligo.config
02/08/17 11:34:32 /etc/condor/config.d/99_gratia-gwms.conf
02/08/17 11:34:32 /etc/condor/config.d/99_gratia.conf
02/08/17 11:34:32 /etc/condor/condor_config.local
02/08/17 11:34:32 config Macros = 170, Sorted = 170, StringBytes = 8181, TablesBytes = 6224
02/08/17 11:34:32 CLASSAD_CACHING is ENABLED
02/08/17 11:34:32 Daemon Log is logging: D_ALWAYS D_ERROR
02/08/17 11:34:32 DaemonCore: No command port requested.
02/08/17 11:34:32 SharedPortEndpoint: waiting for connections to named socket 1344275_18a2
02/08/17 11:34:32 SharedPortEndpoint: failed to open ./shared_port_ad: No such file or directory
02/08/17 11:34:32 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.

Any ideas?

Cheers,
Duncan.

--
Duncan Brown
http://dbrown10.expressions.syr.edu
Charles Brightman Professor of Physics
Room 263-1 Physics Department
Director of the Graduate Program
Syracuse University, NY 13244, USA
Phone: 315 443 5993 Fax: 315 443 9103

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
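
To gauge how often the SharedPortEndpoint messages quoted above recur in a given dag.dagman.out file, a small script along these lines may help. This is only a sketch: it assumes every log line starts with the "MM/DD/YY HH:MM:SS" timestamp seen in the excerpt, and the function name summarize_shared_port_errors is hypothetical, not part of HTCondor.

```python
import re
from collections import Counter

# Timestamp prefix as it appears in the quoted log, e.g. "02/10/17 13:20:41 ".
# Assumption: all lines of interest follow this layout.
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2}) (.*)$")


def summarize_shared_port_errors(lines):
    """Count the two SharedPortEndpoint failure messages and return
    (counts per message, first timestamp, last timestamp)."""
    counts = Counter()
    first = last = None
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue
        stamp, message = match.groups()
        if not message.startswith("SharedPortEndpoint:"):
            continue
        # Only the failure/retry messages, not the "waiting for connections" line.
        if "failed to open" in message or "did not successfully find" in message:
            counts[message] += 1
            if first is None:
                first = stamp
            last = stamp
    return counts, first, last


if __name__ == "__main__":
    # Path is an example; point it at the DAG's own dagman.out file.
    with open("dag.dagman.out") as log:
        counts, first, last = summarize_shared_port_errors(log)
    for message, n in counts.items():
        print(n, message)
    print("first:", first, "last:", last)
```

The counts say nothing about whether the DAG itself succeeded; that still has to be checked separately, as asked at the top of this reply.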