Hi Ben,

Have you tried whether it works when you explicitly request an SL6 node?

...
requirements = OpSysAndVer == "SL6"
...

Cheers,
Thomas

On 2018-08-08 10:27, Ben Pietras wrote:
> Hi,
>
> Apologies if this isn't the appropriate channel; my first post.
>
> I have 1 master and 10 nodes all on CentOS7, HTCondor 8.6.10
>
> I have to keep SL6.9 on this particular machine and want to include it in the cluster
> condor_status shows the SL6.9 machine threads as available, but never actually claims them (the job does run outside of condor on the SL6.9).
>
> slot8@fastpc30 LINUX X86_64 Claimed Busy 0.730 1970 0+00:00:03
> slot1@fastpc31 LINUX X86_64 Unclaimed Idle 0.610 1994 0+00:44:37
> [...]
>
> condor_q -better-analyze 6750
>
> 6750.1069: Run analysis summary ignoring user priority. Of 252 machines,
> 0 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 244 match and are already running your jobs
> 0 match but are serving other users
> 0 are available to run your job
>
> ----------------------
> On the SL6.9 machine
> ----------------------
>
> cat /var/log/messages | grep condor
>
> Aug 7 14:34:34 fastpc31 yum[1962]: Installed: condor-8.6.10-1.el6.x86_64
> Aug 7 14:35:21 fastpc31 htcondor: Not changing GLOBAL_MAX_FDS (/proc/sys/fs/file-max): new value (32768) <= old value (1606869).
> Aug 7 14:35:21 fastpc31 htcondor: Not changing TCP_LISTEN_QUEUE (/proc/sys/net/core/somaxconn): new value (1024) <= old value (1024).
> Aug 7 14:35:21 fastpc31 htcondor: Not changing ROOT_MAXKEYS (/proc/sys/kernel/keys/root_maxkeys): new value (1000000) <= old value (1000000).
> Aug 7 14:35:21 fastpc31 htcondor: Not changing ROOT_MAXKEYS_BYTES (/proc/sys/kernel/keys/root_maxbytes): new value (25000000) <= old value (25000000).
> Aug 7 14:35:21 fastpc31 htcondor: Changing FS_CACHE_DIRTY_BYTES (/proc/sys/vm/dirty_bytes) from 100000000 to 100000000
> Aug 7 14:35:21 fastpc31 htcondor: Not changing MAX_RECEIVE_BUFFER (/proc/sys/net/core/rmem_max): new value (10485760) <= old value (10485760).
>
> Version : 8.6.10 (Installed same version as host, as had this issue with 8.7.9)
> id condor
> uid=990(condor) gid=985(condor) groups=985(condor)
> (same for all in cluster)
>
> On SL6.9 (kernel 2.6.32-754.2.1.el6.x86_64) node:
>
> service condor status
> condor_master (pid 2004) is running....
>
> cat /var/log/condor/MasterLog
>
> 08/07/18 14:35:21 ******************************************************
> 08/07/18 14:35:21 ** condor_master (CONDOR_MASTER) STARTING UP
> 08/07/18 14:35:21 ** /usr/sbin/condor_master
> 08/07/18 14:35:21 ** SubsystemInfo: name=MASTER type=MASTER(2) class=DAEMON(1)
> 08/07/18 14:35:21 ** Configuration: subsystem:MASTER local:<NONE> class:DAEMON
> 08/07/18 14:35:21 ** $CondorVersion: 8.6.10 Mar 12 2018 BuildID: 435200 $
> 08/07/18 14:35:21 ** $CondorPlatform: x86_64_RedHat6 $
> 08/07/18 14:35:21 ** PID = 2004
> 08/07/18 14:35:21 ** Log last touched 8/7 14:22:46
> 08/07/18 14:35:21 ******************************************************
> 08/07/18 14:35:21 Using config source: /etc/condor/condor_config
> 08/07/18 14:35:21 Using local config sources:
> 08/07/18 14:35:21 /etc/condor/config.d/condor_execute_fastpc31.config
> 08/07/18 14:35:21 /etc/condor/condor_config.local
> 08/07/18 14:35:21 config Macros = 74, Sorted = 74, StringBytes = 1852, TablesBytes = 2712
> 08/07/18 14:35:21 CLASSAD_CACHING is OFF
> 08/07/18 14:35:21 Daemon Log is logging: D_ALWAYS D_ERROR
> 08/07/18 14:35:22 SharedPortEndpoint: waiting for connections to named socket 2004_7849
> 08/07/18 14:35:22 SharedPortEndpoint: failed to open /var/lock/condor/shared_port_ad: No such file or directory
> 08/07/18 14:35:22 SharedPortEndpoint: did not successfully find SharedPortServer address. Will retry in 60s.
> 08/07/18 14:35:22 DaemonCore: private command socket at <10.0.0.31:0?sock=2004_7849>
> 08/07/18 14:35:22 Adding SHARED_PORT to DAEMON_LIST, because USE_SHARED_PORT=true (to disable this, set AUTO_INCLUDE_SHARED_PORT_IN_DAEMON_LIST=False)
> 08/07/18 14:35:22 Master restart (GRACEFUL) is watching /usr/sbin/condor_master (mtime:1520893905)
> 08/07/18 14:35:22 Collector port not defined, will use default: 9618
> 08/07/18 14:35:22 Started DaemonCore process "/usr/libexec/condor/condor_shared_port", pid and pgroup = 2037
> 08/07/18 14:35:22 Waiting for /var/lock/condor/shared_port_ad to appear.
> 08/07/18 14:35:23 Found /var/lock/condor/shared_port_ad.
> 08/07/18 14:35:23 Started DaemonCore process "/usr/sbin/condor_startd", pid and pgroup = 2038
> 08/07/18 14:35:33 Setting ready state 'Ready' for STARTD
>
> Which looks OK to me. Does anyone have suggestions?
>
> Thanks,
> Ben
> ----------------------------------------------------------------------------
> Ben Pietras <ben.pietras@xxxxxxxxxxxxxxxx>
> School of Physics and Astronomy, Tel. 0161-275-4231
> The University of Manchester, Fax. 0161-275-5509
> Manchester, M13 9PL.
> ----------------------------------------------------------------------------
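
For reference, the requirements line suggested at the top of this message could sit in a small test submit file like the one below. This is only a sketch: the file name sl6_test.sub, the choice of /bin/hostname as a harmless test executable, and the output, error and log file names are assumptions made for illustration and do not come from the original job. Setting transfer_executable = false is likewise an assumption, chosen so that the execute node runs its own copy of the binary rather than one copied from a CentOS7 submit host.

```
# sl6_test.sub -- hypothetical test job, not the original submit file
universe            = vanilla

# Harmless test payload; use the copy already present on the execute node
executable          = /bin/hostname
transfer_executable = false

# Only match slots that advertise Scientific Linux 6
requirements        = (OpSysAndVer == "SL6")

# Hypothetical file names for the job's output and event log
output              = sl6_test.$(Cluster).$(Process).out
error               = sl6_test.$(Cluster).$(Process).err
log                 = sl6_test.$(Cluster).log

queue 1
```

Submitting it with "condor_submit sl6_test.sub" and then watching condor_q shows whether the SL6 slots get claimed. Running "condor_status -af Name OpSysAndVer" beforehand lists what each slot actually advertises, which is worth checking in case fastpc31 reports a value other than "SL6".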