
[HTCondor-users] Ubuntu 14.04 GPU functionality



Dear all,

 

I recently installed HTCondor 8.3.2 via dpkg on a clean Ubuntu 14.04 system and added the following lines:

 

use feature : GPUs

GPU_DISCOVERY_EXTRA = -extra

 

into the condor_config.local file located in /etc/condor.

 

Problems arise when I start condor via sudo service condor start. All of the daemons in the DAEMON_LIST in the local file start except for STARTD.

 

Here is the StartLog (the condor_startd log). I'm not sure how to fix this.

 

01/06/15 19:04:02 ******************************************************

01/06/15 19:04:02 ** condor_startd (CONDOR_STARTD) STARTING UP

01/06/15 19:04:02 ** /usr/sbin/condor_startd

01/06/15 19:04:02 ** SubsystemInfo: name=STARTD type=STARTD(7) class=DAEMON(1)

01/06/15 19:04:02 ** Configuration: subsystem:STARTD local:<NONE> class:DAEMON

01/06/15 19:04:02 ** $CondorVersion: 8.3.2 Dec 16 2014 BuildID: 288596 $

01/06/15 19:04:02 ** $CondorPlatform: x86_64_Ubuntu14 $

01/06/15 19:04:02 ** PID = 16066

01/06/15 19:04:02 ** Log last touched 1/6 18:59:37

01/06/15 19:04:02 ******************************************************

01/06/15 19:04:02 Using config source: /etc/condor/condor_config

01/06/15 19:04:02 Using local config sources:

01/06/15 19:04:02 /etc/condor/condor_config.local

01/06/15 19:04:02 config Macros = 88, Sorted = 88, StringBytes = 2901, TablesBytes = 3216

01/06/15 19:04:02 CLASSAD_CACHING is ENABLED

01/06/15 19:04:02 Daemon Log is logging: D_ALWAYS D_ERROR

01/06/15 19:04:02 Daemoncore: Listening at <0.0.0.0:43738> on TCP (ReliSock) and UDP (SafeSock).

01/06/15 19:04:02 DaemonCore: command socket at <192.168.6.108:43738>

01/06/15 19:04:02 DaemonCore: private command socket at <192.168.6.108:43738>

01/06/15 19:04:02 my_popenv failed

01/06/15 19:04:02 Failed to run hibernation plugin '/usr/libexec/condor_power_state ad'

01/06/15 19:04:02 VM-gahp server reported an internal error

01/06/15 19:04:02 VM universe will be tested to check if it is available

01/06/15 19:04:02 History file rotation is enabled.

01/06/15 19:04:02 Maximum history file size is: 20971520 bytes

01/06/15 19:04:02 Number of rotated history files is: 2

01/06/15 19:04:02 ERROR "Failed to execute local resource 'GPUs' inventory script "/usr/libexec/condor_gpu_discovery -properties -extra"" at line 625 in file /slots/01/dir_53959/userdir/src/condor_startd.V6/ResAttributes.cpp

 

 

The condor_gpu_discovery script is located in /usr/lib/condor/libexec/, not in /usr/libexec. What variable do I need to set for condor to find this file in its correct location? The relevant variables from the global config file are as follows:

 

##--------------------------------------------------------------------

## Pathnames:

##--------------------------------------------------------------------

## Where have you installed the bin, sbin and lib condor directories?

RELEASE_DIR = /usr

 

## Where is the local condor directory for each host?

## This is where the local config file(s), logs and

## spool/execute directories are located

LOCAL_DIR = /var/condor

#LOCAL_DIR = $(RELEASE_DIR)/hosts/$(HOSTNAME)

 

## Where is the machine-specific local config file for each host?

CONFIG_DIR = /etc/condor

LOCAL_CONFIG_FILE = $(CONFIG_DIR)/condor_config.local

 

## Where are optional machine-specific local config files located?

## Config files are included in lexicographic order.

LOCAL_CONFIG_DIR = $(LOCAL_DIR)/config

 

## Blacklist for file processing in the LOCAL_CONFIG_DIR

## LOCAL_CONFIG_DIR_EXCLUDE_REGEXP = ^((\..*)|(.*~)|(#.*)|(.*\.rpmsave)|(.*\.rpmnew))$

 

## If the local config file is not present, is it an error?

## WARNING: This is a potential security issue.

## If not specified, the default is True

#REQUIRE_LOCAL_CONFIG_FILE = TRUE
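
For illustration, here is a minimal, untested sketch of one setting that might address this. It assumes that the GPUs feature builds the discovery command from $(LIBEXEC), and that LIBEXEC falls back to $(RELEASE_DIR)/libexec (that is, /usr/libexec with the RELEASE_DIR above) when it is not set explicitly. Both points are assumptions, not something the log confirms, although the failed hibernation plugin path /usr/libexec/condor_power_state in the log suggests other helpers are being looked up in the same directory.

```
# Sketch for /etc/condor/condor_config.local (untested).
# Assumption: the startd resolves condor_gpu_discovery through $(LIBEXEC).
# Point LIBEXEC at the directory where the package installed the helper programs.
LIBEXEC = /usr/lib/condor/libexec
```

If this is the right knob, condor_config_val LIBEXEC should report the new path after the change, and condor would need to be restarted so that the startd reruns the GPU inventory script.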

 

Any help would be greatly appreciated.

 

 

 

Michael McInerny Murphy

Engineer

IERUS Technologies, Inc.

2904 Westcorp Blvd., Suite 210

(256) 319-2026 x 107