
[HTCondor-users] Troubleshooting slots configuration



Hi All,

 

I have installed condor 9.12 on an Ubuntu 20.04 server using the rpm packages.

 

The directory /etc/condor/config.d contains:

 

/etc/condor/config.d/etc/condor/config.d/10-nes-cm-submit-execute-node.config (default file)

use security:recommended_v9_0

 

/etc/condor/config.d/10-nes-cm-submit-execute-node.config (created by me)

use ROLE : centralmanager

use ROLE : submit

use ROLE : execute

CONDOR_HOST = 192.168.10.160

CONDOR_COLLECTOR = $(CONDOR_HOST)

 

/etc/condor/config.d/20-local-hardware.config (created by me)

use feature : GPUs

GPU_DISCOVERY_EXTRA = -extra

NUM_CPUS = 20

MACHINE_RESOURCE_GPUs = GPU_0, GPU_1, GPU_2, GPU_3

ENVIRONMENT_FOR_AssignedGPUs = GPU_NAME GPU_ID=/CUDA//

NUM_SLOTS = 1

NUM_SLOTS_TYPE_1 = 1

SLOT_TYPE_1 = cpus=100%

SLOT_TYPE_1_PARTITIONABLE = true

 

/var/log/condor/MasterLog shows that the three files above are read to build the overall configuration.

 

The file 20-local-hardware.config was previously used with a condor 8.8 configuration.

 

At this time, if I type “condor_status” I get no output on screen. All the expected processes are running.

 

Does anyone have any tips on why no slot / node information appears in the condor_status output? Is there a particular log file that might indicate problems with the slot and GPU resource definitions? I have looked at the files under /var/log/condor, but I wasn’t able to find any clue as to why the system does not seem to be configured properly.

 

Thanks for any advice,

 

Best Regards,

 

Andrea