The StartLog will probably tell you what it is unhappy about. I would guess that it is failing to start up because it cannot provision the slot configuration.

These lines

    use feature : GPUs
    GPU_DISCOVERY_EXTRA = -extra

conflict with these lines

    MACHINE_RESOURCE_GPUs = GPU_0, GPU_1, GPU_2, GPU_3
    ENVIRONMENT_FOR_AssignedGPUs = GPU_NAME GPU_ID=/CUDA//

I would recommend getting rid of the second set of configuration lines, but you should get rid of one of those sets for sure.

-tj

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of Andrea Borsic

Hi All,

I have installed condor 9.12 on an Ubuntu 20.04 server using the rpm packages. The directory /etc/condor/config.d contains:

/etc/condor/config.d/etc/condor/config.d/10-nes-cm-submit-execute-node.config (default file)

    use security:recommended_v9_0

/etc/condor/config.d/10-nes-cm-submit-execute-node.config (created by me)

    use ROLE : centralmanager
    use ROLE : submit
    use ROLE : execute
    CONDOR_HOST = 192.168.10.160
    CONDOR_COLLECTOR = $(CONDOR_HOST)

/etc/condor/config.d/20-local-hardware.config (created by me)

    use feature : GPUs
    GPU_DISCOVERY_EXTRA = -extra
    NUM_CPUS = 20
    MACHINE_RESOURCE_GPUs = GPU_0, GPU_1, GPU_2, GPU_3
    ENVIRONMENT_FOR_AssignedGPUs = GPU_NAME GPU_ID=/CUDA//
    NUM_SLOTS = 1
    NUM_SLOTS_TYPE_1 = 1
    SLOT_TYPE_1 = cpus=100%
    SLOT_TYPE_1_PARTITIONABLE = true

/var/log/condor/MasterLog indicates that the three files above are read to determine the overall configuration. The file 20-local-hardware.config was used in a previous condor 8.8 configuration.
At this time, if I type “condor_config” I get no output on screen. All the expected processes are running.

Does anyone have any tips on why no slot / node information is appearing with condor_status? Is there any particular log file that might indicate problems with the slot and GPU resource definitions? I have looked at the files under /var/log/condor but I wasn’t able to find any clue as to why the system does not seem to be configured properly.

Thanks for any advice,

Best Regards,
Andrea
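
For reference, the change recommended in the reply at the top of this thread could look roughly like the sketch below of /etc/condor/config.d/20-local-hardware.config. It assumes the second set of lines (the manual MACHINE_RESOURCE_GPUs and ENVIRONMENT_FOR_AssignedGPUs definitions) is the set that gets removed, and that every other line in the file stays exactly as posted. It has not been tested on this machine, so the StartLog should still be checked after restarting.

```
# /etc/condor/config.d/20-local-hardware.config
# Sketch only: keeps automatic GPU discovery and drops the manually
# defined GPU resource lines that the reply identifies as conflicting.
use feature : GPUs
GPU_DISCOVERY_EXTRA = -extra

NUM_CPUS = 20

# Removed (conflict with "use feature : GPUs" per the reply):
#   MACHINE_RESOURCE_GPUs = GPU_0, GPU_1, GPU_2, GPU_3
#   ENVIRONMENT_FOR_AssignedGPUs = GPU_NAME GPU_ID=/CUDA//

# Slot layout unchanged from the original post.
NUM_SLOTS = 1
NUM_SLOTS_TYPE_1 = 1
SLOT_TYPE_1 = cpus=100%
SLOT_TYPE_1_PARTITIONABLE = true
```

The alternative mentioned in the reply, keeping the manual definitions and removing the first two lines instead, would also resolve the conflict; only one of the two sets should remain.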