Hi from me as well, just to reiterate what Henning already wrote.

The test pool has only two execution hosts, with only slightly differing outputs of condor_config_val -dump [1]. After submission of many single-core jobs, they occupy all available slots. Then we submit a large number of 12-core jobs, which only get scheduled to a single host, preempting multiple jobs there, but then everything becomes static:

$ condor_q
-- Schedd: condor3.atlas.local : <10.20.30.18:9618?... @ 11/06/19 14:59:24
OWNER    BATCH_NAME   SUBMITTED   DONE  RUN  IDLE  TOTAL  JOB_IDS
carsten  ID: 151      11/6 14:51  _     136  64    200    151.0-199
carsten  ID: 153      11/6 14:51  _     2    198   200    153.0-199

$ condor_q -run
[..]
151.157  carsten  11/6 14:51  0+00:08:30  slot1@xxxxxxxxxxxxxxxxx
151.158  carsten  11/6 14:51  0+00:08:30  slot1@xxxxxxxxxxxxxxxxx
151.159  carsten  11/6 14:51  0+00:08:30  slot1@xxxxxxxxxxxxxxxxx
153.0    carsten  11/6 14:51  0+00:05:51  slot1@xxxxxxxxxxxxxxxxx
153.1    carsten  11/6 14:51  0+00:04:51  slot1@xxxxxxxxxxxxxxxxx

The Negotiator log looks fine for the first two jobs:

11/06/19 14:53:54 Matched pslot slot1@xxxxxxxxxxxxxxxxx by priority preempting 12 dynamic slots
11/06/19 14:53:54 Preempting various_dSlot_users (user prio=500.00, startd rank=0.00) on slot1@xxxxxxxxxxxxxxxxx for aei.dev.admin.multi.carsten@xxxxxxxxxxx (user prio=2.54, startd rank=0.00)
11/06/19 14:53:54 Sending PERMISSION, claim id, startdAd to schedd
11/06/19 14:53:54 Notifying the accountant
11/06/19 14:53:54 Successfully matched with slot1@xxxxxxxxxxxxxxxxx
11/06/19 14:53:54 Match completed, match cost= 12
11/06/19 14:53:54 Request 00153.00000: autocluster 6 (request count 2 of 200)
11/06/19 14:53:54 matchmakingAlgorithm: limit 159.999575 used 12.000000 pieLeft 147.999575
11/06/19 14:53:54 Attempting to use cached MatchList: Failed (MatchList length: 0, Autocluster: 6, Submitter Name: aei.dev.admin.multi.carsten@xxxxxxxxxxx, Schedd Address: <10.20.30.18:9618?addrs=10.20.30.18-9618&noUDP&sock=756250_2d21_3>)
11/06/19 14:53:54 Send END_NEGOTIATE to remote schedd
11/06/19 14:53:54 Submitter aei.dev.admin.multi.carsten@xxxxxxxxxxx got all it wants; removing it.

However, later on, the cycle for 153.2 also looks fine initially:

11/06/19 14:59:54 Socket to aei.dev.admin.multi.carsten@xxxxxxxxxxx (<10.20.30.18:9618?addrs=10.20.30.18-9618&noUDP&sock=756250_2d21_3>) already in cache, reusing
11/06/19 14:59:54 Started NEGOTIATE with remote schedd; protocol version 1.
11/06/19 14:59:54 Request 00153.00002: autocluster 6 (request count 1 of 198)
11/06/19 14:59:54 matchmakingAlgorithm: limit 135.999583 used 0.000000 pieLeft 135.999583

The evaluation of PREEMPTION_REQUIREMENTS also looks good, e.g.:

11/06/19 14:59:54 Classad debug: 1573051884 --> 1573051884
11/06/19 14:59:54 Classad debug: [0.01597ms] JobStart --> 1573051884
11/06/19 14:59:54 Classad debug: time() --> 1573052394
11/06/19 14:59:54 Classad debug: 1573051884 --> 1573051884
11/06/19 14:59:54 Classad debug: [0.00906ms] JobStart --> 1573051884
11/06/19 14:59:54 Classad debug: [0.05794ms] ifThenElse(JobStart isnt undefined,(time() - JobStart),0) --> 510
11/06/19 14:59:54 Classad debug: [0.00119ms] RemoteUserPrio --> 995387
11/06/19 14:59:54 Classad debug: [0.00095ms] SubmittorPrio --> 2.59673
11/06/19 14:59:54 Classad debug: [0.09704ms] ifThenElse(JobStart isnt undefined,(time() - JobStart),0) > (2 * 60) && (RemoteUserPrio > SubmittorPrio * 1.200000000000000E+00) --> TRUE

However, it sadly ends with:

11/06/19 14:59:54 Send END_NEGOTIATE to remote schedd
11/06/19 14:59:54 Submitter aei.dev.admin.multi.carsten@xxxxxxxxxxx got all it wants; removing it.

Thus, two questions:

(1) Hopefully simple to answer: is there a way to speed up preemption? Right now, only a single preemption occurs in each negotiation cycle.

(2) Any idea why the jobs on the second node are never preempted?
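For context on (1), these are the negotiator-side knobs we are aware of that influence how quickly preemptions can happen; the values below are just illustrative examples, not our production settings:

```
# Start a new negotiation cycle more often (default: 60 seconds)
NEGOTIATOR_INTERVAL = 30

# Minimum delay between the end of one cycle and the start of the
# next (default: 20 seconds)
NEGOTIATOR_CYCLE_DELAY = 10

# Must remain enabled for priority-based preemption to be
# considered at all (default: True)
NEGOTIATOR_CONSIDER_PREEMPTION = True

# Lets the negotiator preempt several dynamic slots under one
# partitionable slot in a single match, as seen in the
# "Matched pslot ... preempting 12 dynamic slots" log line
# (default: False)
ALLOW_PSLOT_PREEMPTION = True
```

Even with these set, it is unclear to us why only one preempting match is made per cycle, hence the question.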
Cheers,
Carsten

[1] The differences are mostly due to the fact that a3001 contains a few GPUs:

$ diff /tmp/a30*config
1c1
< # Configuration from machine: a3001.atlas.local
---
> # Configuration from machine: a3010.atlas.local
85c85
< CENTRAL_MANAGER = condorhub
---
> CENTRAL_MANAGER = condorhub.atlas.local
277,280c277,280
< DETECTED_CORES = 32
< DETECTED_CPUS = 32
< DETECTED_MEMORY = 192081
< DETECTED_PHYSICAL_CPUS = 16
---
> DETECTED_CORES = 128
> DETECTED_CPUS = 128
> DETECTED_MEMORY = 515889
> DETECTED_PHYSICAL_CPUS = 64
315,316d314
< ENVIRONMENT_FOR_AssignedGPUs = CUDA_VISIBLE_DEVICES
< ENVIRONMENT_VALUE_FOR_UnAssignedGPUs = none
343c341
< FULL_HOSTNAME = a3001.atlas.local
---
> FULL_HOSTNAME = a3010.atlas.local
374d371
< GPU_DISCOVERY_EXTRA = -extra
442c439
< HOSTNAME = a3001
---
> HOSTNAME = a3010
458c455
< IP_ADDRESS = 10.10.30.1
---
> IP_ADDRESS = 10.10.30.10
460c457
< IPV4_ADDRESS = 10.10.30.1
---
> IPV4_ADDRESS = 10.10.30.10
538d533
< MACHINE_RESOURCE_INVENTORY_GPUs = /usr/share/condor/condor_gpu_discovery_wrapper $(LIBEXEC)/condor_gpu_discovery -properties $(GPU_DISCOVERY_EXTRA)
677c672
< NETWORK_INTERFACE = 10.10.30.1
---
> NETWORK_INTERFACE = 10.10.30.10
714c709
< PID = 5054
---
> PID = 106453
724c719
< PPID = 5046
---
> PPID = 106445
865c860
< SHUTDOWN_GRACEFUL_TIMEOUT = 16000
---
> SHUTDOWN_GRACEFUL_TIMEOUT = 600
870c865
< SLOT_TYPE_1 = ram=171857, swap=0%, cpus=100%
---
> SLOT_TYPE_1 = ram=438065, swap=0%, cpus=100%
900,904c895
< STARTD_CRON_GPUs_MONITOR_EXECUTABLE = $(LIBEXEC)/condor_gpu_utilization
< STARTD_CRON_GPUs_MONITOR_METRICS = SUM:GPUs, PEAK:GPUsMemory
< STARTD_CRON_GPUs_MONITOR_MODE = WaitForExit
< STARTD_CRON_GPUs_MONITOR_PERIOD = 1
< STARTD_CRON_JOBLIST = FACTER SIMD GPUs_MONITOR
---
> STARTD_CRON_JOBLIST = FACTER SIMD
1019c1010
< USER_JOB_WRAPPER = $(LOCAL_CONDOR_SCRIPTS)/user-job-wrapper.sh
---
> USER_JOB_WRAPPER =
1023c1014
< UTSNAME_NODENAME = a3001
---
> UTSNAME_NODENAME = a3010
1077,1078d1067
< # /etc/condor/config.d/20_gpu
< # /etc/condor/config.d/20_preemption