
[HTCondor-users] MAX_CONCURRENT_DOWNLOADS not working?



Hello,

in an HTCondor pool running version 23.0.21

,----
| $ condor_q --version
| $CondorVersion: 23.0.21 2025-03-19 $
| $CondorPlatform: X86_64-Ubuntu_22.04 $
`----

I was trying to limit the maximum number of concurrent output file
downloads by setting the configuration variable:

,----
| MAX_CONCURRENT_DOWNLOADS (https://htcondor.readthedocs.io/en/23.0/admin-manual/configuration-macros.html#index-139)
| 
|     This specifies the maximum number of simultaneous transfers of
|     output files from execute machines to the access point. The limit
|     applies to all jobs submitted from the same condor_schedd. The
|     default is 100. A setting of 0 means unlimited transfers. This limit
|     currently does not apply to grid universe jobs, and it also does not
|     apply to streaming output files. When the limit is reached,
|     additional transfers will queue up and wait before proceeding. 
`----
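
For reference, a sketch of how the limit was set (the config file path
is just illustrative; any configuration file read by the access point
will do), followed by a condor_reconfig so the running condor_schedd
picks it up:

,----
| # /etc/condor/config.d/99-transfer-limits.config  (illustrative path)
| MAX_CONCURRENT_DOWNLOADS = 3
`----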

Despite setting it to 3, I can see many more jobs transferring output
files at once (status ">" in the example below: 10 jobs, while a few
others show "q>", i.e. queued for transfer):

,----
| $ condor_config_val -dump | grep -i download
| MAX_CONCURRENT_DOWNLOADS = 3
| 
| $ condor_q -nobatch
| 
| 
| -- Schedd: xxxxx.es : <161.72.216.45:9618?... @ 08/18/25 13:04:42
|  ID      OWNER            SUBMITTED     RUN_TIME ST PRI SIZE CMD
| 1427.0   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14270.dat
| 1427.1   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14271.dat
| 1427.2   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14272.dat
| 1427.3   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14273.dat
| 1427.4   xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test14274.dat
| 1427.6   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14276.dat
| 1427.7   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14277.dat
| 1427.8   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14278.dat
| 1427.9   xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test14279.dat
| 1427.10  xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test142710.dat
| 1427.11  xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test142711.dat
| 1427.12  xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test142712.dat
| 1427.13  xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test142713.dat
| 1427.14  xxx          8/18 12:58   0+00:06:12  > 0    0.0 fallocate -l 10G test142714.dat
| 1427.15  xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test142715.dat
| 1427.16  xxx          8/18 12:58   0+00:06:12 q> 0    0.0 fallocate -l 10G test142716.dat
`----
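
In case it helps to reproduce this, a minimal sketch of the kind of
submit file behind these jobs (names and details are illustrative, not
the exact file used); with should_transfer_files = YES and no explicit
transfer_output_files, the 10G file created in the scratch directory is
transferred back on exit:

,----
| # test.sub (illustrative)
| executable              = /usr/bin/fallocate
| arguments               = -l 10G test$(ClusterId)$(ProcId).dat
| should_transfer_files   = YES
| when_to_transfer_output = ON_EXIT
| queue 17
`----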

Any idea why this is happening?
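
(A cross-check that might be relevant, assuming I'm reading the
attribute names right: the schedd ClassAd exposes transfer queue
counters, so the effective limit and the number of active downloads can
be queried directly:

,----
| $ condor_status -schedd -af Name TransferQueueMaxDownloading \
|                             TransferQueueNumDownloading \
|                             TransferQueueNumWaitingToDownload
`----

I would expect TransferQueueNumDownloading to stay at or below 3.)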


(Also, once the transfer has started, I would expect that removing the
jobs via "condor_rm" would stop the transfer and terminate the job, but
this doesn't happen: the output file transfer continues regardless
until it finishes, and only then is the file deleted.)
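
For completeness, a sketch of the removal step (cluster id from the run
above); I used the plain form, not "condor_rm -forcex":

,----
| $ condor_rm 1427    # jobs move to the 'X' state, but the download continues
`----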

Cheers,
-- 
Ángel de Vicente
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)