Hello,
in an HTCondor pool running version 23.0.21
,----
| $ condor_q --version
| $CondorVersion: 23.0.21 2025-03-19 $
| $CondorPlatform: X86_64-Ubuntu_22.04 $
`----
I was trying to limit the maximum number of concurrent output file
downloads by using the configuration variable:
,----
| MAX_CONCURRENT_DOWNLOADS (https://htcondor.readthedocs.io/en/23.0/admin-manual/configuration-macros.html#index-139)
|
| This specifies the maximum number of simultaneous transfers of
| output files from execute machines to the access point. The limit
| applies to all jobs submitted from the same condor_schedd. The
| default is 100. A setting of 0 means unlimited transfers. This limit
| currently does not apply to grid universe jobs, and it also does not
| apply to streaming output files. When the limit is reached,
| additional transfers will queue up and wait before proceeding.
`----
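I set it in a config snippet on the access point and then ran
condor_reconfig there so the running condor_schedd would pick it up
(the file name below is just where I happened to put it):
,----
| # /etc/condor/config.d/99-transfer-limits.conf (file name is mine;
| # any config.d snippet on the access point should work)
| MAX_CONCURRENT_DOWNLOADS = 3
`----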
Despite setting it to 3, I can see many more jobs than that
transferring output files at the same time (10 in the example below):
,----
| $ condor_config_val -dump | grep -i download
| MAX_CONCURRENT_DOWNLOADS = 3
|
| $ condor_q -nobatch
|
| -- Schedd: xxxxx.es : <161.72.216.45:9618?... @ 08/18/25 13:04:42
| ID OWNER SUBMITTED RUN_TIME ST PRI SIZE CMD
| 1427.0 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14270.dat
| 1427.1 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14271.dat
| 1427.2 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14272.dat
| 1427.3 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14273.dat
| 1427.4 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test14274.dat
| 1427.6 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14276.dat
| 1427.7 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14277.dat
| 1427.8 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14278.dat
| 1427.9 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test14279.dat
| 1427.10 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test142710.dat
| 1427.11 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test142711.dat
| 1427.12 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test142712.dat
| 1427.13 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test142713.dat
| 1427.14 xxx 8/18 12:58 0+00:06:12 > 0 0.0 fallocate -l 10G test142714.dat
| 1427.15 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test142715.dat
| 1427.16 xxx 8/18 12:58 0+00:06:12 q> 0 0.0 fallocate -l 10G test142716.dat
`----
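To double-check that this is not just a display artifact of -nobatch, I
also counted the jobs whose ads claim an active (not queued) output
transfer; as far as I understand, TransferringOutput and TransferQueued
are the job-ad attributes behind the ">" and "q>" markers, but please
treat the exact expression as a sketch:
,----
| $ condor_q -constraint 'TransferringOutput =?= true && TransferQueued =!= true' \
|       -af ClusterId ProcId | wc -l
`----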
Any idea why this is happening?
(Also, once a transfer has started, I would expect that removing the
job via "condor_rm" would abort the transfer and terminate the job,
but this doesn't happen: the output file transfer continues regardless
until it finishes, and only then is the file deleted.)
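In case it helps with debugging, this is how I have been watching the
schedd's own view of the transfer queue (the attribute names come from
my reading of the scheduler ClassAd docs, so treat them as an
assumption on my part):
,----
| $ condor_status -schedd -af:ln Name TransferQueueMaxDownloading \
|       TransferQueueNumDownloading TransferQueueNumWaitingToDownload
`----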
Cheers,