Hi all,

does anybody have experience using the cgroup blkio controller to limit a job's I/O to the local disk?

The background is that a user recently sent a task whose jobs were primarily doing merging, i.e., heavily churning the local disk with reads and writes. When nodes got 'too many' jobs of this type, they became more or less stuck in I/O wait. So I have been wondering whether Condor's cgroup blkio controller could be tuned to limit each job's I/O, so that nodes do not waste too many cycles in I/O wait and other jobs are protected.

As far as I can see, none of the Condor cgroups have throttling limits set, and each subgroup has the default weight. Would it be feasible, as a first step, to set upper limits on the parent group .../condor.service/blkio.throttle.*, say by taking the I/O rates from a small benchmark (bps and/or IOPS?) and adding some safety margin? Because all subgroups have the same weight, this might not be the 'fairest' solution. (Would scaling the bps/IOPS limits by the number of cores actually be a reasonable assumption, if cores are the basic commodity?)

Maybe somebody has suggestions or experience in this direction?

Cheers and thanks,
Thomas
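To make the proposal concrete, the arithmetic (benchmarked rates minus a safety margin for the parent group, optionally scaled to a per-core share for a single job) could look like the minimal Python sketch below. It assumes the cgroup v1 blkio interface, where each of the blkio.throttle.{read,write}_{bps,iops}_device files accepts lines of the form `MAJOR:MINOR VALUE`. The benchmark figures, the 20% headroom default, and the names DiskBenchmark, throttle_limits and apply_limits are hypothetical; the cgroup directory is passed in as a parameter rather than filled in.

```python
from __future__ import annotations

import os
from dataclasses import dataclass


@dataclass
class DiskBenchmark:
    """Raw rates measured for one block device by a small benchmark (hypothetical input)."""
    major: int
    minor: int
    read_bps: int
    write_bps: int
    read_iops: int
    write_iops: int


def device_numbers(dev_path: str) -> tuple[int, int]:
    """Return (major, minor) of a block device node such as /dev/sda."""
    rdev = os.stat(dev_path).st_rdev
    return os.major(rdev), os.minor(rdev)


def throttle_limits(bench: DiskBenchmark, headroom_percent: int = 20,
                    job_cores: int = 1, total_cores: int = 1) -> dict[str, str]:
    """Map each cgroup v1 blkio.throttle.* file name to the rule line to write.

    headroom_percent: safety margin subtracted from the benchmarked rate.
    job_cores / total_cores: optional per-core share (assumes cores are the
    basic commodity); the default 1/1 gives the limit for the whole parent group.
    """
    if not 0 <= headroom_percent < 100:
        raise ValueError("headroom_percent must be in [0, 100)")
    if not 0 < job_cores <= total_cores:
        raise ValueError("need 0 < job_cores <= total_cores")
    rates = {
        "blkio.throttle.read_bps_device": bench.read_bps,
        "blkio.throttle.write_bps_device": bench.write_bps,
        "blkio.throttle.read_iops_device": bench.read_iops,
        "blkio.throttle.write_iops_device": bench.write_iops,
    }
    device = f"{bench.major}:{bench.minor}"
    limits = {}
    for name, rate in rates.items():
        # Integer arithmetic avoids float rounding. Clamp to >= 1 because
        # writing 0 clears the rule instead of throttling.
        value = rate * (100 - headroom_percent) * job_cores // (100 * total_cores)
        limits[name] = f"{device} {max(1, value)}"
    return limits


def apply_limits(cgroup_dir: str, limits: dict[str, str]) -> None:
    """Write each rule into <cgroup_dir>/<file>; needs root on a real cgroup."""
    for name, line in limits.items():
        with open(os.path.join(cgroup_dir, name), "w") as f:
            f.write(line + "\n")
```

For example, with device 8:0 benchmarked at 200 MB/s read, 100 MB/s write, 1000 read IOPS and 800 write IOPS, the default 20% headroom gives a parent-group read limit of `8:0 160000000`, while a 1-core job on an 8-core node would get `8:0 20000000`. Whether per-core scaling is fair for mixed workloads is exactly the open question above; the sketch only shows the arithmetic.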