
Re: [HTCondor-users] Problem in running parallel program



Hi

On 22.09.21 14:35, Rajagopala Reddy Seelam wrote:
Response to this email: No, I think "dagman" may not help me here. This has to do with "request_cpus=1". HTCondor accepts up to 20 jobs and immediately runs all 20 calculations. As a result, the memory is exhausted and the machine hangs. I am looking into the "hold" possibility, to manually tell the scheduler to hold a job and release it after the earlier job has completed.


I think a partitionable slot will help here as well, and you can also simply use

request_memory = 6G
request_cpus = 5

and if the machine has 20 cores and 16 GByte of RAM, it would only ever run two of these at the same time: after two jobs have started, condor only has 4 GByte and 10 CPU cores left, which is not enough memory for a third job requesting 6 GByte.
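
To make the arithmetic explicit, here is a rough sketch in Python. It is not anything HTCondor itself provides; the function names are made up for illustration, and it only looks at CPU cores and memory, ignoring disk and any memory the machine reserves for itself.

```python
def max_concurrent_jobs(total_cpus, total_mem_gb, request_cpus, request_mem_gb):
    """Number of identical jobs that fit on one machine at the same time.

    Hypothetical helper: only CPUs and memory are considered.
    """
    by_cpu = total_cpus // request_cpus        # limit imposed by cores
    by_mem = total_mem_gb // request_mem_gb    # limit imposed by RAM
    return min(by_cpu, by_mem)


def leftover(total_cpus, total_mem_gb, request_cpus, request_mem_gb):
    """CPUs and memory (GB) still free once the machine is full of these jobs."""
    n = max_concurrent_jobs(total_cpus, total_mem_gb, request_cpus, request_mem_gb)
    return total_cpus - n * request_cpus, total_mem_gb - n * request_mem_gb


if __name__ == "__main__":
    # Numbers from the example above: 20 cores, 16 GB RAM, jobs asking for 5 cores / 6 GB.
    print(max_concurrent_jobs(20, 16, 5, 6))
    print(leftover(20, 16, 5, 6))
```

With these numbers memory is the limiting resource (16 // 6 = 2), while the cores alone would allow four jobs. So request_memory is the knob that actually throttles things here.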

There are many more knobs to try to achieve this, but these would be the ones I would
try first.

Cheers

Carsten
