Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried
- Date: Wed, 02 Mar 2022 20:11:37 +0100
- From: gmauro@xxxxxxxxxxxxxxxxxxxxxxxxxx
- Subject: Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried
Hi Romain,
I use this script for exactly the purpose you described.
It releases jobs that were held for exceeding their memory limit, each with
3 times the memory it was provisioned, up to a cap.
Every relaunch is recorded in a log file.
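The capping rule (multiply the provisioned memory by 3, never exceed the cap) can be checked on its own before running anything against the schedd. This is a minimal sketch, not part of the original script: the function name next_request_memory is hypothetical, and all values are assumed to be in MiB, the unit HTCondor uses for RequestMemory.

```bash
#!/bin/bash
# Hypothetical helper (not in the original script): print the next
# RequestMemory value in MiB as min(provisioned * multiplier, cap).
next_request_memory() {
    local provisioned=$1 multiplier=$2 cap=$3
    local scaled=$(( provisioned * multiplier ))
    if (( scaled > cap )); then
        echo "$cap"
    else
        echo "$scaled"
    fi
}

# Example values: multiplier 3, cap 524288 MiB (512 GiB)
next_request_memory 2048 3 524288    # prints 6144
next_request_memory 200000 3 524288  # prints 524288 (600000 exceeds the cap)
```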
$ cat /usr/bin/htcondor-release-held-jobs
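#!/bin/bash
# Release jobs held for exceeding their memory limit (HoldReasonCode 34)
# after raising RequestMemory to MULTIPLIER x MemoryProvisioned, capped at CAP.
CAP=524288 # 512 GiB, in MiB (the unit of RequestMemory)
MULTIPLIER=3
LOG=/data/dnb01/maintenance/condor_rerun_held_jobs.log
if [ ! -f "$LOG" ]; then
    touch "$LOG"
    echo "Created $LOG"
fi
# Select jobs by cluster.proc so that every job of a multi-job cluster is handled
for j in $(condor_q -hold -autoformat ClusterId ProcId HoldReasonCode | awk '$3 == 34 {print $1 "." $2}')
do
    JOB_DESCRIPTION=$(condor_q "$j" -autoformat JobDescription)
    MEMORY_PROVISIONED=$(condor_q "$j" -autoformat MemoryProvisioned)
    # Skip jobs without a numeric MemoryProvisioned (e.g. "undefined")
    case "$MEMORY_PROVISIONED" in
        ''|*[!0-9]*) echo "Skipping $j: no MemoryProvisioned" >&2; continue ;;
    esac
    # Scale the provisioned memory, but never request more than CAP
    REQUEST_MEMORY=$((MEMORY_PROVISIONED * MULTIPLIER))
    if [ "$REQUEST_MEMORY" -gt "$CAP" ]; then
        REQUEST_MEMORY=$CAP
    fi
    # Short name of the last execute host (slot1@host.domain -> host)
    REMOTE_HOST=$(condor_q "$j" -autoformat LastRemoteHost | cut -f2 -d@ | cut -f1 -d.)
    DATE_WITH_TIME=$(date "+%d/%m/%Y-%H:%M:%S")
    cat <<EOM >>"$LOG"
$DATE_WITH_TIME, rerunning held job, id $j, description $JOB_DESCRIPTION, memory_provisioned $MEMORY_PROVISIONED, request_memory $REQUEST_MEMORY, $REMOTE_HOST
EOM
    condor_qedit "$j" RequestMemory "$REQUEST_MEMORY"
    condor_release "$j"
done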
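The job-selection step (held jobs whose HoldReasonCode is 34, which the HTCondor manual lists as memory usage exceeding a memory limit) can also be dry-run offline. A minimal sketch, assuming `condor_q -hold -autoformat ClusterId ProcId HoldReasonCode` prints one space-separated line per job; the function name memory_held_jobs and the sample job ids are hypothetical.

```bash
#!/bin/bash
# Hypothetical helper: read "ClusterId ProcId HoldReasonCode" lines on stdin
# and print the cluster.proc id of every job held with reason code 34.
memory_held_jobs() {
    awk '$3 == 34 { print $1 "." $2 }'
}

# Offline dry run with made-up sample lines instead of live condor_q output
printf '%s\n' "101 0 34" "101 1 34" "102 0 3" | memory_held_jobs
# prints:
# 101.0
# 101.1
```

In a live run the same filter would sit between condor_q and the loop, e.g. `for j in $(condor_q -hold -autoformat ClusterId ProcId HoldReasonCode | memory_held_jobs)`. Emitting cluster.proc rather than the bare ClusterId keeps the jobs of a multi-job cluster from being treated as a single job.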
Hope it helps,
Gianmauro
On 3/2/22 19:48, romain.bouquet04@xxxxxxxxx wrote:
Dear all,
I have jobs that I set to be retried automatically by condor in case of
failure.
I was wondering if there is a way for condor to automatically increase
the requested RAM for a job when it fails and is retried.
I was looking at the NumJobStarts attribute, which counts the number of
times a job has been started:
https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html
And I was trying to add something like the following to the submit file
(but it does not work):
(based on
https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#using-conditionals-in-the-submit-description-file)
if NumJobStarts == 0
    request_memory = 2GB
else
    request_memory = 8GB
endif
I could use requirements with a syntax like
requirements = (NumJobStarts == 0 && TARGET.Memory >= 2048) || (NumJobStarts >= 1 && TARGET.Memory >= 8192)
but apparently it is not recommended to request memory that way.
Would anyone have a better solution?
Many thanks in advance
Best,
Romain Bouquet
--
Gianmauro Cuccuru
UseGalaxy.eu
Bioinformatics Group
Department of Computer Science
Albert-Ludwigs-University Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany