
Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried



Hi Gianmauro,

Thanks for your answer, but if I understand correctly, you launch this script manually, right?
What I would like is a way for HTCondor to increase the memory by itself, since my jobs are retried automatically.

Best,
Romain
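Here is a minimal sketch of letting HTCondor raise the request by itself. It has not been tested against a specific HTCondor version. The submit-file if/else in the original question is evaluated once by condor_submit when it reads the file, and job attributes such as NumJobStarts are not available at that point, so it cannot react to retries. A ClassAd expression in request_memory is instead evaluated against the job ad when the job is matched. The file name job.sub, the executable my_job.sh, the 2048/8192 MiB sizes and the retry limits are all placeholders. Whether a job that exceeds its memory is put on hold (HoldReasonCode 34) or simply fails depends on the pool's policy.

```bash
#!/bin/bash
# Sketch: write a submit description file whose memory request depends on
# NumJobStarts. job.sub, my_job.sh and all numbers are placeholders.
set -euo pipefail

SUBMIT_FILE=${1:-job.sub}

# Quoted 'EOF' so the shell does not expand anything inside the submit file.
cat > "$SUBMIT_FILE" <<'EOF'
executable       = my_job.sh
# RequestMemory is in MiB. The expression is evaluated against the job ad,
# so after the first start (NumJobStarts > 0) the job asks for 8192 MiB.
request_memory   = ifThenElse(NumJobStarts =!= undefined && NumJobStarts > 0, 8192, 2048)
# Retry the job if it exits with a non-zero code, at most 3 times.
max_retries      = 3
# If the pool holds jobs that exceed their memory (HoldReasonCode 34),
# release them so they are matched again with the larger request.
periodic_release = (HoldReasonCode == 34) && (NumJobStarts < 4)
queue
EOF

echo "Wrote $SUBMIT_FILE; submit it with: condor_submit $SUBMIT_FILE"
```

Because the sizes live in one expression, a third tier (for example for NumJobStarts > 1) could be added with a nested ifThenElse instead of a second submit file.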

On Wed, 2 Mar 2022 at 20:12, <gmauro@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
Hi Romain,

I use this script for exactly the purpose you described.
It releases jobs that were held for exceeding their memory limit
(HoldReasonCode 34). Each job is rerun with 3 times the memory it was
provisioned, up to a cap.
Every rerun is recorded in a log file.
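The capping rule can be checked on its own. The following sketch implements the same arithmetic as a bash function. The name next_request_memory is made up, and the defaults mirror the script's MULTIPLIER=3 and CAP=524288 (MiB, i.e. 512 GB):

```bash
#!/bin/bash
# Hypothetical helper: next memory request in MiB = provisioned * multiplier,
# but never more than cap. Defaults match MULTIPLIER=3 and CAP=524288.
next_request_memory() {
  local provisioned=$1 multiplier=${2:-3} cap=${3:-524288}
  local wanted=$(( provisioned * multiplier ))
  if (( wanted > cap )); then
    echo "$cap"
  else
    echo "$wanted"
  fi
}

next_request_memory 2048     # prints 6144
next_request_memory 200000   # prints 524288 (capped)
```

Suppose each retry is provisioned exactly what it requested. A job starting at 2048 MiB would then go to 6144, 18432, 55296, 165888 and 497664, and after that it would stay at the cap of 524288.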

$ cat /usr/bin/htcondor-release-held-jobs

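#!/bin/bash
# Release jobs held for exceeding their memory limit (HoldReasonCode 34),
# after raising RequestMemory to MULTIPLIER x MemoryProvisioned (capped at CAP).
CAP=524288 # MiB (512GB)
MULTIPLIER=3
LOG=/data/dnb01/maintenance/condor_rerun_held_jobs.log

if [ ! -f "$LOG" ]; then
  touch "$LOG"
  echo "Created $LOG"
fi

# Job IDs (ClusterId.ProcId) of held jobs whose HoldReasonCode is 34
for j in $(condor_q -hold -autoformat ClusterId ProcId HoldReasonCode | awk '$3 == 34 {print $1 "." $2}')
do
  JOB_DESCRIPTION=$(condor_q "$j" -autoformat JobDescription)
  MEMORY_PROVISIONED=$(condor_q "$j" -autoformat MemoryProvisioned)

  # Skip jobs without a numeric MemoryProvisioned (e.g. "undefined")
  [[ $MEMORY_PROVISIONED =~ ^[0-9]+$ ]] || continue

  # New request: MULTIPLIER times what the job was given, but at most CAP
  if [ $((MEMORY_PROVISIONED * MULTIPLIER)) -gt "$CAP" ]; then
    REQUEST_MEMORY=$CAP
  else
    REQUEST_MEMORY=$((MEMORY_PROVISIONED * MULTIPLIER))
  fi
  # Short name of the machine the job last ran on (slot1@host.domain -> host)
  REMOTE_HOST=$(condor_q "$j" -autoformat LastRemoteHost | cut -f2 -d@ | cut -f1 -d.)

  DATE_WITH_TIME=$(date "+%d/%m/%Y-%H:%M:%S")
  echo "$DATE_WITH_TIME, rerunning held job, id $j, description $JOB_DESCRIPTION, memory_provisioned $MEMORY_PROVISIONED, request_memory $REQUEST_MEMORY, $REMOTE_HOST" >> "$LOG"

  # Raise the memory request, then release the job so it can run again
  condor_qedit "$j" RequestMemory="$REQUEST_MEMORY"
  condor_release "$j"
done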
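The job selection step can be tried without a live schedd. The sketch below implements the filter as a function and feeds it canned lines in the format of `condor_q -autoformat ClusterId ProcId HoldReasonCode` (space-separated columns). The function name and the sample job IDs are made up. The function prints full job IDs (ClusterId.ProcId), so each job of a multi-job cluster is handled separately:

```bash
#!/bin/bash
# Hypothetical helper: read "ClusterId ProcId HoldReasonCode" lines on stdin
# and print ClusterId.ProcId for every job whose hold reason code is 34.
memory_held_job_ids() {
  awk '$3 == 34 { print $1 "." $2 }'
}

# Canned input instead of a live condor_q call
printf '%s\n' '101 0 34' '101 1 21' '102 0 34' | memory_held_job_ids
# prints: 101.0 then 102.0
```

In the real script the same awk program would sit directly after the condor_q pipe.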

Hope it helps,
Gianmauro


On 3/2/22 19:48, romain.bouquet04@xxxxxxxxx wrote:
> Dear all,
>
> I have jobs that I set to be retried automatically by condor in case of
> failure.
> I was wondering if there is a way for condor to automatically increase
> the requested RAM for a job in case it failed and it is retried.
>
> I was looking at NumJobStarts, which counts the number of times a job
> has been started:
> https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html
>
> I tried adding something like the following to the submit file, based on
> https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#using-conditionals-in-the-submit-description-file
> but it does not work:
>
> if NumJobStarts == 0
>     request_memory = 2GB
> else
>     request_memory = 8GB
> endif
>
> I could use requirements with a syntax like
>
> requirements = (NumJobStarts == 0 && TARGET.Memory >= 2048) ||
>                (NumJobStarts >= 1 && TARGET.Memory >= 8192)
>
> but apparently requesting memory that way is not recommended.
>
> Does anyone have a better solution?
>
> Many thanks in advance
> Best,
> Romain Bouquet
>
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/

--
Gianmauro Cuccuru

UseGalaxy.eu
Bioinformatics Group
Department of Computer Science
Albert-Ludwigs-University Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany