Hi Romain,
I have done this in the past.
I remember that sometimes jobs didn't start, and I never found the reason for that.
Please give it a try and let me know if it works for you.
StartMemory = 1024
PlusMemory = 4096
request_memory = ifthenelse(((LastHoldReasonCode != 34) || (MemoryProvisioned != $(PlusMemory)) || IsUndefined(MemoryProvisioned)),$(StartMemory),$(PlusMemory))
periodic_release = (JobStatus == 5) && (HoldReasonCode == 34) && (MemoryProvisioned == $(StartMemory))
Thanks
David
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of romain.bouquet04@xxxxxxxxx <romain.bouquet04@xxxxxxxxx>
Sent: 03 March 2022 13:20
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] HTCondor: Increase requested RAM memory if a job is retried
Hi again Gianmauro,
Thanks. I don't think this would be a "solution" for my jobs, which run for a long time, as I don't want a cron process running in parallel.
But thanks a lot anyway for your answers! I really appreciate you proposing that solution.
Best,
Romain
I have a cron job that runs the script every 5 minutes.
It works fine for us.
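The memory step that the script quoted below applies on each pass can be isolated as a small function. This is only a sketch: the function name next_request_memory is made up for illustration, while the multiplier and cap are the values used in the script. It makes no HTCondor calls, so it can be tried on any machine.

```shell
#!/bin/bash
# Hypothetical helper (not part of HTCondor): compute the next
# RequestMemory value in MB from the memory a held job was provisioned.
CAP=524288      # 512GB, same cap as in the script
MULTIPLIER=3    # same multiplier as in the script

next_request_memory() {
    local provisioned=$1
    local next=$(( provisioned * MULTIPLIER ))
    # Never ask for more than the cap
    if [ "$next" -gt "$CAP" ]; then
        next=$CAP
    fi
    echo "$next"
}

next_request_memory 1024     # below the cap: multiplied by 3
next_request_memory 200000   # above the cap once multiplied: clamped to CAP
```

With these values, a job provisioned 1024 MB would be released with 3072 MB, and the request stops growing once it reaches 524288 MB.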
Gianmauro
On 3/3/22 11:01, romain.bouquet04@xxxxxxxxx wrote:
> Hi Gianmauro,
>
> Thanks for your answer, but from what I understand you launch this script
> manually, right?
> What I would like is to find a way for condor to increase the memory
> itself, as my jobs are retried automatically.
>
> Best,
> Romain
>
> On Wed, 2 Mar 2022 at 20:12, <gmauro@xxxxxxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi Romain,
>
> I use this script for exactly the purpose you described.
> It relaunches each held job with 3 times its provisioned memory, up to
> a cap.
> Every relaunch is recorded in a log file.
>
> $ cat /usr/bin/htcondor-release-held-jobs
>
> #!/bin/bash
> # Release jobs held with HoldReasonCode 34, asking for more memory each time.
> CAP=524288 # 512GB
> MULTIPLIER=3
> LOG=/data/dnb01/maintenance/condor_rerun_held_jobs.log
>
> if [ ! -f "$LOG" ]; then
>     touch "$LOG"
>     echo "Created $LOG"
> fi
>
> # Cluster IDs of held jobs whose hold reason code is 34
> for j in $(condor_q -hold -autoformat ClusterId HoldReasonCode | awk '(($2-34) == 0){print $1}' | paste -s -d ' ')
> do
>     JOB_DESCRIPTION=$(condor_q "$j" -autoformat JobDescription)
>     MEMORY_PROVISIONED=$(condor_q "$j" -autoformat MemoryProvisioned)
>
>     # Multiply the provisioned memory, but never go above the cap
>     if [ $((MEMORY_PROVISIONED * MULTIPLIER)) -gt $CAP ]; then
>         REQUEST_MEMORY=$CAP
>     else
>         REQUEST_MEMORY=$((MEMORY_PROVISIONED * MULTIPLIER))
>     fi
>     # Short host name of the machine the job last ran on
>     REMOTE_HOST=$(condor_q "$j" -autoformat LastRemoteHost | cut -f2 -d@ | cut -f1 -d.)
>
>     DATE_WITH_TIME=$(date "+%d/%m/%Y-%H:%M:%S")
>     /bin/cat <<EOM >>"$LOG"
> $DATE_WITH_TIME, rerunning held job, id $j, description $JOB_DESCRIPTION, memory_provisioned $MEMORY_PROVISIONED, request_memory $REQUEST_MEMORY, $REMOTE_HOST
> EOM
>
>     condor_qedit "$j" RequestMemory=$REQUEST_MEMORY
>     condor_release "$j"
> done
>
> Hope it helps,
> Gianmauro
>
>
> On 3/2/22 19:48, romain.bouquet04@xxxxxxxxx wrote:
> > Dear all,
> >
> > I have jobs that I set to be retried automatically by condor in case of
> > failure.
> > I was wondering if there is a way for condor to automatically increase
> > the requested RAM for a job in case it failed and it is retried.
> >
> > I was looking at NumJobStarts, which counts the number of times a job
> > is started:
> > https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html
> >
> > And I was trying to add something like the following in the submit file
> > (but it does not work), based on
> > https://htcondor.readthedocs.io/en/latest/users-manual/submitting-a-job.html#using-conditionals-in-the-submit-description-file
> >
> > if NumJobStarts == 0
> > request_memory = 2GB
> > else
> > request_memory = 8GB
> > endif
> >
> > I could use requirements with a syntax like
> > requirements = (NumJobStarts == 0 && TARGET.Memory >= 2GB) ||
> > (NumJobStarts >= 1 && TARGET.Memory >= 8GB)
> > But apparently it is not recommended to request memory that way.
> >
> > Would anyone have a better solution?
> >
> > Many thanks in advance
> > Best,
> > Romain Bouquet
> >
> > _______________________________________________
> > HTCondor-users mailing list
> > To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> > subject: Unsubscribe
> > You can also unsubscribe by visiting
> > https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
> >
> > The archives can be found at:
> > https://lists.cs.wisc.edu/archive/htcondor-users/
>
> --
> Gianmauro Cuccuru
>
> UseGalaxy.eu
> Bioinformatics Group
> Department of Computer Science
> Albert-Ludwigs-University Freiburg
> Georges-Köhler-Allee 106
> 79110 Freiburg, Germany
--
Gianmauro Cuccuru
UseGalaxy.eu
Bioinformatics Group
Department of Computer Science
Albert-Ludwigs-University Freiburg
Georges-Köhler-Allee 106
79110 Freiburg, Germany