I can't think of anything that would normally cause a periodic hold expression to stop working.
Here are a couple of ideas for debugging the problem:
When there's a job in the queue that you think should be affected by the periodic hold expression, try running this command:
condor_q -all -nobatch -constraint "$(condor_config_val SYSTEM_PERIODIC_HOLD)"
If that doesn't display the problematic job(s), try altering the expression (removing or adjusting terms) to see what's needed to make the jobs appear. That can reveal differences between what you're checking for and what's in the job ads.
To ensure the schedd is evaluating the periodic job expressions on a timely basis, you can try amending the expression to always hold special test jobs. For example, you can add this to the end of your config files:
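As a rough sketch of that term-by-term approach, the script below runs each piece of the expression as its own constraint and prints the attributes the expression depends on. The terms are taken from the SYSTEM_PERIODIC_HOLD quoted below, except for `JobStatus == 2` (running jobs), which is an added assumption to narrow the output. The function names hold_terms and check_terms are made up for illustration. An attribute that is missing from a job ad prints as "undefined", and a missing attribute (HiMemUser, for example) may make the whole expression evaluate to UNDEFINED instead of true.

```shell
#!/bin/sh
# Sketch: test each piece of the hold expression separately against the queue.
# Replace these terms with the pieces of your own expression.

# Print one sub-expression per line.
hold_terms() {
    cat <<'EOF'
JobStatus == 2
(time() - JobCurrentStartDate) > 72*3600
HiMemUser =?= true
RequestMemory > 40*1024
EOF
}

# For each term, list the matching jobs (job ID first) together with the
# attributes the hold expression refers to.
check_terms() {
    hold_terms | while IFS= read -r term; do
        echo "== $term"
        condor_q -all -constraint "$term" \
            -af:j JobStatus JobCurrentStartDate HiMemUser RequestMemory
    done
}

# Only query the queue when condor_q is actually installed.
if command -v condor_q >/dev/null 2>&1; then
    check_terms
fi
```

If a term that you expect to match lists no jobs, or a column shows "undefined", that term is the place to look.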
SYSTEM_PERIODIC_HOLD = ($(SYSTEM_PERIODIC_HOLD)) || (AdminHoldJob =?= true)
Then, submit a test job with the following line in the submit file:
+AdminHoldJob=True
Then, wait and see if the job gets held.
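One way to watch for that, sketched here on the assumption that the schedd has picked up the config change (for example after a condor_reconfig), is to query for the test job and print its status and hold reason. The function name show_test_jobs is made up for illustration. JobStatus 5 means held.

```shell
#!/bin/sh
# Sketch: report the state of any job carrying the AdminHoldJob test attribute.
# Output columns: job ID, JobStatus (5 = held), HoldReason.
show_test_jobs() {
    condor_q -all -constraint 'AdminHoldJob =?= true' \
        -af:j JobStatus HoldReason
}

# Only query the queue when condor_q is actually installed.
if command -v condor_q >/dev/null 2>&1; then
    show_test_jobs
fi
```

If the test job stays at a JobStatus other than 5 and HoldReason stays "undefined", the schedd is probably not evaluating the amended expression at all.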
- Jaime
> On Aug 17, 2021, at 5:09 AM, David Cohen <cdavid@xxxxxxxxxxxxxxxxxxxxxx> wrote:
>
> Hi,
> A SYSTEM_PERIODIC_HOLD, configured on the schedd, that used to work is ignored lately:
>
> SYSTEM_PERIODIC_HOLD = (Time() - JobCurrentStartDate) > IfthenElse(HiMemUser && (RequestMemory > 40*1024), 120*3600 , 72*3600)
> SYSTEM_PERIODIC_HOLD_Reason = "Job Is Running over time"
> SYSTEM_PERIODIC_REMOVE = JobStatus == 5 && (Time() - EnteredCurrentStatus) > 600
>
> I could find no reference to that in the system's log.
> How can I debug that?
>
> Best,
> David