Hi Mihai,

there are two kinds of periodic remove expressions: one for the system, affecting all jobs, and one for *each* job. The error message suggests that it is the per-job remove expression that is triggering. I would recommend checking whether the VOs' jobs have a `PeriodicRemove` attribute (as named in the error message) or similar, then trying to find out where it is set.

Cheers,
Max

> On 28. Mar 2023, at 14:17, Mihai Ciubancan <ciubancan@xxxxxxxx> wrote:
>
> In the last weeks I have seen a lot of Alice VO (and also LHCb) jobs failing with the following message:
>
> The job attribute PeriodicRemove expression '(JobStatus == 1 && NumJobStarts > 0) || ((ResidentSetSize =!= undefined ? ResidentSetSize : 0) > JobMemoryLimit)' evaluated to TRUE
>
> I have set the remove reason as I saw in an older email on the list (a few months ago):
>
> # SYSTEM_PERIODIC_REMOVE with reasons
> ########################
>
> # remove jobs running longer than 7 days
> RemoveReadyJobs = (( JobStatus == 2 ) && ( ( CurrentTime - EnteredCurrentStatus ) > 7 * 24 * 3600 ))
>
> # remove jobs on hold for longer than 7 days
> RemoveHeldJobs = ( (JobStatus==5 && (CurrentTime - EnteredCurrentStatus) > 7 * 24 * 3600) )
>
> # remove jobs with too many job starts or shadow starts
> RemoveMultipleRunJobs = ( NumJobStarts >= 10 )
>
> # remove jobs idle for too long
> MaxJobIdleTime = 7 * 24 * 3600
> RemoveIdleJobs = (( JobStatus == 1 ) && ( ( CurrentTime - EnteredCurrentStatus ) > MaxJobIdleTime ))
>
> # do it
> SYSTEM_PERIODIC_REMOVE = $(RemoveHeldJobs) || \
>     $(RemoveMultipleRunJobs) || \
>     $(RemoveIdleJobs) || \
>     $(RemoveReadyJobs)
>
> # set reason for remove
> SYSTEM_PERIODIC_REMOVE_REASON = strcat("Job removed by SYSTEM_PERIODIC_REMOVE due to ", \
>     ifThenElse($(RemoveReadyJobs), "runtime longer than reserved", \
>     ifThenElse($(RemoveHeldJobs), "being in hold state for 7 days", \
>     ifThenElse($(RemoveMultipleRunJobs), "more than 10 failed jobstarts", \
>     "being in idle state for 10 days"))),".")
>
> I have reconfigured the master and schedd daemons, but the problem persists.
>
> Do you have any idea how to fix this?
>
> Thank you,
> Mihai
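
To make it easier to see which part of the per-job expression from the error message is firing, here is a small sketch in plain Python that mirrors its two clauses. It does not use the HTCondor bindings. The function name `triggered_clauses` and the use of a dict as a stand-in for a job ad are assumptions made for illustration, and an undefined attribute is modelled simply as a missing key, which only approximates ClassAd semantics.

```python
from typing import Any, Dict, List

IDLE = 1  # HTCondor JobStatus code for an idle job


def triggered_clauses(ad: Dict[str, Any]) -> List[str]:
    """Return which clauses of the quoted PeriodicRemove expression hold.

    `ad` is a plain dict standing in for a job ad (hypothetical helper,
    not an HTCondor API). Missing keys play the role of undefined.
    """
    reasons = []

    # Clause 1: JobStatus == 1 && NumJobStarts > 0
    # The job is idle again although it has already started at least once.
    if ad.get("JobStatus") == IDLE and ad.get("NumJobStarts", 0) > 0:
        reasons.append("restarted")

    # Clause 2: (ResidentSetSize =!= undefined ? ResidentSetSize : 0) > JobMemoryLimit
    # An undefined ResidentSetSize counts as 0; without a JobMemoryLimit
    # the comparison cannot be true, so the clause is skipped.
    rss = ad.get("ResidentSetSize", 0)
    limit = ad.get("JobMemoryLimit")
    if limit is not None and rss > limit:
        reasons.append("memory")

    return reasons


# A job that ran once and went back to idle, well below its memory limit.
requeued = {"JobStatus": 1, "NumJobStarts": 1,
            "ResidentSetSize": 1000, "JobMemoryLimit": 2000000}
print(triggered_clauses(requeued))  # ['restarted']
```

Read this way, the expression removes a job either as soon as it returns to idle after a first start, or when its memory use exceeds `JobMemoryLimit`. Neither condition appears in the SYSTEM_PERIODIC_REMOVE configuration quoted above.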