Hi Sophie,

The catch here is that there is no SYSTEM_PERIODIC_REMOVE_REASON, only SYSTEM_PERIODIC_HOLD_REASON. I would recommend changing your periodic remove to a SYSTEM_PERIODIC_HOLD, and then setting up the periodic remove to deal only with removing held jobs older than a certain age. When a job is held, the user will get a notification of the periodic hold if the notification attribute is set to Always or Error.

Everything else looks good. Nested ifThenElse() statements should be demonstrated in the manual; they’re quite handy, aren’t they?

Michael V. Pelletier

From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx>
On Behalf Of FERRY Sophie

Hello all,

On a test cluster with 3 VMs (CentOS Linux release 7.6.1810), with one scheduler (ARC-CE), one manager and one worker.
$CondorVersion: 8.6.12 Jul 31 2018 BuildID: 446077

I have defined SYSTEM_PERIODIC_REMOVE and SYSTEM_PERIODIC_REMOVE_REASON, and MAX_TRANSFER_OUTPUT_MB = 50 (see below). When I run a very simple job (dd from /dev/zero for 100M), the job is put on hold as soon as the transfer exceeds 50M. I understand that once "JobStatus == 5 && ( CurrentTime - EnteredCurrentStatus ) > 10 * 60" is true (i.e. after 10 minutes on hold), the job is removed. But then the final error returned to the user is usually the following ("usually" because I sometimes got something else for the same test):

State: Failed
Job Error: LRMS error: (-1) RemoveReason: The system macro SYSTEM_PERIODIC_REMOVE expression '( RemoteWallClockTime > 80 * 60 * 60 ) || ( RemoteSysCpu + RemoteUserCpu > 10 * 60 * 60 ) || ( ( JobStatus == 5 && ( CurrentTime - EnteredCurrentStatus ) > 10 * 60 ) ) || ( ResidentSetSize_RAW > 1000 * RequestMemory ) || ( JobRunCount > 10 )' evaluated to TRUE
I’d like to have condor return the actual SYSTEM_PERIODIC_REMOVE_REASON rather than “SYSTEM_PERIODIC_REMOVE evaluated to TRUE”.
How can I define that?

Many thanks,
Sophie

## Time limits

Sophie Ferry
CEA Saclay 91191 Gif-Sur-Yvette
DRF/IRFU/DEDIP/LIS
GRIF-IRFU
Bat.141, p.023B
+33(0)1 69 08 76 45
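A minimal sketch of the approach suggested in the reply at the top of this thread, written as HTCondor 8.6-style configuration. It assumes the limits stay the same as in the SYSTEM_PERIODIC_REMOVE expression quoted in the error message; the reason strings, the JobStatus != 5 guard and the undefined-attribute guard are illustrative choices, not part of either original message. Each limit now puts the job on hold with a reason chosen by nested ifThenElse(), and SYSTEM_PERIODIC_REMOVE only cleans up jobs that have been held for more than 10 minutes.

```
# Sketch only: thresholds copied from the original SYSTEM_PERIODIC_REMOVE.

# Hold, rather than remove, a job that exceeds any limit.
# JobStatus != 5 is an illustrative guard that leaves already-held jobs alone.
SYSTEM_PERIODIC_HOLD = ( JobStatus != 5 ) && ( \
    ( RemoteWallClockTime > 80 * 60 * 60 ) || \
    ( RemoteSysCpu + RemoteUserCpu > 10 * 60 * 60 ) || \
    ( ResidentSetSize_RAW > 1000 * RequestMemory ) || \
    ( JobRunCount > 10 ) )

# Report the first limit that matches; the strings are hypothetical wording.
# The =!= undefined test keeps a job that has never run (no ResidentSetSize_RAW)
# from making the whole reason expression undefined.
SYSTEM_PERIODIC_HOLD_REASON = \
    ifThenElse( RemoteWallClockTime > 80 * 60 * 60, "Wall clock time limit (80 h) exceeded", \
    ifThenElse( RemoteSysCpu + RemoteUserCpu > 10 * 60 * 60, "CPU time limit (10 h) exceeded", \
    ifThenElse( ResidentSetSize_RAW =!= undefined && ResidentSetSize_RAW > 1000 * RequestMemory, "Memory usage exceeded RequestMemory", \
    ifThenElse( JobRunCount > 10, "Job restarted more than 10 times", \
    "Held by SYSTEM_PERIODIC_HOLD" ))))

# Only remove jobs that have been held for more than 10 minutes.
SYSTEM_PERIODIC_REMOVE = ( JobStatus == 5 ) && ( ( CurrentTime - EnteredCurrentStatus ) > 10 * 60 )
```

With this split, the limit that was hit is recorded in the job's HoldReason attribute at hold time, and, as the reply notes, the user is notified of the hold if the job was submitted with notification set to Always or Error. The later removal will presumably still report the SYSTEM_PERIODIC_REMOVE expression as its remove reason.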