Hi Mat,

many thanks for the job transform!

That's what we are effectively aiming for: to give our users a helping hand (transfer plugin + job handling) for their files' tape recall states (and save nerves on our admin side ;) )

Cheers and thanks from Hamburg,
  Thomas

On 15/11/2022 19.40, Mátyás Selmeci via HTCondor-users wrote:
Hi Thomas,

If you want ideas, here is the job transform that we use to add the necessary attributes to any job that uses "stash://" URLs:

---
# These are the default values; they can be overridden later.
StashRetry_MaxRetries = 3
# Delay at least 300 seconds ...
StashRetry_MinimumDelay = 300
# ... and up to 300 more seconds, before retrying.
StashRetry_RandomDelay = 300

JOB_TRANSFORM_StashRetry @=jt
  REQUIREMENTS regexp("stash://", TransferInput)

  # https://htcondor.readthedocs.io/en/latest/classad-attributes/job-classad-attributes.html#HoldReasonCode
  transfer_input_error_code = 13
  # FT plugin error codes are left-shifted by 8.
  # stash plugin uses 11 for retriable failures.
  EVALMACRO transfer_input_error_subcode_retriable = 11 << 8
  EVALMACRO retry_delay = $(StashRetry_MinimumDelay) + random($(StashRetry_RandomDelay))

  SET StashRetryCondition \
    ( HoldReasonCode == $(transfer_input_error_code) && \
      HoldReasonSubCode == $(transfer_input_error_subcode_retriable) && \
      NumHoldsByReason.TransferInputError > 0 && \
      NumHoldsByReason.TransferInputError <= $(StashRetry_MaxRetries) \
    ) ?: false

  SET StashRetryTime EnteredCurrentStatus + $(retry_delay)
  SET PeriodicRelease ($(MY.PeriodicRelease:false)) || (StashRetryCondition && (time() > StashRetryTime))
@jt
---

-Mat

On 11/15/2022 10:24 AM, Thomas Hartmann wrote:

Hi Mat,

that sounds good :D

Something like that is what we envisage - where the jobs get released occasionally and put back on hold until all their files have been staged from tape for good.

Cheers and thanks,
  Thomas

On 15/11/2022 16.12, Mátyás Selmeci via HTCondor-users wrote:

Not strict at all -- OSG uses 11 in one of our plugins to indicate a "retryable" failure. Any nonzero exit code results in a hold with the HoldReasonSubCode being the exit code left shifted by 8 (so multiplied by 256). We have a PeriodicRelease that retries the job after a random delay in case it was one of these failures.
-Mat

On 11/15/2022 8:25 AM, Thomas Hartmann wrote:

Hi all,

quick question on transfer plugins - how strict is the constraint on exit codes 0,1,2?

According to
https://htcondor.readthedocs.io/en/latest/admin-manual/setting-up-special-environments.html#enabling-the-transfer-of-files-specified-by-a-url
these three exit codes are the (only?) expected ones by Condor.

Potentially, I would like to distinguish between a few fail reasons, e.g., if a file is not present vs a file only nearline. So that one could send a job back into hold and maybe release it later on if a file was nearline but not release it, if not found in the namespace. I.e., evaluating `HoldReasonSubCode` occasionally.

Cheers,
  Thomas

_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
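
The subcode arithmetic described in this thread (a plugin's nonzero exit code ends up in HoldReasonSubCode shifted left by 8) and the retry test from the job transform quoted above can be sketched in Python roughly as follows. This is only an illustration of the logic, not HTCondor code: the function and constant names are made up for the example, while the values 13, 11 and the default of 3 retries are the ones given in the thread.

```python
# Illustrative sketch only: names below are hypothetical, values come from the thread.

TRANSFER_INPUT_ERROR = 13  # HoldReasonCode used in the job transform
RETRIABLE_EXIT_CODE = 11   # exit code the stash plugin uses for retriable failures


def subcode_from_exit_code(exit_code: int) -> int:
    """HoldReasonSubCode for a plugin exit code: the code shifted left by 8."""
    return exit_code << 8


def exit_code_from_subcode(subcode: int) -> int:
    """Recover the plugin exit code from a HoldReasonSubCode."""
    return subcode >> 8


def should_retry(hold_reason_code: int, hold_reason_subcode: int,
                 num_input_holds: int, max_retries: int = 3) -> bool:
    """Mirror of StashRetryCondition: retriable input-transfer hold, retries left."""
    return (
        hold_reason_code == TRANSFER_INPUT_ERROR
        and hold_reason_subcode == subcode_from_exit_code(RETRIABLE_EXIT_CODE)
        and 0 < num_input_holds <= max_retries
    )


if __name__ == "__main__":
    print(subcode_from_exit_code(RETRIABLE_EXIT_CODE))  # 2816, i.e. 11 * 256
    print(should_retry(13, 2816, 1))                    # True
    print(should_retry(13, 2816, 4))                    # False: retries exhausted
```

A plugin that wants to distinguish "file only nearline" from "file not in the namespace" could, under the same scheme, return two different nonzero exit codes and test for the corresponding shifted subcodes; which codes to pick is left open here.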