
Re: [HTCondor-users] Resubmitting failed job to a different site



Hi Jeff,

We (OSG staff) *strongly* discourage users from avoiding specifically named sites because:

1. If a site is persistently broken for certain types of jobs, we want to know about it so we can get it fixed.
2. User jobs are likely to keep avoiding the site even after it gets fixed (because the users won't know that it has been fixed).

Instead, we'd like to figure out what property of an execute point (EP) is causing the failure, and tell the user to match based on that property.

We only tell a user to avoid a site to "stop the bleeding" when the site is very broken, and in that case we also tell the user to stop avoiding it once the site gets fixed.


That being said, to avoid a site in the OSPool (our preferred term for the pool of EPs running OSG VO jobs), add the site name (the GLIDEIN_Site attribute of the EPs at the site) to the job attribute "UNDESIRED_Sites".  This is a StringList (a comma- or space-separated list) that the START expressions of the EPs look at.  This is specific to OSPool pilots and is not a general HTCondor or GlideinWMS feature, but it can be added in the Glidein Frontend config.
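
As a rough illustration of the above, a submit file fragment might look like the sketch below. The site names "SiteA" and "SiteB" and the executable name are hypothetical placeholders; the real values would be the GLIDEIN_Site names of the EPs where the jobs failed. This assumes the job is submitted to the OSPool, since other pools will not look at this attribute unless their pilots are configured to do so.

```
# Sketch only: "SiteA" and "SiteB" are placeholder GLIDEIN_Site values,
# and run_analysis.sh is a hypothetical executable.
executable = run_analysis.sh

# Custom job attribute examined by the START expressions of OSPool EPs.
# The value is a StringList, here comma-separated.
+UNDESIRED_Sites = "SiteA,SiteB"

queue
```

Remember to remove a site from the list once it has been fixed; otherwise the jobs will keep avoiding it.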


-Mat (OSG Software)


On 2/8/2023 4:52 AM, Jeff Templon wrote:
Hi,

A user just asked me this question:

Using OSG sites, some jobs fail; if I just resubmit them, they are usually retried at the same site and fail again.  Is there a way to retry a job and explicitly avoid it being resubmitted to the same location?

PS: Not sure if it's relevant, but they're using PEGASUS.  He showed me a pegasus config file which was full of GLIDEIN directives.

Thanks!


JT

