
Re: [HTCondor-users] Application specific scheduler

Date: Mon, 30 Jun 2014 11:25:47 -0400
From: Tevfikkosar <tkosar@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Application specific scheduler

And, to add to what Kent just said, even in the case of a failure where the user would need some manual action to fix the problem, the entire workflow still does not need to be restarted. DAGMan would create a "rescue DAG", marking already completed jobs as "DONE", and would only rerun/retry unfinished jobs in the workflow. Another feature with 10+ years of history...


Tevfik Kosar
A Condor Alumnus

-- Sent from a mobile phone.
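
To illustrate the rescue DAG behavior described in the message above, here is a minimal sketch of a DAG input file. The file name diamond.dag, the node names A through D, and the submit file names are all hypothetical. The comments at the end describe what the rescue file would roughly record if node C failed; the exact contents and format of a rescue file depend on the HTCondor version, so treat this as an assumption rather than a literal listing.

```
# diamond.dag (hypothetical four-node workflow)
JOB A a.sub
JOB B b.sub
JOB C c.sub
JOB D d.sub
PARENT A CHILD B C
PARENT B C CHILD D

# If C fails after A and B have succeeded, DAGMan writes a rescue
# file next to the DAG (for example diamond.dag.rescue001) that
# records the finished nodes, roughly:
#   DONE A
#   DONE B
# so that only C and D still need to run.
```

After fixing whatever made C fail, running condor_submit_dag on diamond.dag again should, in reasonably recent HTCondor versions, pick up the newest rescue file automatically and skip the nodes marked DONE.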

On Jun 30, 2014, at 10:20, "R. Kent Wenger" <wenger@xxxxxxxxxxx> wrote:

On Sat, 28 Jun 2014, Miha Ahronovitz wrote:
So Nick says, I want to migrate my home-grown distributed environment to HTCondor. As a new user he considers 3 options. Miron says use DAGMan. Miha asks why. Miron says because it manages job dependencies. Gabriel says DAGMan is the way to go, but he wonders "why, in case of failure, one has to restart the workflow rather than retry the failed jobs."
Kent Wenger from the CHTC team clarifies and says, yes, we know it is a problem, gives the link and has a name for it: this is issue #2831.
Let me stop here. Nick seems an experienced sysadmin/engineer. But HTCondor-list has 2,100 subscribers. How many of these subscribers know about DAGMan? Maybe they search and read why, in case of failure, they have to resubmit all jobs from the beginning?
Just to clarify, I was assuming (perhaps incorrectly) that Gabriel was referring to the case where the user has to take some kind of manual action to fix the problem with a job that failed, before retrying that job.
If a job fails, but it may succeed on being retried without any action from the user, the retry option in DAGMan can handle that case. The retry option for nodes in DAGMan has existed for a long time (10+ years, I think), so hopefully many people are aware of that...
Kent
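
As a minimal sketch of the retry option Kent mentions, a DAG input file can attach a RETRY line to a node. The node names, submit file names, and the retry count of 3 below are hypothetical and chosen only for illustration.

```
# Hypothetical two-node workflow
JOB A a.sub
JOB B b.sub
PARENT A CHILD B

# If node B fails, DAGMan retries it up to 3 times before
# treating the node as failed.
RETRY B 3
```

This covers the case where a job may succeed on a later attempt without any user action. A failure that needs a manual fix is the case the rescue DAG is meant for.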

