Miron,
I would recommend going with option #2 with the understanding
that you need to decide whether step 3 of DAG number n will
submit DAG number n+1 as an independent HTCondor job or whether
it will create a "nested" DAG so that all jobs will be part of
one BIG DAG.
You will also have to keep in mind that when a DAGMan job is
restarted, it will play back all the nodes, including the nodes
that interact with the database.
To me it sounds like explaining something in English using Ndebele words
from Bulawayo. Why not the web interface? Why not DRMAA? Why DAGMan? If you
"recommend", you kill the discussion and few people will dare to
contradict you.
M
On Mon, Jun 23, 2014 at 3:42 PM, Miron Livny <miron@xxxxxxxxxxx> wrote:
Nick,
I would recommend going with option #2 with the understanding that
you need to decide whether step 3 of DAG number n will submit DAG
number n+1 as an independent HTCondor job or whether it will create
a "nested" DAG so that all jobs will be part of one BIG DAG.
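To make the "nested" choice concrete, here is a minimal sketch that writes the text of DAG number n. It assumes three hypothetical submit files per round (`sims.sub`, `load_db.sub`, `analyze.sub`) and a hypothetical file name `round_<n+1>.dag` that the analysis step would have to write itself. `SUBDAG EXTERNAL` is the DAGMan keyword for a node that is itself a DAG. My understanding is that the inner DAG file only has to exist by the time that node is ready to run, which is worth verifying against the manual for your release. How the chain terminates is left open here.

```python
def round_dag(n, nested=True):
    """Return DAG file text for round n of the simulate/load/analyze cycle.

    All node and file names are illustrative assumptions, not names
    taken from the thread.
    """
    lines = [
        "JOB sims sims.sub",        # one submit file that queues all simulations
        "JOB load_db load_db.sub",  # step 2: create the database and load results
        "JOB analyze analyze.sub",  # step 3: analyze and decide on the next round
        "PARENT sims CHILD load_db",
        "PARENT load_db CHILD analyze",
    ]
    if nested:
        # Nested variant: round n+1 becomes a node of this DAG, so every
        # round is part of one big DAG.
        lines.append(f"SUBDAG EXTERNAL next round_{n + 1}.dag")
        lines.append("PARENT analyze CHILD next")
    # Independent variant: nothing is added here; the analyze job itself
    # would submit round n+1 (for example by running condor_submit_dag).
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(round_dag(1), end="")
```

With `nested=False` the DAG ends at `analyze`, and the next round shows up in the queue as a separate DAGMan job rather than as a child of this one.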
You will also have to keep in mind that when a DAGMan job is
restarted, it will play back all the nodes, including the nodes
that interact with the database.
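One way to live with that is to make the database node safe to run more than once. The following is only a sketch: it uses SQLite from the Python standard library as a stand-in for whatever database is really in use, and the table name `results` and its columns are assumptions made for illustration.

```python
import sqlite3


def load_results(conn, results):
    """Load (job_id, value) pairs so that a second run changes nothing."""
    # CREATE TABLE IF NOT EXISTS makes the "create database" step repeatable.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS results ("
        "job_id INTEGER PRIMARY KEY, value REAL)"
    )
    # The primary key plus INSERT OR REPLACE means a replayed load
    # overwrites rows instead of duplicating them.
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "INSERT OR REPLACE INTO results (job_id, value) VALUES (?, ?)",
            results,
        )


def row_count(conn):
    """Return the number of rows currently in the results table."""
    return conn.execute("SELECT COUNT(*) FROM results").fetchone()[0]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    data = [(1, 0.5), (2, 0.75)]
    load_results(conn, data)
    load_results(conn, data)  # simulate the node being played back
    print(row_count(conn))
```

The same idea applies to the analysis node: if it records which follow-up simulations it has already requested, a replay does not request them again.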
Miron
On 6/22/2014 10:07 AM, Nick Cooper wrote:
Hi All,
I am currently looking at migrating from our home-grown distributed
computing software to HTCondor. Over the years, users have created
complex "job managers" written in C++ which are equivalent to
application-specific DAGMan scripts. To reduce the burden on users
migrating to HTCondor, we would like to provide an adaptor between a
"job manager" and HTCondor.
An example of a simple "Job Manager" is one which (all within the same
cluster):
1. Requests 1000 simulation jobs to be executed
2. When all 1000 simulation jobs are completed, creates a database and
loads the results into it
3. Does analysis on the results in the database and, based on the
analysis, requests further simulation jobs to be executed. All without
any user involvement.
From what I have read our options are:
1. Web Service: Write an adapter using the SOAP interface. I suspect
there is not enough feedback regarding when a job completes / fails.
2. DAGMan: Write an adapter that generates DAGMan scripts.
3. DRMAA: Write an adapter that submits and monitors jobs via
the DRMAA API.
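As a rough illustration of option 2, an adapter could emit a DAG file for the three-step example above. This is a sketch under assumptions: the submit file names (`sim.sub`, `load_db.sub`, `analyze.sub`), the node names, and the `index` macro are all hypothetical. `JOB`, `VARS`, and `PARENT ... CHILD` are standard DAG file keywords.

```python
def build_dag(n_sims, sim_sub="sim.sub", load_sub="load_db.sub",
              analyze_sub="analyze.sub"):
    """Return DAG file text: n_sims simulations, then load, then analyze."""
    lines = []
    for i in range(n_sims):
        node = f"sim_{i}"
        lines.append(f"JOB {node} {sim_sub}")
        # VARS passes a per-node macro that sim.sub can reference as $(index).
        lines.append(f'VARS {node} index="{i}"')
    lines.append(f"JOB load_db {load_sub}")
    lines.append(f"JOB analyze {analyze_sub}")
    # load_db starts only after every simulation node has completed.
    sim_nodes = " ".join(f"sim_{i}" for i in range(n_sims))
    lines.append(f"PARENT {sim_nodes} CHILD load_db")
    lines.append("PARENT load_db CHILD analyze")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Print a small instance; the example in this message would use 1000.
    print(build_dag(3), end="")
```

The generated text would be written to a file and handed to `condor_submit_dag`. An alternative is a single node whose submit file queues all 1000 simulations, which keeps the DAG file short at the cost of per-simulation control.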
Can someone confirm if I am on the correct track?
Does anyone have any suggestions / words of wisdom for this kind of
requirement?
Further info:
- Windows based pool
- Job manager is a C++ DLL
- Looking at using the current stable release of HTCondor
- Jobs will run in the Vanilla Universe
- Jobs will need to be run under the submitter's Active Directory
credentials
Thanks,
Nick
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/