> I didn't even know that condor-devel existed? Is this something I should
> be watching?
No idea! It just seemed like the most appropriate list for this kind of
follow-up discussion.
Yeah it makes sense as a band-aid for LIGO's bleeding. Anything more
general purpose i imagine you'd want slightly more careful parsing of the
dag files.
I have several comments/suggestions about the script, if you're keen
on getting general python pointers. But if you like it as is i'll leave
you alone :)
> in an alternate universe,
You really do tempt me...
Carl
On Wed, 5 Jun 2019, Mark Coatsworth wrote:
Hi Carl, I didn't even know that condor-devel existed? Is this something I
should be watching?
Anyway, this script is just meant as a stopgap measure to solve LIGO's
immediate pain. My next job is to bake this functionality into DAGMan's
parser. If LIGO wants to complicate things with insane node names, that's
not my concern.
As for python bindings to dagman: in an alternate universe, I actually have
time to work on these, and everything is really lovely :)
Mark
On Wed, Jun 5, 2019 at 5:43 PM Carl Edquist <edquist@xxxxxxxxxxx> wrote:
Hiya Mark,
What a fun ticket!
So i peeked at your script:
https://htcondor-wiki.cs.wisc.edu/index.cgi/attach_get/1027/add-dagman-join-nodes.py
And i found this fragment to be "disconcerting yet provocative":
    if "PARENT" in line:
        parent_nodes = line[0:line.index("CHILD")-1]
        child_nodes = line[line.index("CHILD"):len(line)]
        num_parents = parent_nodes.count(" ")
        num_children = child_nodes.count(" ")
in that it is gleefully inviting to abusively craft valid .dag files which
your script might mis-parse.
Eg, what happens if job names contain "PARENT" or "CHILD" as a substring?
What happens if tokens are whitespace-separated with more than one space?
Probably etc.
:mischievous_grin:
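A minimal sketch of one way to sidestep both questions above: split the line on whitespace and compare whole tokens instead of searching for substrings. The function name `split_parent_child` is made up for illustration and is not taken from the attached script; matching the keywords case-insensitively, and assuming no node is itself named PARENT or CHILD, are assumptions here.

```python
def split_parent_child(line):
    """Return (parents, children) for a dependency line, or None otherwise.

    Assumption: PARENT and CHILD appear as whole tokens and are never
    used as node names; keyword matching is case-insensitive.
    """
    tokens = line.split()  # splits on any run of spaces or tabs
    if not tokens or tokens[0].upper() != "PARENT":
        return None  # not a dependency line (e.g. "JOB PARENT_A a.sub")

    keywords = [t.upper() for t in tokens]
    if "CHILD" not in keywords[1:]:
        return None  # malformed: no CHILD keyword as its own token

    child_at = keywords.index("CHILD", 1)
    parents = tokens[1:child_at]
    children = tokens[child_at + 1:]
    if not parents or not children:
        return None  # malformed: one side of the dependency is empty
    return parents, children
```

With this, a node called `CHILDISH` or `MY_PARENT_1` is just another token, and the parent and child counts come from `len(parents)` and `len(children)` instead of from counting spaces.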
... Of course, this further impresses on me how useful it would be to have
python access (bindings?) to dag internals, so that you could do the work
that your script does without having to hand parse the actual text of the
dag file.
Carl
On Wed, 5 Jun 2019, Mark Coatsworth wrote:
> Hi all, I just posted my script which optimizes .dag files by replacing
> dense many-PARENT-many-CHILD connections with join nodes. In the case of
> very large, dense dags like what Chad is using, this results in multiple
> orders of magnitude improvement in memory footprint, execution speed and
> job submission rate.
> It's attached to the ticket in gittrac:
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7054
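To illustrate the join-node idea (a sketch only, not the attached script): a single PARENT ... CHILD line with N parents and M children makes every child depend on every parent, which is N*M edges. Routing the same dependency through one intermediate join node leaves N+M edges. The helper names and the join node name below are hypothetical, and the declaration of the join node itself is left out.

```python
def add_join_node(parents, children, join_name):
    """Rewrite one many-to-many dependency as two lines through a join node.

    join_name is a hypothetical node name chosen by the caller; declaring
    that node in the dag is not shown here.
    """
    return [
        "PARENT " + " ".join(parents) + " CHILD " + join_name,
        "PARENT " + join_name + " CHILD " + " ".join(children),
    ]


def edge_counts(num_parents, num_children):
    """Return (edges without a join node, edges with one join node)."""
    direct = num_parents * num_children   # every parent linked to every child
    joined = num_parents + num_children   # each node linked once to the join
    return direct, joined
```

For example, `edge_counts(1000, 1000)` gives `(1000000, 2000)`, which is the kind of gap that would matter for a very large, dense dag.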
>
> Usage is very straightforward:
>
> ./add-dagman-join-nodes.py <input-dag-file> <output-dag-file>
>
> Please ask Chad to try using this at earliest convenience. We'd like to
> understand how much of an impact it makes in real production workflows.
>
> Mark
>
> --
> Mark Coatsworth
> Systems Programmer
> Center for High Throughput Computing
> Department of Computer Sciences
> University of Wisconsin-Madison
>
>