Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes


Date: Thu, 06 Jun 2019 12:00:19 -0500 (CDT)
From: Carl Edquist <edquist@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes
> I didn't even know that condor-devel existed? Is this something I should be watching?

No idea! It just seemed like the most appropriate list for this kind of follow-up discussion.

Yeah, it makes sense as a band-aid for LIGO's bleeding. For anything more general-purpose, i imagine you'd want slightly more careful parsing of the dag files.
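Here is a minimal sketch of what "more careful parsing" could look like, in Python since that is what the script is written in. It splits each line into whitespace-separated tokens and matches PARENT and CHILD as whole tokens rather than substrings. The function name `parse_dependency` is hypothetical. The case-insensitive keyword matching, the comment handling, and the assumption that no node is literally named PARENT or CHILD are assumptions about the DAG file format, not a restatement of DAGMan's own parser. Line continuations and other keywords are not handled.

```python
def parse_dependency(line):
    """Split a 'PARENT p1 p2 ... CHILD c1 c2 ...' line into node lists.

    Returns (parents, children), or None for any other kind of line.
    Assumptions (not taken from DAGMan's parser): tokens are separated by
    arbitrary whitespace, keywords match case-insensitively, a leading '#'
    starts a comment, and no node is literally named PARENT or CHILD.
    """
    tokens = line.split()  # any run of spaces/tabs counts as one separator
    if not tokens or tokens[0].startswith("#"):
        return None
    upper = [tok.upper() for tok in tokens]
    if upper[0] != "PARENT":
        return None
    if "CHILD" not in upper[1:]:
        raise ValueError("PARENT line without CHILD: %r" % line)
    split_at = upper.index("CHILD", 1)  # whole-token match, not a substring
    parents, children = tokens[1:split_at], tokens[split_at + 1:]
    if not parents or not children:
        raise ValueError("empty PARENT or CHILD list: %r" % line)
    return parents, children


print(parse_dependency("PARENT  PARENT_A\tB_CHILD   CHILD C\n"))
print(parse_dependency("JOB PARENT_PREP prep.sub"))
```

The first call returns `(['PARENT_A', 'B_CHILD'], ['C'])` despite the doubled spaces, the tab, and node names that contain the keywords. The second returns `None` because a JOB line is not a dependency line, even though it contains "PARENT".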

I have several comments/suggestions about the script, if you're keen on getting general Python pointers. But if you like it as is, i'll leave you alone :)

> in an alternate universe,

You really do tempt me...

Carl

On Wed, 5 Jun 2019, Mark Coatsworth wrote:

Hi Carl, I didn't even know that condor-devel existed? Is this something I
should be watching?
Anyway, this script is just meant as a stopgap measure to solve LIGO's
immediate pain. My next job is to bake this functionality into DAGMan's
parser. If LIGO wants to complicate things with insane node names, that's
not my concern.

As for Python bindings to DAGMan: in an alternate universe, I actually have
time to work on these, and everything is really lovely :)

Mark

On Wed, Jun 5, 2019 at 5:43 PM Carl Edquist <edquist@xxxxxxxxxxx> wrote:
      Hiya Mark,

      What a fun ticket!


      So i peeked at your script:

             https://htcondor-wiki.cs.wisc.edu/index.cgi/attach_get/1027/add-dagman-join-nodes.py


      And i found this fragment to be "disconcerting yet provocative":

           if "PARENT" in line:
               parent_nodes = line[0:line.index("CHILD")-1]
               child_nodes = line[line.index("CHILD"):len(line)]
               num_parents = parent_nodes.count(" ")
               num_children = child_nodes.count(" ")


      in that it is gleefully inviting: one could abusively craft valid
      .dag files that your script might mis-parse.

      E.g., what happens if job names contain "PARENT" or "CHILD" as a
      substring? What happens if tokens are separated by more than one
      space? Probably etc.

      :mischievous_grin:
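To make the concern concrete, here is a small harness (hypothetical, not part of the attached script) that wraps the quoted fragment in a function returning its two counts. Apart from that, the fragment is unchanged. The harness runs it on a few lines that a DAG file could plausibly contain. It assumes the counts are what the rest of the script relies on.

```python
def naive_counts(line):
    """The quoted fragment, wrapped so its results can be inspected."""
    if "PARENT" in line:
        parent_nodes = line[0:line.index("CHILD")-1]
        child_nodes = line[line.index("CHILD"):len(line)]
        num_parents = parent_nodes.count(" ")
        num_children = child_nodes.count(" ")
        return num_parents, num_children
    return None


# (line, what a token-based reading would give)
cases = [
    ("PARENT A B CHILD C D", (2, 2)),       # well-behaved input
    ("PARENT  A B CHILD C", (2, 1)),        # two spaces after PARENT
    ("PARENT CHILDHOOD CHILD X", (1, 1)),   # node name contains "CHILD"
    ("JOB PARENT_PREP prep.sub", None),     # not a dependency line at all
]

for line, expected in cases:
    try:
        got = naive_counts(line)
    except ValueError as err:  # str.index raises when "CHILD" is absent
        got = "ValueError: %s" % err
    print("%-28r expected=%-8s got=%s" % (line, expected, got))
```

Only the first line comes out right. The double-space line reports `(3, 1)`, and the CHILDHOOD line reports `(0, 2)`. The JOB line raises `ValueError` because "PARENT" matches as a substring, but `str.index` finds no "CHILD".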


      ... Of course, this further impresses on me how useful it would be
      to have Python access (bindings?) to DAG internals, so that you
      could do the work your script does without having to hand-parse the
      actual text of the dag file.


      Carl





      On Wed, 5 Jun 2019, Mark Coatsworth wrote:

      > Hi all, I just posted my script, which optimizes .dag files by
      > replacing dense many-PARENT-many-CHILD connections with join nodes.
      > In the case of very large, dense dags like what Chad is using, this
      > results in multiple orders of magnitude improvement in memory
      > footprint, execution speed and job submission rate.
      > It's attached to the ticket in gittrac:
      > https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7054
      >
      > Usage is very straightforward:
      >
      > ./add-dagman-join-nodes.py <input-dag-file> <output-dag-file>
      >
      > Please ask Chad to try using this as soon as convenient. We'd like
      > to understand how much of an impact it makes in real production
      > workflows.
      >
      > Mark
      >
      > --
      > Mark Coatsworth
      > Systems Programmer
      > Center for High Throughput Computing
      > Department of Computer Sciences
      > University of Wisconsin-Madison
      >
      >
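For readers new to the join-node idea: one PARENT line with m parents and n children gives m × n dependency edges, while routing them through a single join node needs only m + n. Below is a minimal sketch of that rewrite. It is not the attached script. The function name `add_join_nodes`, the `JOIN_NODE_<k>` names and the `join.sub` submit file are hypothetical placeholders, and the sketch assumes no existing node already uses a `JOIN_NODE_` name. It relies on the NOOP option of the JOB line so the join node is never actually run. Whether the placeholder submit file must exist may depend on the HTCondor version, so treat that as an assumption.

```python
import itertools


def add_join_nodes(lines, join_submit="join.sub"):
    """Route dense PARENT/CHILD lines through a NOOP join node.

    Hypothetical sketch, not the attached script: JOIN_NODE_<k> and
    'join.sub' are placeholders, and no existing node is assumed to be
    named JOIN_NODE_<k>.
    """
    out = []
    counter = itertools.count(1)
    for line in lines:
        tokens = line.split()
        upper = [tok.upper() for tok in tokens]
        if tokens and upper[0] == "PARENT" and "CHILD" in upper[1:]:
            split_at = upper.index("CHILD", 1)
            parents, children = tokens[1:split_at], tokens[split_at + 1:]
            # Direct edges: m * n.  Through a join node: m + n.
            if len(parents) * len(children) > len(parents) + len(children):
                join = "JOIN_NODE_%d" % next(counter)
                out.append("JOB %s %s NOOP" % (join, join_submit))
                out.append("PARENT %s CHILD %s" % (" ".join(parents), join))
                out.append("PARENT %s CHILD %s" % (join, " ".join(children)))
                continue
        out.append(line.rstrip("\n"))  # everything else passes through as-is
    return out


dag = ["JOB A a.sub", "PARENT A B C CHILD D E F", "PARENT A CHILD B"]
for line in add_join_nodes(dag):
    print(line)
```

Here the 3 × 3 line (9 edges) becomes 3 + 3 = 6 edges through `JOIN_NODE_1`. The 1 × 1 line is left alone because a join node would not reduce anything. The gap widens with density: a 1000 × 1000 block goes from 1,000,000 edges to 2,000.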



