Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes


Date: Thu, 6 Jun 2019 00:04:03 +0000
From: Mark Coatsworth <coatsworth@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes
Hi Carl, I didn't even know that condor-devel existed. Is this something I should be watching?

Anyway, this script is just meant as a stopgap measure to solve LIGO's immediate pain. My next job is to bake this functionality into DAGMan's parser. If LIGO wants to complicate things with insane node names, that's not my concern.

As for Python bindings to DAGMan: in an alternate universe, I actually have time to work on these, and everything is really lovely :)

Mark

On Wed, Jun 5, 2019 at 5:43 PM Carl Edquist <edquist@xxxxxxxxxxx> wrote:
Hiya Mark,

What a fun ticket!


So i peeked at your script:

        https://htcondor-wiki.cs.wisc.edu/index.cgi/attach_get/1027/add-dagman-join-nodes.py


And i found this fragment to be "disconcerting yet provocative":

     if "PARENT" in line:
         parent_nodes = line[0:line.index("CHILD")-1]
         child_nodes = line[line.index("CHILD"):len(line)]
         num_parents = parent_nodes.count(" ")
         num_children = child_nodes.count(" ")


in that it is gleefully inviting to abusively craft valid .dag files which
your script might mis-parse.

E.g., what happens if job names contain "PARENT" or "CHILD" as a substring?
What happens if tokens are whitespace-separated with more than one space?
Probably etc.
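
For illustration only, here is a minimal sketch of a token-based take on that fragment. It is not taken from the attached script: the function name parse_parent_child is made up, and matching the keywords case-insensitively is an assumption about how DAGMan treats them. Splitting on whitespace first means a job name such as PARENT_1 is compared as a whole token, and runs of spaces or tabs collapse on their own.

```python
def parse_parent_child(line):
    """Return (parents, children) for a PARENT ... CHILD line, else None.

    Hypothetical helper, not part of add-dagman-join-nodes.py.
    """
    tokens = line.split()  # no argument: split on any run of whitespace
    # Only treat the line as a dependency line if its FIRST token is the
    # keyword, so "JOB PARENT_JOB foo.sub" is not mistaken for one.
    if not tokens or tokens[0].upper() != "PARENT":
        return None
    # Assumption: keywords are case-insensitive. Compare whole tokens,
    # never substrings, so a name like CHILD_2 does not match CHILD.
    keywords = [t.upper() for t in tokens]
    if "CHILD" not in keywords:
        return None
    split_at = keywords.index("CHILD")
    parents = tokens[1:split_at]
    children = tokens[split_at + 1:]
    if not parents or not children:
        return None  # malformed: nothing on one side of CHILD
    return parents, children
```

With the node lists in hand, len(parents) and len(children) give the counts directly, instead of counting spaces.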

:mischievous_grin:


... Of course, this further impresses on me how useful it would be to have
python access (bindings?) to dag internals, so that you could do the work
that your script does without having to hand parse the actual text of the
dag file.


Carl





On Wed, 5 Jun 2019, Mark Coatsworth wrote:

> Hi all, I just posted my script, which optimizes .dag files by replacing dense many-PARENT-many-CHILD connections with join nodes. For very large, dense DAGs like the ones Chad is using, this improves memory footprint, execution speed and job submission rate by multiple orders of magnitude.
> It's attached to the ticket in gittrac: 
> https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7054
>
> Usage is very straightforward:
>
> ./add-dagman-join-nodes.py <input-dag-file> <output-dag-file>
>
> Please ask Chad to try this at his earliest convenience. We'd like to understand how much of an impact it makes in real production workflows.
>
> Mark
>
> --
> Mark Coatsworth
> Systems Programmer
> Center for High Throughput Computing
> Department of Computer Sciences
> University of Wisconsin-Madison
>
>


--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison