Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes


Date: Wed, 05 Jun 2019 17:43:50 -0500 (CDT)
From: Carl Edquist <edquist@xxxxxxxxxxx>
Subject: Re: [HTCondor-devel] [CondorLIGO] Script for optimizing .dag files with join nodes
Hiya Mark,

What a fun ticket!


So i peeked at your script:

	https://htcondor-wiki.cs.wisc.edu/index.cgi/attach_get/1027/add-dagman-join-nodes.py


And i found this fragment to be "disconcerting yet provocative":

    if "PARENT" in line:
        parent_nodes = line[0:line.index("CHILD")-1]
        child_nodes = line[line.index("CHILD"):len(line)]
        num_parents = parent_nodes.count(" ")
        num_children = child_nodes.count(" ")


in that it gleefully invites one to craft valid .dag files which your script might mis-parse.

E.g., what happens if job names contain "PARENT" or "CHILD" as a substring? What happens if tokens are separated by more than one whitespace character? And probably more.

:mischievous_grin:
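
To make the concern concrete, here is a minimal sketch of token-based parsing for a PARENT ... CHILD ... line. It is not taken from the script on the ticket; the function name parse_parent_child is hypothetical, and treating the keywords as case-insensitive is an assumption worth checking against the DAGMan documentation.

```python
def parse_parent_child(line):
    """Return (parents, children) for a PARENT ... CHILD ... line, else None.

    Hypothetical helper for illustration only.
    """
    # split() with no argument splits on any run of whitespace,
    # so repeated spaces and tabs are handled the same way.
    tokens = line.split()
    if not tokens:
        return None

    # Assumption: keywords are matched case-insensitively, as whole tokens.
    keywords = [t.upper() for t in tokens]
    if keywords[0] != "PARENT" or "CHILD" not in keywords[1:]:
        return None

    # Locate CHILD as a complete token, never as a substring of a job name.
    split_at = keywords.index("CHILD", 1)
    parents = tokens[1:split_at]
    children = tokens[split_at + 1:]
    if not parents or not children:
        return None
    return parents, children
```

Because the keywords are compared as whole tokens, a JOB line for a node named PARENT_STEP is skipped, and a parent named CHILDREN_PREP does not confuse the split point.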


... Of course, this further impresses on me how useful it would be to have Python access (bindings?) to DAG internals, so that you could do the work that your script does without having to hand-parse the actual text of the .dag file.


Carl





On Wed, 5 Jun 2019, Mark Coatsworth wrote:

Hi all, I just posted my script which optimizes .dag files by replacing dense many-PARENT-many-CHILD connections with join nodes. In the case of very large, dense dags like what Chad is using,
this results in multiple orders of magnitude improvement in memory footprint, execution speed and job submission rate.
It's attached to the ticket in gittrac:
https://htcondor-wiki.cs.wisc.edu/index.cgi/tktview?tn=7054
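
As a rough illustration of why a join node helps (a sketch only, not the code attached to the ticket): N parents wired directly to M children need N*M edges, while routing them through one intermediate node needs N+M. The function name add_join_node and the node name join_0 are hypothetical, and the declaration of the join node itself is left out.

```python
def add_join_node(parents, children, join_name="join_0"):
    """Rewrite one dense PARENT/CHILD relation to go through a join node.

    Returns the two replacement lines plus the edge counts before and after.
    The join node name is a hypothetical placeholder; declaring that node
    in the .dag file is not shown here.
    """
    direct_edges = len(parents) * len(children)   # every parent to every child
    joined_edges = len(parents) + len(children)   # parents -> join -> children
    lines = [
        "PARENT " + " ".join(parents) + " CHILD " + join_name,
        "PARENT " + join_name + " CHILD " + " ".join(children),
    ]
    return lines, direct_edges, joined_edges


lines, before, after = add_join_node(["A", "B", "C"], ["D", "E", "F"])
print(before, after)
for dag_line in lines:
    print(dag_line)
```

With three parents and three children the saving is small (9 edges versus 6), but the gap widens quickly as the two groups grow.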

Usage is very straightforward:

./add-dagman-join-nodes.py <input-dag-file> <output-dag-file>

Please ask Chad to try using this at his earliest convenience. We'd like to understand how much of an impact it makes in real production workflows.

Mark

--
Mark Coatsworth
Systems Programmer
Center for High Throughput Computing
Department of Computer Sciences
University of Wisconsin-Madison
