[condor-users] watch_dag utility
I recently learned how to generate and submit Condor DAGs, and have been
executing DAGs with ~1000 nodes. The structure of my DAGs is not too
complicated: basically ~200 parallel paths of 5 steps each, with minimal
cross-linking. To allow me to view the progress of DAG execution in a
concise way, I wrote a Tcl script called 'watch_dag' which scans the
*.dag and *.dag.dagman.out files, identifies "stages" in the DAG, and
prints out a summary of job status. Here is an example of running this
in the middle of executing a DAG with 1190 nodes:
pshawhan> watch_dag H1H2part1.pss2.dag
Stage  Executable         Total  Waiting  Queued  Running  Succeeded  Failed
    1  lalapps_tmpltbank    238        0       0        0        228      10
    2  lalapps_inspiral     238       10       0        7        221       0
    3  lalapps_inca         238       17       0        0        221       0
    4  lalapps_inspiral     238       17       2       77        142       0
    5  lalapps_inca         238      142      36        0         60       0
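
For anyone curious how the "stages" in a summary like this could be derived,
here is a minimal Python sketch. It is not the watch_dag script (which is
written in Tcl) and makes its own assumptions: it reads only the JOB and
PARENT ... CHILD lines of a *.dag file, and it defines a node's stage as one
more than the longest chain of parents above it. The function names, node
names, and submit-file names are hypothetical, and the sketch does not attempt
to read job status from the *.dag.dagman.out file.

```python
from collections import defaultdict


def parse_dag(text):
    """Collect nodes and parent links from JOB and PARENT/CHILD lines."""
    jobs = {}                    # node name -> submit file
    parents = defaultdict(set)   # node name -> names of its parent nodes
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue             # skip blank lines and comments
        keyword = words[0].upper()
        if keyword == "JOB" and len(words) >= 3:
            jobs[words[1]] = words[2]
        elif keyword == "PARENT":
            upper = [w.upper() for w in words]
            if "CHILD" not in upper:
                continue         # malformed line, ignore it
            split = upper.index("CHILD")
            for child in words[split + 1:]:
                parents[child].update(words[1:split])
    return jobs, parents


def assign_stages(jobs, parents):
    """Assumed definition: stage = 1 + longest chain of ancestors."""
    stage = {}

    def depth(node):
        if node not in stage:
            above = (depth(p) for p in parents.get(node, ()))
            stage[node] = 1 + max(above, default=0)
        return stage[node]

    for node in jobs:
        depth(node)
    return stage


def stage_totals(stage):
    """Count how many nodes fall into each stage."""
    totals = defaultdict(int)
    for number in stage.values():
        totals[number] += 1
    return dict(sorted(totals.items()))


# Hypothetical three-stage DAG, far smaller than the one shown above.
EXAMPLE = """\
JOB tb1 tmpltbank.sub
JOB tb2 tmpltbank.sub
JOB in1 inspiral.sub
JOB in2 inspiral.sub
JOB ca1 inca.sub
PARENT tb1 CHILD in1
PARENT tb2 CHILD in2
PARENT in1 in2 CHILD ca1
"""

if __name__ == "__main__":
    jobs, parents = parse_dag(EXAMPLE)
    print(stage_totals(assign_stages(jobs, parents)))
```

Grouping by longest-path depth keeps cross-linked nodes in the latest stage
they depend on, which matches a "parallel paths of N steps" layout. Filling in
an Executable column would additionally require reading each node's submit
file, which this sketch leaves out.
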
I have put a copy of this utility (~300 lines of Tcl code) at ; feel free to download
and use it. (It requires that tclsh be somewhere in your PATH, and then
of course you have to remember to put the watch_dag script into some
directory in your PATH and do 'chmod +x watch_dag'.) Type 'watch_dag'
without any arguments for a usage summary.
No warranty is implied; this is just something I threw together based on
reverse-engineering the contents of some *.dag and *.dagman.out files,
but it seems to work pretty well (for my jobs, at least). I'd appreciate
hearing any bug reports or suggestions for improvement.
Peter Shawhan