Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] Detailled monitoring of a DAG

Date: Tue, 31 Aug 2021 08:59:04 -0500
From: Greg Thain <gthain@xxxxxxxxxxx>
Subject: Re: [HTCondor-users] Detailled monitoring of a DAG

On 8/31/21 5:33 AM, Nicolas Arnaud wrote:

Dear all,
What (Python) framework/approach would you recommend to monitor in a detailed way the running of each DAG instance? Which DAG/blocks/jobs completed successfully or failed, how long each DAG/block/job took, why a particular job took that long (evictions, etc.), etc. I would then use the individual DAG summary data to build long-term statistics and identify problems in my code or the software environment...



Hi Nicolas:

I don't think there is an existing, comprehensive solution for this today. The HTCondor Python bindings have tools to read the job logs (not the DAG logs, but the job logs), and the job logs are annotated with the DAG node name, so that might be helpful. Some groups add a DAG node prescript or postscript to explicitly log additional information about job starts and restarts.
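As a rough illustration of mining the job logs by DAG node, here is a sketch that parses the plain-text job log using only the Python standard library (the bindings also offer a structured reader, `htcondor.JobEventLog`, which this sketch does not use). It assumes the usual event layout: a header line starting with a three-digit event code (`000` submitted, `001` executing, `004` evicted, `005` terminated, `009` aborted, `012` held), then the job id and an ISO-style timestamp, then body lines, then a closing `...`. It also assumes the node name appears as a `DAG Node:` line in the submit event. The function `summarize_job_log`, its field names, and the sample log are hypothetical, not part of HTCondor; check the regular expressions against your own logs, since the timestamp format in particular depends on version and configuration.

```python
import re
from datetime import datetime

# Event codes that begin each event header in the plain-text job log.
SUBMIT, EXECUTE, EVICTED, TERMINATED, ABORTED, HELD = (
    "000", "001", "004", "005", "009", "012")

# Header, e.g. "005 (123.000.000) 2021-08-31 08:10:00 Job terminated."
# Assumption: ISO-style timestamps (older logs may write "08/31 08:10:00").
HEADER = re.compile(
    r"^(\d{3}) \((\d+)\.(\d+)\.\d+\) (\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})")
# Assumption: DAGMan-submitted jobs carry "DAG Node: <name>" in the submit event.
NODE = re.compile(r"^\s*DAG Node: (\S+)")
RETURN_VALUE = re.compile(r"\(return value (-?\d+)\)")


def summarize_job_log(text):
    """Map (cluster, proc) -> hypothetical per-job summary dict."""
    jobs = {}
    job, code = None, None  # summary and event code of the event being read
    for line in text.splitlines():
        header = HEADER.match(line)
        if header:
            code, cluster, proc, stamp = header.groups()
            when = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
            job = jobs.setdefault((int(cluster), int(proc)), {
                "node": None, "starts": 0, "evictions": 0, "holds": 0,
                "first_start": None, "end": None, "return_value": None,
                "aborted": False})
            if code == EXECUTE:
                job["starts"] += 1
                if job["first_start"] is None:
                    job["first_start"] = when
            elif code == EVICTED:
                job["evictions"] += 1
            elif code == HELD:
                job["holds"] += 1
            elif code in (TERMINATED, ABORTED):
                job["end"] = when
                job["aborted"] = code == ABORTED
        elif line.startswith("..."):
            job, code = None, None  # "..." closes every event
        elif job is not None and code == SUBMIT:
            node = NODE.match(line)
            if node:
                job["node"] = node.group(1)
        elif job is not None and code == TERMINATED:
            value = RETURN_VALUE.search(line)
            if value:
                job["return_value"] = int(value.group(1))

    # Wall-clock seconds from first execution to termination or abort.
    for summary in jobs.values():
        start, end = summary["first_start"], summary["end"]
        summary["wall_seconds"] = (
            (end - start).total_seconds() if start and end else None)
    return jobs


# Made-up log excerpt for illustration: one job, evicted once, then finished.
SAMPLE = """\
000 (42.000.000) 2021-08-31 08:00:00 Job submitted from host: <10.0.0.1:9618>
    DAG Node: B
...
001 (42.000.000) 2021-08-31 08:01:00 Job executing on host: <10.0.0.2:9618>
...
004 (42.000.000) 2021-08-31 08:05:00 Job was evicted.
...
001 (42.000.000) 2021-08-31 08:06:00 Job executing on host: <10.0.0.3:9618>
...
005 (42.000.000) 2021-08-31 08:21:00 Job terminated.
\t(1) Normal termination (return value 0)
...
"""

if __name__ == "__main__":
    for job_id, s in summarize_job_log(SAMPLE).items():
        print(job_id, s["node"], s["starts"], s["evictions"],
              s["wall_seconds"], s["return_value"])
```

Because `wall_seconds` runs from the first execution to termination, it includes time lost to evictions; comparing it with the `evictions` and `holds` counts is one way to spot jobs that were slow for reasons outside your code. A node that DAGMan retries is resubmitted under a new cluster id, so each attempt shows up as a separate entry with the same `node`.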

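For the prescript/postscript approach, a minimal sketch of a POST script that appends one JSON line per node attempt might look like this. It assumes DAGMan's `$JOB`, `$RETURN` and `$RETRY` macros are passed on the command line, e.g. `SCRIPT POST A post_log.py $JOB $RETURN $RETRY`; the script name `post_log.py`, the node name `A`, the function names and the output file `dag_node_history.jsonl` are all hypothetical. Because DAGMan decides whether a node succeeded from the POST script's exit status, the script mirrors the job's result rather than always exiting 0.

```python
#!/usr/bin/env python3
"""Hypothetical DAGMan POST script: append one JSON record per node attempt.

Assumed DAG file line (script and node names are illustrative):
    SCRIPT POST A post_log.py $JOB $RETURN $RETRY
"""
import json
import sys
import time

# Hypothetical output file; a relative path resolves against the
# directory the script runs in.
LOG_FILE = "dag_node_history.jsonl"


def record(node, return_value, retry, path=LOG_FILE):
    """Append one attempt record as a JSON line; return the job's result."""
    entry = {
        "node": node,                  # from $JOB
        "return_value": return_value,  # from $RETURN
        "retry": retry,                # from $RETRY (0 on the first attempt)
        "logged_at": time.time(),      # Unix time when the POST script ran
    }
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(entry) + "\n")
    return return_value


def main(argv):
    node, return_value, retry = argv[1], int(argv[2]), int(argv[3])
    record(node, return_value, retry)
    # DAGMan treats the node as failed if the POST script exits non-zero,
    # so mirror the job's outcome instead of always exiting 0.
    return 0 if return_value == 0 else 1


# Only act when called with the three DAG macros as arguments.
if __name__ == "__main__" and len(sys.argv) == 4:
    sys.exit(main(sys.argv))
```

The script file needs to be executable. Joining these records with the per-job summaries from the job log on the node name gives one place to see retries, return values and evictions per node.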


-greg
