Hi John,
Coming at this problem from the HTCondor side, there does not seem to be a way to send a tool alongside a condor job to monitor its resource usage. You could in theory make each job in the DAG a wrapper job that runs both the monitoring tool and the original payload job. However, that is a lot of work and involves changing the DAG rather than just using the existing one.
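To illustrate the wrapper idea, here is a minimal Python sketch, not a tested recipe. The function name run_with_monitor and both command lists are hypothetical; the monitoring command (for example a collectl invocation) is supplied by the caller, and no particular collectl options are assumed.

```python
import subprocess


def run_with_monitor(monitor_cmd, payload_cmd):
    """Start monitor_cmd in the background, run payload_cmd to completion,
    stop the monitor, and return the payload's exit code."""
    monitor = subprocess.Popen(monitor_cmd)
    try:
        result = subprocess.run(payload_cmd)
    finally:
        # Stop the monitor even if the payload could not be started.
        monitor.terminate()
        try:
            monitor.wait(timeout=10)
        except subprocess.TimeoutExpired:
            monitor.kill()
            monitor.wait()
    return result.returncode
```

Each node's executable would then become the wrapper script, with the original command passed along as arguments, which is exactly the per-job DAG change mentioned above.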
On a slightly different note, HTCondor does record a good amount of information in its various ClassAds, specifically the Job Ad and the Machine Ad. Some of the Machine Ad attributes are also recorded into the Job Ad, as selected by the configuration knob SYSTEM_JOB_MACHINE_ATTRS.
With that in mind, you could add a service node to the DAG as a local universe job that periodically queries the job queue for data about the jobs, gets the final values from condor_history, and records the data in a file.
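A rough Python sketch of what such a service job might run is below. It assumes the condor_q and condor_history command-line tools are on the PATH and accept the -json, -attributes, and -constraint options, and that the DAG's node jobs carry a DAGManJobId attribute; the attribute list, function names, and output file are illustrative choices, not a fixed recipe.

```python
import json
import subprocess

# Job Ad attributes to record (an illustrative selection).
ATTRS = ["ClusterId", "ProcId", "JobStatus",
         "RemoteWallClockTime", "RemoteUserCpu", "ResidentSetSize"]


def parse_ads(json_text):
    """Turn the tools' JSON output (an array of ads, or nothing at all)
    into a list of dicts holding only the attributes in ATTRS."""
    text = json_text.strip()
    if not text:
        return []
    return [{attr: ad.get(attr) for attr in ATTRS} for ad in json.loads(text)]


def query(tool, dag_cluster_id):
    """Run condor_q or condor_history for the jobs of one DAG."""
    cmd = [tool, "-json",
           "-attributes", ",".join(ATTRS),
           "-constraint", f"DAGManJobId == {dag_cluster_id}"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
    return parse_ads(out)


def append_records(path, records):
    """Append one JSON object per line to the output file."""
    with open(path, "a") as fh:
        for rec in records:
            fh.write(json.dumps(rec) + "\n")
```

The service job would call query("condor_q", dag_id) in a loop with a sleep between polls, then query("condor_history", dag_id) once the jobs have left the queue, passing each result to append_records.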
Hope this helps some,
Cole Bollig
From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of John N Calley via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Friday, January 13, 2023 3:49 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: John N Calley <calley_john_n@xxxxxxxxx>
Subject: [HTCondor-users] Some way to automatically add a resource monitoring tool (like collect) to every job in a DAG?
Hi,
I’d like to be able to take an existing DAG and somehow arrange to run collectl (https://collectl.sourceforge.net), or some other similar resource monitoring tool along with every job. All my jobs are scheduled through SGE (not by condor directly) so I need this to be independent of condor facilities. I was wondering if anyone else has done anything like this or might have thoughts on how best to approach it?
Thank You,
John
John Calley, Ph.D. Exec. Director - Biology Genomics and Bioinformatics, Statistics – Discovery & Development Eli Lilly and Company Lilly Corporate Center, Indianapolis, IN 46285 USA 317.433-3399 (office)