Hi Michael, Todd and Joan,

many thanks for the detailed input! Michael's Checkfile hook looks like it delivers everything I have in mind - but then, as Todd says, I have full control over the nodes anyway. I will give both approaches a try and see what fits best.

@Todd: The thing is, I would like to compile a rough power consumption summary for each job, i.e., read a node's power metrics and derive a very rough estimate (scaled by #cores) of a job's power consumption. The motivation would be to give the users a 'real-life' clue about their resource usage, i.e., "your job/task used ##Wh of energy - approximately causing #g of CO2".

Cheers and many thanks,
Thomas

On 18/06/2019 00.29, Todd Tannenbaum wrote:
> On 6/17/2019 10:01 AM, Thomas Hartmann wrote:
>> Hi all,
>>
>> I would like to ask if there is some 'established best practice' for
>> periodically running a script alongside each job.
>>
>> I.e., I would like to run a small metrics script periodically (~5 min)
>> for each job, collect the output, and add a summary of the metrics to
>> the job's summary.
>>
>> I guess it should work to start such a script as a pre-job process in
>> the background, loop/write the metrics to a separate file/pipe, and
>> collect the metrics with a post-job script.
>> But I wonder if there is a more Condor way(?), e.g., a cron for each
>> starter (startd?) and storing the metrics in an extra job classad (or
>> adding them to the job log with a grep'able identifier)?
>>
>> Cheers,
>> Thomas
>>
>
> Hi Thomas!
>
> A quick thought: if you have control of the execute nodes involved,
> you could set the config knobs
>
> USE_PID_NAMESPACES = True
> USER_JOB_WRAPPER = /some/path/monitor_my_jobs.sh
>
> and monitor_my_jobs.sh could be:
>
> #!/bin/bash
> # Run my monitor script in the background
> collect_metrics.sh &
> # Exec my actual job, keeping the same pid
> exec "$@"
>
> and collect_metrics.sh can then monitor whatever you want.
> The only processes it would "see" would be the pids associated with
> the job (which is what USE_PID_NAMESPACES=True does). Every five
> minutes it could publish metrics via
>
> condor_chirp set_job_attr_delayed <JobAttributeName> <AttributeValue>
>
> which will cause the metrics to get published into the job classad, so
> they are visible in the history classad. See "man condor_chirp".
> Warning... the above was just the first idea I had, I didn't test it...
>
> But a question I have for you... what metrics would your script collect?
> HTCondor is already collecting info about memory, cpu, local disk
> usage, and a few others... what other metrics are you interested in?
>
> Thanks
> Todd
>
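[Editor's note: the thread never shows what collect_metrics.sh might actually compute, so here is a minimal, hypothetical sketch of the Wh/CO2 arithmetic Thomas describes. Everything in it is an assumption, not from the thread: the function names, the 0.475 g/Wh grid-average emission factor, and the premise that a cumulative joule counter for the node is available (e.g. via RAPL or IPMI).]

```shell
#!/bin/bash
# Hypothetical sketch: per-job energy estimate scaled by core share.
# Assumes the caller can obtain a node-level energy delta in joules
# for a sampling interval (source of that reading is site-specific).

# job_wh JOULES JOB_CORES NODE_CORES
# Rough per-job energy in Wh: (J / 3600), scaled by the job's core share.
job_wh() {
    awk -v j="$1" -v jc="$2" -v nc="$3" \
        'BEGIN { printf "%.2f\n", j / 3600 * jc / nc }'
}

# co2_g WH
# Very rough CO2 estimate; 0.475 g/Wh is an assumed grid average.
co2_g() {
    awk -v wh="$1" 'BEGIN { printf "%.1f\n", wh * 0.475 }'
}

# Example: the node consumed 864 kJ in one interval; the job holds
# 6 of the node's 24 cores.
wh=$(job_wh 864000 6 24)   # -> 60.00
g=$(co2_g "$wh")           # -> 28.5
echo "job used ${wh} Wh, ~${g} g CO2"

# A periodic collector could then publish the running totals via the
# mechanism Todd suggests:
#   condor_chirp set_job_attr_delayed EstimatedEnergyWh "$wh"
#   condor_chirp set_job_attr_delayed EstimatedCO2Grams "$g"
```

The attribute names EstimatedEnergyWh/EstimatedCO2Grams are illustrative; any ClassAd attribute name would do.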