On second thought, Todd's suggestion might be better suited, as it
should also work with other universes such as the Docker one.

On 19/06/2019 10.20, Thomas Hartmann wrote:
> Hi Michael, Todd and Joan,
>
> many thanks for the detailed input!
>
> Michael's Checkfile hook looks like it delivers all I have in mind - but
> then, as Todd says, I have full control over the nodes anyway. I will
> give both approaches a try and see what fits best.
>
> @Todd
> The thing is that I would like to compile a rough power consumption
> summary for each job, i.e., to read a node's power metrics and derive a
> very rough estimate (scaled by #cores) of a job's power consumption.
> The motivation would be to give the users a 'real-life' clue about their
> resource usage, i.e., "your job/task used ##Wh of energy - approximately
> causing #g of CO2".
>
> Cheers and many thanks
> Thomas
>
>
> On 18/06/2019 00.29, Todd Tannenbaum wrote:
>> On 6/17/2019 10:01 AM, Thomas Hartmann wrote:
>>> Hi all,
>>>
>>> I would like to ask if there is an 'established best practice' for
>>> periodically running a script alongside each job.
>>>
>>> I.e., I would like to run a small metrics script periodically (~5m)
>>> for each job, collect the output, and add a summary of the metrics to
>>> the job's summary.
>>>
>>> I guess it should work to start such a script as a pre-job process in
>>> the background, loop/write the metrics to a separate file/pipe, and
>>> collect the metrics with a post-job script.
>>> But I wonder if there is a more Condor-like way(?), e.g., a cron for
>>> each starter (startd?) that stores the metrics in an extra job classad
>>> (or adds them to the job log with a grep'able identifier)?
>>>
>>> Cheers,
>>> Thomas
>>>
>>
>> Hi Thomas!
>>
>> A quick thought: if you have control of the execute nodes involved,
>> you could set the config knobs
>>
>> USE_PID_NAMESPACES = True
>> USER_JOB_WRAPPER = /some/path/monitor_my_jobs.sh
>>
>> and monitor_my_jobs.sh could be:
>>
>> #!/bin/bash
>> # Run my monitor script in the background
>> collect_metrics.sh &
>> # Exec my actual job, keeping the same pid
>> exec "$@"
>>
>> and collect_metrics.sh can then monitor whatever you want. The only
>> processes it would "see" would be the pids associated with the job
>> (which is what USE_PID_NAMESPACES=True does). Every five minutes it
>> could publish metrics via
>> condor_chirp set_job_attr_delayed <JobAttributeName> <AttributeValue>
>> which will cause the metrics to be published into the job classad, so
>> they are visible in the history classad. See "man condor_chirp".
>> Warning... the above was just the first idea I had; I didn't test it...
>>
>> But a question I have for you... what metrics would your script collect?
>> HTCondor is already collecting info about memory, CPU, local disk
>> usage, and a few others... what other metrics are you interested in?
>>
>> Thanks
>> Todd
>>
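
For the power-estimate use case above, here is a minimal, untested
sketch of what such a collect_metrics.sh could look like. It assumes an
Intel execute node exposing RAPL energy counters under
/sys/class/powercap, a job submitted with "+WantIOProxy = True" so that
condor_chirp can reach the starter, and the default
CHIRP_DELAYED_UPDATE_PREFIX, which is why the (made-up) attribute name
starts with "Chirp":

#!/bin/bash
# Hypothetical collect_metrics.sh: every 5 minutes, read the node's RAPL
# energy counter, attribute the slot's share by core count, and publish
# a running Wh total into the job classad via condor_chirp.

RAPL=/sys/class/powercap/intel-rapl:0/energy_uj   # assumed RAPL path
TOTAL_CORES=$(nproc)
# Cores assigned to this slot, read from the machine ad that HTCondor
# places in the job's scratch directory.
JOB_CORES=$(awk '/^Cpus =/ {print $3; exit}' "$_CONDOR_MACHINE_AD")

prev=$(cat "$RAPL")
total_uj=0

while sleep 300; do
    cur=$(cat "$RAPL")
    delta=$(( cur - prev ))
    # The energy counter wraps around; skip the sample when it does.
    if [ "$delta" -lt 0 ]; then prev=$cur; continue; fi
    prev=$cur
    # Very rough attribution: the job's share of the package energy,
    # scaled by its fraction of the cores.
    total_uj=$(( total_uj + delta * JOB_CORES / TOTAL_CORES ))
    # 1 Wh = 3.6e9 uJ
    wh=$(awk -v uj="$total_uj" 'BEGIN { printf "%.3f", uj / 3.6e9 }')
    condor_chirp set_job_attr_delayed ChirpJobEnergyWh "$wh"
done

The resulting ChirpJobEnergyWh in the history classad could then be
turned into the "#g of CO2" figure by multiplying with a grid emission
factor, e.g. very roughly 0.4 g CO2 per Wh for an average European grid
mix.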