On 6/17/2019 10:01 AM, Thomas Hartmann wrote:
Hi all,
I would like to ask if there is some 'established best practice' for
periodically running a script alongside each job.
I.e., I would like to run a small metrics script periodically (~5m) for
each job, collect the output and add a summary of the metrics to the
job's summary.
I guess it should work to start such a script in the background as a
pre-job process, loop/write the metrics to a separate file/pipe, and
collect the metrics with a post-job script.
But I wonder if there is a more Condor way(?), e.g., a cron for each
starter (startd?) that stores the metrics in an extra job class ad (or
adds them to the job log with a grep'able identifier)?
Cheers,
Thomas
Hi Thomas!
A quick thought: If you have control of the execute nodes involved,
you could set the config knobs
USE_PID_NAMESPACES = True
USER_JOB_WRAPPER = /some/path/monitor_my_jobs.sh
and monitor_my_jobs.sh could be:
#!/bin/bash
# Run my monitor script
collect_metrics.sh &
# Exec my actual job, keeping the same pid
exec "$@"
and collect_metrics.sh could then monitor whatever you want. The only
processes it would "see" would be the pids associated with the job
(which is what USE_PID_NAMESPACES=True does). Every five minutes it
could publish metrics via
condor_chirp set_job_attr_delayed <JobAttributeName> <AttributeValue>
which will cause the metrics to get published into the job classad so
they are visible in the history classad. See "man condor_chirp".
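For illustration, collect_metrics.sh might look roughly like the sketch
below. It is untested against a real pool and makes several assumptions:
condor_chirp is reachable on PATH (it often lives in HTCondor's libexec
directory instead), the CHIRP and INTERVAL variables and the attribute
name ChirpMetricNumProcs are made up for this example, and the process
count is only a placeholder metric. The "Chirp" prefix is used because,
as I understand it, delayed updates may be restricted to certain
attribute-name prefixes by the pool configuration.

```shell
#!/bin/bash
# collect_metrics.sh: hypothetical sketch, not a tested implementation.

# Assumption: condor_chirp is on PATH; override CHIRP with a full path if not.
CHIRP="${CHIRP:-condor_chirp}"
# Seconds between samples (~5 minutes); hypothetical tunable.
INTERVAL="${INTERVAL:-300}"

# Placeholder metric: number of processes visible under /proc.
# With PID namespaces enabled, these should be only the job's own processes.
count_procs() {
    local n=0 d
    for d in /proc/[0-9]*; do
        if [ -d "$d" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

# Publish one attribute into the job ad: $1 = attribute name, $2 = value.
publish() {
    "$CHIRP" set_job_attr_delayed "$1" "$2"
}

# Sample and publish forever; the loop ends when the job's processes are killed.
monitor_loop() {
    while true; do
        publish ChirpMetricNumProcs "$(count_procs)"
        sleep "$INTERVAL"
    done
}

# Only loop when running inside a job slot, where the starter sets
# _CONDOR_SCRATCH_DIR in the job environment.
if [ -n "${_CONDOR_SCRATCH_DIR:-}" ]; then
    monitor_loop
fi
```

After the job finishes, something like
"condor_history <JobId> -af ChirpMetricNumProcs" should show the last
published value, assuming the attribute made it into the job ad.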
Warning... the above was just the first idea I had, I didn't test it...
But a question I have for you... what metrics would your script collect?
HTCondor is already collecting info about memory, cpu, local disk
usage, and a few others... what other metrics are you interested in?
Thanks
Todd