Hi Andrew,

yes, any actual matching of cgroups and slot/job ClassAds is something I have put off so far ;)
Aggregating ClassAd and cgroup information on the node and sending a combined measurement is probably the most convenient way in the end (we'll have to see how well everything scales...).

Cheers,
  Thomas

On 2016-03-15 15:50, andrew.lahiff@xxxxxxxxxx wrote:
> Hi Thomas,
>
> Note that you'll probably find the labels you get in cAdvisor or an alternative won't match directly to HTCondor job ids, e.g. they'll look like:
>
> condor_pool_condor_slot1_3@xxxxxxxxxxxxxxxxxxxxxxx
>
> (i.e. how the cgroups are named), which makes the Grafana plots a little hard to use. One way of getting around this could be to have a cron job on each worker node which queries both cAdvisor (using its REST API) and HTCondor, and takes all the stats from cAdvisor but labels them in a more appropriate way, e.g. GlobalJobId, owner, etc. It can then send the appropriately tagged data to InfluxDB, rather than getting cAdvisor to do it directly. I've done this for a different cluster manager and it seems to work well, but haven't tried it yet with HTCondor. However, there may be better ways of getting the same result that I haven't thought of :-)
>
> Regards,
> Andrew.
>
> ________________________________________
> From: HTCondor-users [htcondor-users-bounces@xxxxxxxxxxx] on behalf of Thomas Hartmann [thomas.hartmann@xxxxxxx]
> Sent: Tuesday, March 15, 2016 2:06 PM
> To: HTCondor-Users Mail List
> Subject: Re: [HTCondor-users] How to use condor_chirp?
>
> Hi Brian,
>
> thanks for the warning.
>
> Our idea is to update a job's ClassAd with information on the cgroups
> context its slot is using.
> In the end, we would like to be able to relate a job to its cgroup
> statistics, i.e., to monitor the local cgroups and send statistics into
> an InfluxDB and in parallel accumulate basic job information from Condor
> (frequency, scaling, whether to only accumulate, etc. remain to be seen...).
>
> Ultimately, it would be nice to be able to plot basic job statistics
> from InfluxDB with Grafana for at least a subset of jobs (individual
> users; large-scale/institutional users may not be necessary) for a
> range of a few days (regular pruning of data points/measurements in
> InfluxDB: how well does that scale?).
>
> One question for me is whether such condor job statistics are better
> accumulated/sent from the schedd or (somehow?) from the startds.
> For cgroup statistics I would try to send them directly from the nodes
> (cAdvisor or a similar approach).
>
> Cheers,
>   Thomas
>
> On 2016-03-15 12:23, Brian Bockelman wrote:
>> Hi Thomas,
>>
>> Two things to note:
>> 1) "condor_chirp set_job_attr" requires +WantIOProxy=true in the job ClassAd. This updates the ClassAd immediately in the schedd, which can be a scalability concern.
>> 2) "condor_chirp set_job_attr_delayed" works by default. However, it only sends the attribute with the next scheduled ClassAd update (for CPU and memory usage), so there's less of a scalability concern. Additionally, attributes set with this command must start with the prefix "Chirp" (case-sensitive).
>>
>> Can you give some background on what you're trying to accomplish?
>>
>> Brian
>>
>>> On Mar 14, 2016, at 12:44 PM, Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
>>>
>>> Hi all,
>>>
>>> I would like to inject some system information into jobs' ClassAds.
>>> As I understand condor_chirp, I cannot inject/manipulate ClassAds on a
>>> worker for a running job, but only during submission. (I assume that
>>> ~/.job.ad is where a job keeps its ClassAds [1], but can I inject it
>>> from outside the job 'properly'?)
>>>
>>> We currently get our grid jobs via an ARC-CE, so I suppose the best place
>>> to enable the communication token would be there (where?), by appending
>>> +WantIOProxy = TRUE
>>> to the job submission scripts, right?
>>>
>>> I am not sure where to actually call condor_chirp on the worker.
>>> My best guess so far is to find the template for the job wrappers, i.e.,
>>> /var/lib/condor/execute/dir_*/condor_exec.exe
>>> and chirp from there, possibly starting a separate thread to update
>>> changing ClassAd attributes.
>>> Would that be reasonable, or is there a better way?
>>> Where would I find the wrapper for condor_exec.exe?
>>>
>>> Cheers and thanks,
>>>   Thomas
>>>
>>>
>>> [1]
>>>> cat /proc/`ps axf | grep "/bin/bash -l /var/lib/condor/execute/dir" |
>>> tail -n1 | cut -d " " -f 2`/environ | tr '\0' '\n' | grep "_CONDOR_JOB_AD"
>>> _CONDOR_JOB_AD=/var/lib/condor/execute/dir_38259/.job.ad
>>>
>>> _______________________________________________
>>> HTCondor-users mailing list
>>> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>>> subject: Unsubscribe
>>> You can also unsubscribe by visiting
>>> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>>>
>>> The archives can be found at:
>>> https://lists.cs.wisc.edu/archive/htcondor-users/
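Thomas's stated goal is to put the cgroup context of a job's slot into its ClassAd. Following Brian's second point, the job (or a wrapper around it) could report its own cgroup with `condor_chirp set_job_attr_delayed`, which per Brian works by default. A rough Python sketch, assuming the job's cgroup can be read from `/proc/self/cgroup` inside the job and that the value must be passed as a quoted ClassAd string literal (the backslash escaping is also an assumption). The attribute name `ChirpCgroupPath` and all helper names are hypothetical.

```python
import subprocess
from typing import List, Optional


def cgroup_path(proc_cgroup: str, controller: str = "memory") -> Optional[str]:
    """Pick a cgroup path out of the text of /proc/<pid>/cgroup.

    Lines look like "hierarchy-ID:controller-list:path". A cgroup v1 line
    listing `controller` wins; otherwise the cgroup v2 line ("0::/path"),
    whose controller list is empty, is used if present.
    """
    unified: Optional[str] = None
    for line in proc_cgroup.splitlines():
        parts = line.split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed lines
        controllers, path = parts[1], parts[2]
        if controller in controllers.split(","):
            return path
        if controllers == "":
            unified = path
    return unified


def classad_string(value: str) -> str:
    """Quote text as a ClassAd string literal (assumed backslash escaping)."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def chirp_delayed_argv(attr: str, value: str) -> List[str]:
    """Build the condor_chirp command line for a delayed attribute update."""
    # Per Brian: set_job_attr_delayed names must start with "Chirp" (case-sensitive).
    if not attr.startswith("Chirp"):
        raise ValueError(f"delayed chirp attribute {attr!r} must start with 'Chirp'")
    return ["condor_chirp", "set_job_attr_delayed", attr, classad_string(value)]


def chirp_own_cgroup(attr: str = "ChirpCgroupPath") -> None:
    """Report this process's cgroup into the job ad (hypothetical attribute name).

    Only meaningful when run inside an HTCondor job, where condor_chirp works.
    """
    with open("/proc/self/cgroup") as handle:
        path = cgroup_path(handle.read())
    if path is not None:
        subprocess.run(chirp_delayed_argv(attr, path), check=True)
```

Calling `chirp_own_cgroup()` once at job start-up should be enough for a path that does not change; because the update is delayed, the attribute only reaches the schedd with the next periodic update. On many installations `condor_chirp` lives in HTCondor's libexec directory rather than on `PATH`, so the first argv entry may need a full path. Switching to `set_job_attr` would make the update immediate but, per Brian, requires `+WantIOProxy = true` and puts more load on the schedd.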