Hi Cole and Matt,

for now the idea would be to provide users with integrated usage statistics for their jobs. At the moment, our users get only a very rough summary of their last week's resource usage in CO2 equivalents. It would be nice to have a better handle for aggregating over all of a job's instances.
We have per-worker efficiency and benchmark ads that are added to the job ads [1]; it would be neat to use them for weighting the instances' run times per worker. E.g., for a job with three failed instances, where the last run's Runtime ends up in the "final" job ad as
Runtime = 789
there could also be a list ad like
RuntimeArrayAd = [123, 456, 789]
to which each instance's Runtime is appended (or the Job*Date ads), so that one could calculate a "complete" weighted summary over all of a job's instances.
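Just to illustrate the idea, a minimal sketch of such a weighted summary - RuntimeArrayAd is a hypothetical attribute name, and the per-instance HS06-per-slot weights are made-up example values in the style of the machine ads in [1]:

```python
# Sketch: total runtime over all instances of a job, weighted by each
# worker's benchmark score. RuntimeArrayAd and the per-instance weight
# list are hypothetical examples, not existing HTCondor attributes.

def weighted_runtime(runtimes, hs06_per_slot):
    """Sum each instance's runtime, weighted by its worker's HS06-per-slot."""
    return sum(r * w for r, w in zip(runtimes, hs06_per_slot))

runtime_array = [123, 456, 789]    # per-instance Runtime values
hs06_weights = [41.5, 38.2, 40.1]  # per-instance MachineAttrHS06PerSlot*
total = weighted_runtime(runtime_array, hs06_weights)
```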
What might be somewhat difficult is that the machine ads added to the jobs are actually incremented as separate ads - so maybe, analogously,
Runtime0, Runtime1, ...
might be better(??) than a list?

Alternatively, one could use the event logs. In principle, we could parse the JSON events through Spark or the like and calculate the job instance details from the events - but that might need a bit of additional overhead.
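As a rough sketch of the event-log route (assuming records in the style of HTCondor's JSON user log, with EventTypeNumber and EventTime fields; 1 = execute, 4 = evicted, 5 = terminated - the exact file layout here is an assumption for illustration):

```python
# Sketch: derive per-instance run times from JSON-formatted user log events
# by pairing each execute event with the following evict/terminate event.
import json
from datetime import datetime

EXECUTE, EVICTED, TERMINATED = 1, 4, 5

def instance_runtimes(events):
    """Return wall-clock seconds for each run of the job."""
    runtimes, started = [], None
    for ev in events:
        t = datetime.fromisoformat(ev["EventTime"])
        if ev["EventTypeNumber"] == EXECUTE:
            started = t
        elif ev["EventTypeNumber"] in (EVICTED, TERMINATED) and started:
            runtimes.append((t - started).total_seconds())
            started = None
    return runtimes

# events = json.load(open("job_events.json"))  # hypothetical dump of the log
# print(instance_runtimes(events))
```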
Cheers,
  Thomas

[1]
MachineAttrCpus0 = 1
MachineAttrHS060 = 2035
MachineAttrHS06PerSlot0 = 41.53061224489796
MachineAttrHS06perWatt0 = 3.7
MachineAttrMachine0 = "batch1369.desy.de"

On 19/01/2023 17.26, Cole Bollig via HTCondor-users wrote:
Hi Thomas,

At the moment there isn't that elegant of a solution, as you either need to set up a job post-run analysis or have some other program/script that understands job states to query information about jobs. As Matthew stated, the python bindings may be useful for this sort of monitoring/job management. However, I have been working on a feature for this exact query for a bit now, where the shadow writes the job ad to a file like normal job history. It should hopefully be officially announced within a couple of feature series releases.

Cheers,
Cole Bollig

------------------------------------------------------------------------
*From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Thomas Hartmann <thomas.hartmann@xxxxxxx>
*Sent:* Thursday, January 19, 2023 10:03 AM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
*Subject:* [HTCondor-users] aggregating job statistics over all job instances

Hi all,

I would like to collect job metrics for all runs of a job. So far my approach would be a postCMD - but that seems not very elegant/condor-like. I.e., a postCMD script following the payload job that chirps a class ad array, attaches a new element, and chirps the updated job ad again to the collector. However, one would need to be careful not to drop user postCMDs. Maybe there is a better way?

Cheers,
  Thomas