Mailing List Archives Authenticated access	UW Madison Computer Sciences Department Computer Systems Lab


Re: [HTCondor-users] aggregating job statistics over all job instances

Date: Fri, 20 Jan 2023 15:46:39 +0100
From: Thomas Hartmann <thomas.hartmann@xxxxxxx>
Subject: Re: [HTCondor-users] aggregating job statistics over all job instances

Hi Cole and Matt,

for now the idea would be to provide the users with integrated usage statistics for their jobs. At the moment, our users get a very rough summary of their last week's resource usage in CO2 equivalents. It would be nice to have a better handle for integrating over all job instances.

We have per-worker efficiency and benchmark ads that are added to the job ads [1], and it would be neat to use them for weighting the instances' run times per worker. E.g., a job with three failed instances, where the last Runtime is in the "final" job ad as

  Runtime = 789
and maybe a list ad like
  RuntimeArrayAd = [123, 456, 789]

where each instance's Runtime is appended (or the Job*Date ads)? That way one could calculate a "complete" weighted summary over all of a job's instances.

One difficulty might be that the machine ads added to the jobs are actually incremented as separate ads - so maybe, similarly,

  Runtime0
  Runtime1
  ...
might be better(??) than a list?
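
A minimal sketch of such a weighted summary, assuming the per-instance runtimes were available as numbered attributes Runtime0, Runtime1, ... (hypothetical; these are only the proposal above, not something HTCondor writes today) and that each index refers to the same instance as the matching MachineAttrHS06PerSlotN ad. The job ad is modelled as a plain dict, and weighting by runtime times HS06 per slot is just one possible choice:

```python
import re


def weighted_runtime(job_ad,
                     runtime_prefix="Runtime",
                     weight_prefix="MachineAttrHS06PerSlot"):
    """Sum runtime * weight over all job instances found in job_ad.

    Assumptions: RuntimeN is the hypothetical per-instance attribute
    proposed in this thread, and index N matches the index of the
    numbered MachineAttr* ad for the same instance.
    """
    pattern = re.compile(rf"^{re.escape(runtime_prefix)}(\d+)$")
    per_instance = {}
    for name, value in job_ad.items():
        match = pattern.match(name)
        if match is None:
            continue  # not a numbered runtime attribute
        index = int(match.group(1))
        weight = job_ad.get(f"{weight_prefix}{index}")
        if weight is None:
            continue  # no machine ad recorded for this instance
        per_instance[index] = value * weight
    return sum(per_instance.values()), per_instance


# Made-up example values, not real measurements
example_ad = {
    "Runtime0": 789,
    "Runtime1": 456,
    "MachineAttrHS06PerSlot0": 2.0,
    "MachineAttrHS06PerSlot1": 1.0,
    "MachineAttrCpus0": 1,
}
total, details = weighted_runtime(example_ad)
```

Instances without a matching weight attribute are skipped rather than counted with a default weight, so a missing machine ad shows up as a missing entry in the per-instance result.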

---

Alternatively, one could use the EventLogs. In principle, we could probably parse the json events through Spark or so and calculate the job instance details from the events - but that might need a bit of additional overhead.
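
As a rough illustration of the event log route (plain Python rather than Spark), the sketch below pairs each execute event with the next event that ends that run and collects one runtime per instance. The field names (MyType, Cluster, Proc, EventTime), the event type names, and the ISO timestamp format are assumptions about the JSON event log and should be checked against real log output; events are assumed to arrive in time order.

```python
from datetime import datetime

# Assumed names of event types that end a running instance
END_EVENTS = {
    "JobEvictedEvent",
    "JobTerminatedEvent",
    "ShadowExceptionEvent",
    "JobAbortedEvent",
}


def instance_runtimes(events):
    """Return {(cluster, proc): [seconds, ...]}, one entry per instance."""
    started = {}   # job id -> start time of the instance currently running
    runtimes = {}  # job id -> list of finished instance runtimes
    for event in events:
        key = (event["Cluster"], event["Proc"])
        when = datetime.fromisoformat(event["EventTime"])
        if event["MyType"] == "ExecuteEvent":
            started[key] = when
        elif event["MyType"] in END_EVENTS and key in started:
            elapsed = when - started.pop(key)
            runtimes.setdefault(key, []).append(elapsed.total_seconds())
    return runtimes


# Made-up events for one job that was evicted once and then finished
sample_events = [
    {"MyType": "ExecuteEvent", "Cluster": 1, "Proc": 0,
     "EventTime": "2023-01-20T10:00:00"},
    {"MyType": "JobEvictedEvent", "Cluster": 1, "Proc": 0,
     "EventTime": "2023-01-20T10:02:03"},
    {"MyType": "ExecuteEvent", "Cluster": 1, "Proc": 0,
     "EventTime": "2023-01-20T11:00:00"},
    {"MyType": "JobTerminatedEvent", "Cluster": 1, "Proc": 0,
     "EventTime": "2023-01-20T11:13:09"},
]
runtimes = instance_runtimes(sample_events)
```

The same pairing logic would carry over to a Spark job by grouping on the job id first; the per-instance list could then be combined with the per-worker benchmark values.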


Cheers,
  Thomas

[1]
MachineAttrCpus0 = 1
MachineAttrHS060 = 2035
MachineAttrHS06PerSlot0 = 41.53061224489796
MachineAttrHS06perWatt0 = 3.7
MachineAttrMachine0 = "batch1369.desy.de"


On 19/01/2023 17.26, Cole Bollig via HTCondor-users wrote:

Hi Thomas,
At the moment there isn't that elegant of a solution, as you either need to set up a job post-run analysis or have some other program/script that understands job states to query information about jobs. As Matthew stated, the Python bindings may be useful for this sort of monitoring/job management.
However, I have been working on a feature for this exact query for a bit now, where the shadow writes the job ad to file like normal job history. It should hopefully be officially announced within a couple of feature series releases.
Cheers,
Cole Bollig
------------------------------------------------------------------------
*From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Thomas Hartmann <thomas.hartmann@xxxxxxx>
*Sent:* Thursday, January 19, 2023 10:03 AM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
*Subject:* [HTCondor-users] aggregating job statistics over all job instances
Hi all,

I would like to collect job metrics for all runs of a job. So far my
approach would be a postCMD - but that seems not very elegant/condor-like.
I.e., a postCMD script following the payload job, that chirps a class ad
array, attaches a new element and chirps the updated job ad again to the
collector. However, one would need to be careful to not drop user postCMDs.
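
A rough sketch of what such a post script could do, assuming the condor_chirp command line tool is available in the job environment and using its get_job_attr and set_job_attr subcommands. The attribute name RuntimeArrayAd and the helper names are hypothetical, ClassAd lists are written with braces, and how get_job_attr reports an attribute that is not set yet has not been verified here:

```python
import subprocess


def append_to_list_expr(current, value):
    """Return a ClassAd list expression with value appended."""
    text = (current or "").strip()
    if text.startswith("{") and text.endswith("}"):
        inner = text[1:-1].strip()
        if inner:
            return "{" + inner + ", " + str(value) + "}"
    # Attribute missing, undefined, or an empty list: start a new list
    return "{" + str(value) + "}"


def run_chirp(*args):
    """Call condor_chirp and return what it printed."""
    result = subprocess.run(["condor_chirp", *args],
                            capture_output=True, text=True, check=True)
    return result.stdout


def record_runtime(value, attr="RuntimeArrayAd", chirp=run_chirp):
    """Read the list attribute, append value, and write it back.

    Assumption: get_job_attr succeeds even if attr is not set yet;
    otherwise the CalledProcessError would have to be handled here.
    """
    current = chirp("get_job_attr", attr)
    updated = append_to_list_expr(current, value)
    chirp("set_job_attr", attr, updated)
    return updated
```

Passing the chirp call in as a parameter keeps the list handling testable outside a running job; inside a job, record_runtime(789) would use the real tool.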

Maybe there is a better way?

Cheers,
  Thomas

