[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [HTCondor-users] [External] Re: Concatenating history files?



Hi Michael,

If you're considering a Python solution, we've used this script [1] for our monitoring ingestion for years. It only depends on the HTCondor Python bindings for ClassAd parsing, so even a very old version is fine. Since we save all the history files and gzip them, it's easy to re-ingest them if we update our monitoring. Python is very flexible once you have the job ads, so you can do whatever you want at that point.

Best,
David

[1] read_history_files.py
import glob
import gzip
import classad

def classad_to_dict(c):
  ret = {}
  for k in c.keys():
    try:
      ret[k] = c.eval(k)  # evaluate expressions down to literal values
    except TypeError:
      ret[k] = c[k]       # fall back to the raw expression
  return ret

def read_from_file(filename):
  opener = gzip.open(filename, 'rt', encoding='utf-8') if filename.endswith('.gz') else open(filename)
  with opener as f:
    entry = ''
    for line in f:  # stream the file; these can be large
      if line.startswith('***'):  # banner line ends each ad
        try:
          c = classad.parseOne(entry)
          yield classad_to_dict(c)
        except Exception:  # skip ads that fail to parse
          pass
        entry = ''
      else:
        entry += line  # lines already end in '\n'

for filename in glob.iglob('/var/log/condor/history*'):
  for job in read_from_file(filename):
    pass  # process the job here


On Thu, Sep 12, 2024 at 9:50 AM Pelletier, Michael V RTX via HTCondor-users <htcondor-users@xxxxxxxxxxx> wrote:

Greg,


Thanks for the container suggestion! That could work! However I'm dealing with a classified air-gapped network, so getting Docker or one of its cousins up and running would invoke a panoply of paperwork, not to mention approvals for the newer version of HTCondor.


Stefano,


Thanks very much for that Python suggestion! I'll take a look and see if it might be able to do what I need. I've got about 60 gigabytes of history data when all is said and done, so speed is a significant consideration. I'd probably want to pickle the dict for multiple runs.


One of the tasks for this history is to categorize the jobs based on the JobDescription and/or Cmd/Arguments, and I was thinking of using an IfThenElse() expression to apply categories based on a collection of regexps, but I have a feeling that might take many, many hours to run. I'll do some testing and see how it goes.
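For the regexp route in plain Python, precompiling the patterns keeps the per-job cost low even over tens of gigabytes. The category names and patterns below are made-up placeholders, not from any real site configuration:

```python
import re

# Hypothetical categories and patterns -- placeholders, not a real site config.
CATEGORIES = [
    ('simulation', re.compile(r'sim|monte.?carlo', re.IGNORECASE)),
    ('analysis', re.compile(r'analy[sz]', re.IGNORECASE)),
]

def categorize(cmd, arguments=''):
    """Return the first matching category for a job's Cmd/Arguments."""
    text = '%s %s' % (cmd, arguments)
    for name, pattern in CATEGORIES:
        if pattern.search(text):
            return name
    return 'other'

print(categorize('/opt/models/run_simulation'))  # simulation
print(categorize('/bin/sleep', '600'))           # other
```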


It looks like the one-liner would only grab a single job out of the file, the last one it finds. I'll tinker with it to see if I can build out a dict of arrays or something like that, making sure that the index within each attribute key's array lines up with the other keys. Or maybe read each ad one at a time, looking for a blank line or *** line, and stash the whole attribute dict for the job in question in an outer dict under a "ClusterId.ProcId" key.
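A rough sketch of that last idea, splitting on the *** banner lines and keying an outer dict by "ClusterId.ProcId". The sample text is illustrative, and attribute values are kept as raw strings rather than parsed ClassAd values:

```python
# Illustrative stand-in for a rotated history file's contents.
sample = """\
ClusterId = 12
ProcId = 0
Cmd = "/bin/sleep"
*** Offset = 0
ClusterId = 12
ProcId = 1
Cmd = "/bin/true"
*** Offset = 120
"""

def ads_from_text(text):
    """Collect each ad into a dict, keyed by 'ClusterId.ProcId'."""
    jobs = {}
    ad = {}
    for line in text.splitlines():
        if line.startswith('***'):  # banner line terminates each ad
            if ad:
                key = '%s.%s' % (ad.get('ClusterId'), ad.get('ProcId'))
                jobs[key] = ad
            ad = {}
        elif '=' in line:
            k, v = (s.strip() for s in line.split('=', 1))
            ad[k] = v
    return jobs

jobs = ads_from_text(sample)
print(sorted(jobs))  # ['12.0', '12.1']
```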


Thanks again! I'll let the group know how it goes.


Michael Pelletier

Principal Technologist

High Performance Computing

Classified Infrastructure Services


C: +1 339.293.9149
michael.v.pelletier@xxxxxxx


From: Stefano Dal Pra <stefano.dalpra@xxxxxxxxxxxx>
Sent: Thursday, September 12, 2024 5:24 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Pelletier, Michael V RTX <Michael.V.Pelletier@xxxxxxx>
Subject: [External] Re: [HTCondor-users] Concatenating history files?


I had to do something similar years ago and tried two ways:
1) condor_q -jobads history.<ClusterId>.<ProcId> -af:j '<classad_functions to extract what i need>'

2) load the hist.file into a python dict and process it; this can be done with a one liner:
dict([map(str.strip, x.split('=',1)) for x in f.readlines()])

then extract what you need.
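For illustration, here's that one-liner on a tiny single-ad input. A real history file concatenates many ads, so with this approach keys from later ads would overwrite earlier ones:

```python
import io

# Tiny single-ad input standing in for an open history file handle.
f = io.StringIO('ClusterId = 7\nProcId = 3\nCmd = "/bin/date"\n')
ad = dict([map(str.strip, x.split('=', 1)) for x in f.readlines()])
print(ad['ClusterId'])  # 7
```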

Solution 1 was more appealing to me, but turned out to be much slower (probably due to overhead in loading parsers, which is done once per file).
Solution 2 was pretty fast in my use case: extracting fields of interest and loading them into a postgres table using a COPY statement.

Stefano

On 11/09/24 20:16, Pelletier, Michael V RTX via HTCondor-users wrote:

Thanks very much for the tip, Cole!


My trouble is that we're still on version 8, and since we're drawing down the cluster in question there's no funding to address an upgrade to version 10 or later. Sorry, I should have specified a version in my original message. Any alternatives available in v8? I'm thinking maybe not, since the -search option was likely introduced later as a new feature.


A for loop with multiple invocations of condor_history -file should do the trick if that's the only avenue available in the outdated release.
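Something like this sketch would do it (the history path is illustrative, and it's worth confirming -file behaves the same on your v8 release):

```python
import glob
import subprocess

# Illustrative path; adjust to wherever the rotated files are stashed.
def history_commands(pattern='/var/log/condor/history*'):
    """Build one condor_history invocation per rotated history file."""
    return [['condor_history', '-file', f] for f in sorted(glob.glob(pattern))]

for cmd in history_commands():
    print(' '.join(cmd))  # or subprocess.call(cmd) to actually run each
```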


Michael Pelletier

Principal Technologist

High Performance Computing

Classified Infrastructure Services


C: +1 339.293.9149
michael.v.pelletier@xxxxxxx


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> On Behalf Of Cole Bollig via HTCondor-users
Sent: Wednesday, September 11, 2024 10:24 AM
To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
Cc: Cole Bollig <cabollig@xxxxxxxx>
Subject: [External] Re: [HTCondor-users] Concatenating history files?


Hi Michael,


Since version 10.3.0, you can do condor_history -search /path/to/filename. This will find and read, in the correct order, all matching timestamp-rotated history files; in this example the following files would be parsed by condor_history:

  1. /path/to/filename
  2. /path/to/filename.20240911092145
  3. /path/to/filename.20240825155501


Cheers,

Cole Bollig


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Pelletier, Michael V RTX via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Wednesday, September 11, 2024 9:10 AM
To: HTCondor-Users Mail List (htcondor-users@xxxxxxxxxxx) <htcondor-users@xxxxxxxxxxx>
Cc: Pelletier, Michael V RTX <Michael.V.Pelletier@xxxxxxx>
Subject: [HTCondor-users] Concatenating history files?


Hi folks,


I've got a huge amount of job history I'm trying to go through and summarize/categorize, to the tune of many gigabytes, and as you might expect it's divided into a collection of rotated files with the usual timestamps.


I'm trying to use the -file option so that it doesn't bother the server or suffer the constraints of a network connection, and can work directly from a local filesystem where I've stashed the files.


Is there a way to get condor_history to scan all the files in one fell swoop, rather than going through them one at a time with separate condor_history -file commands? I tried concatenating the files, but it looks like the last line in each file has some metadata that condor_history pays attention to.


Thanks for any suggestions!


Michael V Pelletier

Principal Technologist

High Performance Computing

Classified Infrastructure Services



_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/
