
Re: [HTCondor-users] Question: Missing Startd Statistics for Slot



Good morning TJ,

 

Sorry for the delay in acknowledging this; I was busy last week. Thanks for clarifying that JobDuration and JobBusyTime are global counters and for explaining what happens when STARTD_SLOT_ATTRS is configured. I have a follow-up question:

 

Are the 16 attributes that the startd generates from the JobDuration and JobBusyTime statistics probes defined or documented anywhere in the HTCondor documentation, including exactly what they mean and how they are computed?

 

I do see that these attributes follow a naming pattern, which is great, but I am not sure that my understanding of them matches what they actually mean. I would like to know their definitions because I am using two of them (`RecentJobBusyTimeAvg` and `RecentJobBusyTimeCount`) in a feature implementation, and I am trying to test a scenario in which they take values that would trigger the functionality I am designing.
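
A minimal sketch of the kind of trigger such a feature might test is shown here. It rests on assumptions that this thread has not confirmed: that `RecentJobBusyTimeAvg` is the average busy time, in seconds, of recently completed jobs, and that `RecentJobBusyTimeCount` is the number of those jobs. The function name and both thresholds (`min_jobs`, `max_avg_secs`) are hypothetical and not taken from GlideinWMS or HTCondor.

```python
def looks_like_blackhole(avg_busy_secs, job_count, min_jobs=3, max_avg_secs=60.0):
    """Flag a node that has recently run several jobs that all finished very quickly.

    ASSUMPTION: avg_busy_secs corresponds to RecentJobBusyTimeAvg (seconds) and
    job_count to RecentJobBusyTimeCount; the exact semantics are what this thread asks about.
    The thresholds are illustrative placeholders only.
    """
    # Attributes that are not published at all give no evidence either way.
    if avg_busy_secs is None or job_count is None:
        return False
    return job_count >= min_jobs and avg_busy_secs < max_avg_secs


# Hypothetical test scenarios: many short jobs, too few jobs, and long-running jobs.
print(looks_like_blackhole(5.0, 10))
print(looks_like_blackhole(5.0, 1))
print(looks_like_blackhole(3600.0, 10))
```

Requiring a minimum job count before looking at the average avoids flagging a node on the basis of a single short job.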

 

As always, looking forward to your reply.

 

 

Thanks,

Namratha

 

From: John M Knoeller <johnkn@xxxxxxxxxxx>
Date: Monday, February 26, 2024 at 18:29
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Namratha V. Urs <nurs@xxxxxxxx>
Subject: Re: Question: Missing Startd Statistics for Slot


The JobDuration and JobBusyTime counters should still work.

 

They are global counters, however, not per slot counters. 

 

So there is no reason to configure STARTD_SLOT_ATTRS in this way.  The counters are for the whole startd; there are NO per-slot counters, so using STARTD_SLOT_ATTRS in this way will just create a whole lot of copies of the same values.

 

-tj

 


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Namratha V. Urs via HTCondor-users <htcondor-users@xxxxxxxxxxx>
Sent: Monday, February 26, 2024 1:45 PM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Cc: Namratha V. Urs <nurs@xxxxxxxx>
Subject: [HTCondor-users] Question: Missing Startd Statistics for Slot

 

Hi there,

 

I am a developer on the GlideinWMS project, and we are currently looking into implementing a blackhole detection mechanism for glideins. There was some discussion about this back in 2018, and I have been referring to the notes from it, which were made available internally to my team, while working on enabling this feature in GlideinWMS. All the details I describe next are based on those notes.

 

We have the following lines in our condor configuration:

STARTD.STATISTICS_TO_PUBLISH_LIST = $(STATISTICS_TO_PUBLISH_LIST) JobDuration, JobBusyTime

STARTD_SLOT_ATTRS = RecentJobBusyTimeAvg, RecentJobBusyTimeCount

 

The notes seemed to convey that 16 attributes are generated in each slot because of two statistics probes (JobDuration, JobBusyTime). These attributes are not published by default (due to their number), but publishing can be enabled by adding the first line of the snippet above to the configuration of the execute nodes. As I understand it, the STARTD_SLOT_ATTRS line should then enable two attributes per slot -- slot<N>_RecentJobBusyTimeAvg and slot<N>_RecentJobBusyTimeCount, depending on the type of slot (fixed vs. partitionable). However, I do not see these two attributes when I query the slot's classad on the client side with the command: `condor_status -l <slot1@glidein> | grep -i "job"`.
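
The check described above can also be scripted. The following is a minimal sketch that assumes the long-form output of `condor_status -l` has already been captured as text with one `Attribute = value` pair per line. The helper names and the sample text are made up for illustration and are not real output from a glidein.

```python
import re


def parse_classad_text(text):
    """Parse 'Attr = value' lines of long-form ClassAd text into a dict of raw strings."""
    ad = {}
    for line in text.splitlines():
        name, sep, value = line.partition(" = ")
        # Skip blank lines and anything that does not look like an attribute assignment.
        if sep and re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name.strip()):
            ad[name.strip()] = value.strip()
    return ad


def missing_attrs(ad, wanted):
    """Return the wanted attribute names absent from the ad (ClassAd names are case-insensitive)."""
    present = {name.lower() for name in ad}
    return [w for w in wanted if w.lower() not in present]


# Made-up sample text standing in for captured `condor_status -l` output.
SAMPLE = """\
Name = "slot1@glidein"
State = "Unclaimed"
RecentJobBusyTimeCount = 3
"""

ad = parse_classad_text(SAMPLE)
print(missing_attrs(ad, ["RecentJobBusyTimeAvg", "RecentJobBusyTimeCount"]))
```

Running the same check for both the plain names and the `slot<N>_`-prefixed names shows which form, if either, is being published.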

 

I wanted to reach out to understand if I’m missing something and/or learn if things have changed in HTCondor since 2018 (which is when the initial discussion about the blackhole mechanism took place between GlideinWMS and HTCondor teams). If you need further information about anything that I’ve described above, please let me know and I’ll be happy to share.

 

Looking forward to your reply.

 

 

 

Thanks,

 

Namratha Urs (she/her)

Software Developer, Scientific Compute Services and Tools

Computational Science and AI Directorate, Fermi National Accelerator Laboratory

Ph.D. Candidate, Computer Science | University of North Texas