
Re: [HTCondor-users] Fix LoadAvg values, so that they better reflect the number of CPUs of a slot



The rationale is explained in the ticket.

I'm pretty sure that the load average values of a newly created D-slot are copied directly from the P-slot at creation time,
and thus the periodic distribution of load average will have no effect on START; it will only affect PREEMPT.
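To make the timing argument concrete, here is a toy Python model. It is not HTCondor code: the `Slot` class, the proportional split in `distribute_load`, and the per-CPU thresholds in `start_expr`/`preempt_expr` are all hypothetical. The only behaviour taken from the paragraph above is that a new D-slot copies LoadAvg from the P-slot, so START is decided on the copied value and only later evaluations (PREEMPT) see the redistributed one.

```python
from dataclasses import dataclass


@dataclass
class Slot:
    name: str
    cpus: int
    load_avg: float = 0.0


def create_dslot(pslot: Slot, name: str, cpus: int) -> Slot:
    # Assumption (per the reply): LoadAvg is copied verbatim from the P-slot.
    return Slot(name, cpus, load_avg=pslot.load_avg)


def distribute_load(machine_load: float, dslots: list[Slot]) -> None:
    # Hypothetical periodic step: split the machine load in proportion to CPUs.
    total_cpus = sum(s.cpus for s in dslots)
    for s in dslots:
        s.load_avg = machine_load * s.cpus / total_cpus


def start_expr(slot: Slot) -> bool:
    # Hypothetical START policy: accept while load per CPU is below 1.0.
    return slot.load_avg / slot.cpus < 1.0


def preempt_expr(slot: Slot) -> bool:
    # Hypothetical PREEMPT policy: the negation of START, re-evaluated later.
    return not start_expr(slot)


# A 4-CPU P-slot whose LoadAvg is 3.0 carves out a 2-CPU D-slot.
pslot = Slot("slot1", cpus=4, load_avg=3.0)
dslot = create_dslot(pslot, "slot1_1", cpus=2)

# START runs now, on the copied value: 3.0 / 2 = 1.5 per CPU, so it rejects.
started = start_expr(dslot)

# Only afterwards does the periodic distribution update the D-slot's LoadAvg.
other = Slot("slot1_2", cpus=2)
distribute_load(3.0, [dslot, other])

# PREEMPT sees the distributed value: 1.5 / 2 = 0.75 per CPU, so no preemption.
preempted = preempt_expr(dslot)
print(started, dslot.load_avg, preempted)
```

In this toy run the D-slot is created but rejected by START, which is the kind of StartLog pattern described below as a sign that the logic has a problem.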

You won't be able to see this happening by looking at ads in the collector, but you should be able to see it in the StartLog. If the StartLog shows D-slots being created but then not matching the job they were created for, that would indicate a problem with this logic.

-tj


From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Angel de Vicente <angel.vicente.garrido@xxxxxxxxx>
Sent: Monday, February 3, 2025 5:56 AM
To: htcondor-users@xxxxxxxxxxx <htcondor-users@xxxxxxxxxxx>
Subject: [HTCondor-users] Fix LoadAvg values, so that they better reflect the number of CPUs of a slot
 
Hello,

Some months ago (March '24) I submitted a PR to modify the way LoadAvg
was calculated for dynamic slots. It was accepted, but later (Oct '24)
the relevant code was changed. Today, when upgrading HTCondor to
version 24.0.3, I realized that the new LoadAvg calculation doesn't
work "properly" (at least not for our use case).

I just wrote a comment about it in the original PR
(https://github.com/htcondor/htcondor/pull/2317), but I'm not sure
comments on an already merged pull request will get any attention, so I
thought I would send it here as well. The comment is
https://github.com/htcondor/htcondor/pull/2317#issuecomment-2630656894

Hopefully John Knoeller will read this and can explain the rationale
behind his Oct '24 commit.

Cheers,
--
Ángel de Vicente 
 Research Software Engineer (Supercomputing and BigData)
 Instituto de Astrofísica de Canarias (https://www.iac.es/en)


_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe

The archives can be found at: https://www-auth.cs.wisc.edu/lists/htcondor-users/