Re: [HTCondor-users] MODIFY_REQUEST_EXPR "error" and persistent dynamic slot
- Date: Wed, 12 Oct 2022 15:18:13 +0200
- From: Carsten Aulbert <carsten.aulbert@xxxxxxxxxx>
- Subject: Re: [HTCondor-users] MODIFY_REQUEST_EXPR "error" and persistent dynamic slot
Hi tj,
On 10/11/22 17:06, John M Knoeller via HTCondor-users wrote:
If you add D_MATCH:2 to STARTD_DEBUG on the execute node then 8.8 will print the full job and slot classads when it hits the case where the "Job no longer matches partitionable slot after...". You can then save those ads to files and try sending those ads through condor_q -better-analyze using the -jobads and -slotads arguments to pass the job and slot files.
STARTD_DEBUG = $(STARTD_DEBUG) D_CAT D_MATCH:2
If you can upgrade the execute node to 9.0.x or 9.x, then it will do that sort of matchmaking analysis inside the schedd and print the analysis. (this feature was added in 8.9.7).
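For reference, the offline analysis described above might be invoked roughly as follows. This is only a sketch: the file names job.ad and slot.ad are hypothetical, and it assumes the job and slot ClassAds printed in the startd log have already been copied by hand into those two files.

```shell
# Hypothetical file names: job.ad and slot.ad hold the job and slot ClassAds
# copied out of the startd log after enabling D_MATCH:2.
# -jobads and -slotads make condor_q analyze the ads from these files
# instead of querying the live daemons.
condor_q -better-analyze -jobads job.ad -slotads slot.ad
```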
Almost like a Heisenbug, the problem seemingly disappeared after I set
this and ran condor_reconfig, and probably went away for good once other
nodes came online in this pool and the machine_count=5 job finally started.
Darn. I'll add this to my "cheat book" and see whether the problem
happens again any time soon.
Thanks a lot!
Carsten
--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185