
Re: [HTCondor-users] MODIFY_REQUEST_EXPR "error" and persistent dynamic slot



Hi tj,

On 10/11/22 17:06, John M Knoeller via HTCondor-users wrote:
If you add D_MATCH:2 to STARTD_DEBUG on the execute node, then 8.8 will print the full job and slot ClassAds when it hits the "Job no longer matches partitionable slot after..." case. You can then save those ads to files and pass them to condor_q -better-analyze using the -jobads and -slotads arguments.

STARTD_DEBUG = $(STARTD_DEBUG) D_CAT D_MATCH:2

If you can upgrade the execute node to 9.0.x or 9.x, then it will do that sort of matchmaking analysis inside the schedd and print the analysis. (This feature was added in 8.9.7.)
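For the 8.8 route, the offline analysis step could look roughly like the sketch below. The file names job.ad and slot.ad are hypothetical; it assumes the job and slot ads printed in the startd log were copied by hand into those two files, and it only uses the condor_q options named above.

```shell
# Hypothetical file names: paste the job and slot ClassAds from the
# startd log into these two files before running the analysis.
JOB_AD_FILE="job.ad"
SLOT_AD_FILE="slot.ad"

# Only run the analysis if condor_q is installed and both files exist.
if command -v condor_q >/dev/null 2>&1 && [ -f "$JOB_AD_FILE" ] && [ -f "$SLOT_AD_FILE" ]; then
    # Analyze the saved job ad against the saved slot ad.
    condor_q -better-analyze -jobads "$JOB_AD_FILE" -slotads "$SLOT_AD_FILE"
else
    echo "condor_q or the saved ad files are missing; nothing analyzed"
fi
```

Run it from the directory holding the two saved files; on a machine without HTCondor it just prints the fallback message.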

Almost like a Heisenbug, the problem seemingly disappeared after I set this and ran condor_reconfig, and probably for good once other nodes came online in this pool and the machine_count=5 job finally started.

Darn. I'll add this to my "cheat book" and see whether the problem happens again any time soon.

Thanks a lot!

Carsten

--
Dr. Carsten Aulbert, Max Planck Institute for Gravitational Physics,
Callinstraße 38, 30167 Hannover, Germany, Phone +49 511 762 17185