An interesting observation: if I wait and start the jobs at intervals of 15-20 minutes instead of immediately, the error does not occur. Or is it just luck? I still need help, please. Any ideas?
From: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
To: "HTCondor-Users Mail List" <htcondor-users@xxxxxxxxxxx>
Cc: "Dmitry Golubkov" <dmitry.golubkov@xxxxxxxxxxxxxx>
Sent: Thursday, June 17, 2021 1:32:43 PM
Subject: [HTCondor-users] HTCondor can't execute the job with error: Error: can't find resource with ClaimId
Dear all,
I have an HTCondor cluster configured to use partitionable slots. After a job is submitted, HTCondor creates dynamic slots and tries to execute the job, but fails immediately with the error "Error: can't find resource with ClaimId (...) for 444 (ACTIVATE_CLAIM)" (please take a look at the attached log). After some time it re-creates the dynamic slots and the job runs successfully, but why? Is this a known issue? It happens very often and slows down job execution. Any ideas on how this can be solved? Or is it specific to my setup? The latest version of HTCondor behaves the same way.
Thanks in advance,
Dmitry.