I have a follow-up to this issue. After additional troubleshooting, I’ve discovered that the unclaimed resources move from one machine to another, so I can rule out any ClassAd incompatibility.

One thing I have noticed is that the maximum number of Claimed resources seems to be about 85-88. Modifying the MaxJobsRunning variable doesn’t help: setting it to 50 limits the number of running jobs to 50, but no matter how high I set it, the number of running jobs is still about 85. Currently I am the only user on the pool, and I am also the administrator.

If anyone can suggest a log file, command, or utility I can use to try to identify the problem, I would much appreciate it.

Thanks,
Eric

From: condor-users-bounces@xxxxxxxxxxx [mailto:condor-users-bounces@xxxxxxxxxxx]
On Behalf Of Eric Abel

Fellow Condor users,

I have a medium-sized (~150 CPUs) Windows cluster running Condor version 7.6.1. Recently, I have noticed that I cannot utilize all of the resources: a number of the CPUs remain in a persistent “unclaimed” state. The most relevant log entry I can find relating to this is in StartLog:

    01/09/12 13:51:02 slot1: Changing state: Owner -> Unclaimed
    01/09/12 13:51:02 slot2: State change: received RELEASE_CLAIM command
    01/09/12 13:51:02 slot2: Changing state and activity: Claimed/Idle -> Preempting/Vacating
    01/09/12 13:51:02 slot2: State change: No preempting claim, returning to owner
    01/09/12 13:51:02 slot2: Changing state and activity: Preempting/Vacating -> Owner/Idle
    01/09/12 13:51:02 slot2: State change: IS_OWNER is false
    01/09/12 13:51:02 slot2: Changing state: Owner -> Unclaimed

This sequence repeats indefinitely for the resource in question. My guess is that the RELEASE_CLAIM is the culprit, but what is the origin of the RELEASE_CLAIM? What’s truly mysterious is that it affects only a few CPUs in a multi-core system; the rest behave normally.

I spent a few hours combing the log files and past forum posts, but have not been able to find a suitable solution to this problem. Has anyone encountered this before? Any solutions?

Thanks,
Eric
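
To see which slots are stuck in the loop described above, and how often, one option is to tally the StartLog entries per slot. The following is only a rough sketch: it assumes every StartLog line has the same "date time slotN: message" layout as the excerpt quoted above, and the function names (`count_release_claims`, `count_transitions`) are made up for illustration and are not part of Condor. It does not identify who sent the RELEASE_CLAIM; it only shows which slots receive it and which state transitions repeat.

```python
import re
import sys
from collections import Counter

# Assumed line layout, taken from the excerpt in this thread:
#   01/09/12 13:51:02 slot2: State change: received RELEASE_CLAIM command
LINE_RE = re.compile(r"^\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (slot\d+): (.*)$")
# Matches both "Changing state: A -> B" and "Changing state and activity: A -> B"
TRANSITION_RE = re.compile(r"^Changing state(?: and activity)?: (\S+) -> (\S+)$")

# The log excerpt from the original post, used when no file is given.
SAMPLE = """\
01/09/12 13:51:02 slot1: Changing state: Owner -> Unclaimed
01/09/12 13:51:02 slot2: State change: received RELEASE_CLAIM command
01/09/12 13:51:02 slot2: Changing state and activity: Claimed/Idle -> Preempting/Vacating
01/09/12 13:51:02 slot2: State change: No preempting claim, returning to owner
01/09/12 13:51:02 slot2: Changing state and activity: Preempting/Vacating -> Owner/Idle
01/09/12 13:51:02 slot2: State change: IS_OWNER is false
01/09/12 13:51:02 slot2: Changing state: Owner -> Unclaimed
""".splitlines()


def count_release_claims(lines):
    """Count, per slot, the log lines that mention RELEASE_CLAIM."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m and "RELEASE_CLAIM" in m.group(2):
            counts[m.group(1)] += 1
    return counts


def count_transitions(lines):
    """Count (slot, old_state, new_state) transitions seen in the log."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # skip lines that do not follow the assumed layout
        t = TRANSITION_RE.match(m.group(2))
        if t:
            counts[(m.group(1), t.group(1), t.group(2))] += 1
    return counts


if __name__ == "__main__":
    # Pass the path to a StartLog as the first argument; otherwise the
    # excerpt from this thread is analyzed.
    if len(sys.argv) > 1:
        with open(sys.argv[1], errors="replace") as f:
            log_lines = f.readlines()
    else:
        log_lines = SAMPLE
    for slot, n in sorted(count_release_claims(log_lines).items()):
        print(f"{slot}: {n} RELEASE_CLAIM line(s)")
    for (slot, old, new), n in sorted(count_transitions(log_lines).items()):
        print(f"{slot}: {old} -> {new} x{n}")
```

Running it against the StartLog of an affected machine and of a healthy one, and comparing the per-slot counts, may help narrow down whether the release commands cluster on particular slots or at particular times.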