Re: [HTCondor-users] slots stay claimed/idle even after UNUSED_CLAIM_TIMEOUT expired
- Date: Mon, 09 Aug 2021 11:08:59 +0300 (MSK)
- From: "Sergey A. Komissarov" <sergey.komissarov@xxxxxxxxxxxxxx>
- Subject: Re: [HTCondor-users] slots stay claimed/idle even after UNUSED_CLAIM_TIMEOUT expired
Hello Greg,
> Are all these machines being used by parallel universe jobs?
Yes, all machines run only parallel universe jobs.
----------
Sergey Komissarov
Senior Software Developer
DATADVANCE
This message may contain confidential information
constituting a trade secret of DATADVANCE. Any distribution,
use or copying of the information contained in this
message is ineligible except under the internal
regulations of DATADVANCE and may entail liability in
accordance with the current legislation of the Russian
Federation. If you have received this message by mistake
please immediately inform me of it. Thank you!
----- Original Message -----
From: "Greg Thain" <gthain@xxxxxxxxxxx>
To: "htcondor-users" <htcondor-users@xxxxxxxxxxx>
Sent: Friday, August 6, 2021 6:22:42 PM
Subject: Re: [HTCondor-users] slots stay claimed/idle even after UNUSED_CLAIM_TIMEOUT expired
On 8/6/21 10:18 AM, Stanislav V. Markevich via HTCondor-users wrote:
> Hi,
>
> I set UNUSED_CLAIM_TIMEOUT to 180 but some (dynamic) slots are staying in Claimed/Idle state forever (see the last column):
UNUSED_CLAIM_TIMEOUT is only used when running parallel universe jobs.
Are all these machines being used by parallel universe jobs? In the
worst case, I believe that "condor_vacate" should be able to put
Claimed/Idle slots back to Unclaimed.
-greg
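
To illustrate the suggestion above, here is a minimal Python sketch that reads `condor_status -af:h Name OpSys State Activity Cpus Memory TotalTimeClaimedIdle` output, picks the slots that have been Claimed/Idle for longer than the 180-second timeout, and prints a condor_vacate command for each. The hostname `node1.example.org`, the sample rows, and the helper names `stuck_slots` and `vacate_commands` are made up for illustration. It also assumes that condor_vacate accepts a slot name as its target; check the condor_vacate man page for your version before running anything it prints.

```python
import shlex

# Hypothetical sample in the same column layout as the condor_status output
# quoted in this thread; the hostname is invented for illustration.
SAMPLE = """\
Name OpSys State Activity Cpus Memory TotalTimeClaimedIdle
slot1@node1.example.org LINUX Unclaimed Idle 191 107 undefined
slot1_1@node1.example.org LINUX Claimed Idle 1 512 5
slot1_4@node1.example.org LINUX Claimed Idle 1 128 73153
slot1_8@node1.example.org LINUX Claimed Idle 1 128 84669
"""


def stuck_slots(status_text, timeout=180):
    """Return names of slots that are Claimed/Idle for more than `timeout` seconds."""
    stuck = []
    for line in status_text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row, and anything with an unexpected shape.
        if len(fields) != 7 or fields[0] == "Name":
            continue
        name, _opsys, state, activity, _cpus, _memory, idle = fields
        # Unclaimed slots report "undefined" for TotalTimeClaimedIdle.
        if not idle.isdigit():
            continue
        if state == "Claimed" and activity == "Idle" and int(idle) > timeout:
            stuck.append(name)
    return stuck


def vacate_commands(names):
    """Build one condor_vacate command line per slot name (assumed target syntax)."""
    return ["condor_vacate " + shlex.quote(name) for name in names]


if __name__ == "__main__":
    for command in vacate_commands(stuck_slots(SAMPLE)):
        print(command)
```

The script only prints the commands rather than running them, so the list can be reviewed first. Slots whose idle time is still below the timeout (such as `slot1_1` in the sample) are left alone.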
>
>
> condor_status -af:h Name OpSys State Activity Cpus Memory TotalTimeClaimedIdle
>
> Name OpSys State Activity Cpus Memory TotalTimeClaimedIdle
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Unclaimed Idle 191 107 undefined
> slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 5
> slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 5
> slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 5
> slot1_4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 73153
> slot1_5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 73153
> slot1_6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 73153
> slot1_8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84669
> slot1_9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84669
> slot1_10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84669
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Unclaimed Idle 191 107 undefined
> slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 18
> slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 18
> slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 65209
> slot1_4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 65209
> slot1_5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 83370
> slot1_6@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 65209
> slot1_7@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 65209
> slot1_8@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 65209
> slot1_9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 16593
> slot1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Unclaimed Idle 191 107 undefined
> slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 23
> slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 512 73171
> slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 23
> slot1_4@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
> slot1_5@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
> slot1_9@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
> slot1_10@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
> slot1_11@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
> slot1_12@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx LINUX Claimed Idle 1 128 84608
>
>
> Normally when a slot exceeds UNUSED_CLAIM_TIMEOUT there is a record in the log saying that the slot is released:
>
> 2021-08-06T14:45:50.956165411Z condor_schedd[3032]: Resource slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx has been unused for 182 seconds, limit is 180, releasing
>
> But for the problematic slots the last records in the log are from hours ago (~24h):
>
> 2021-08-05T16:34:12.960897270Z condor_startd[859]: slot1_12: State change: starter exited
> 2021-08-05T16:34:12.960904225Z condor_startd[859]: slot1_12: Changing activity: Busy -> Idle
> 2021-08-05T16:34:12.960968125Z condor_startd[859]: slot1_12: State change: idle claim shutting down due to CLAIM_WORKLIFE
> 2021-08-05T16:34:12.960974666Z condor_startd[859]: slot1_12: Changing state and activity: Claimed/Idle -> Preempting/Vacating
> 2021-08-05T16:34:12.962018643Z condor_startd[859]: slot1_12: State change: No preempting claim, returning to owner
> 2021-08-05T16:34:12.962359058Z condor_startd[859]: slot1_12: Changing state and activity: Preempting/Vacating -> Owner/Idle
> 2021-08-05T16:34:12.962697322Z condor_startd[859]: slot1_12: State change: IS_OWNER is false
> 2021-08-05T16:34:12.962706591Z condor_startd[859]: slot1_12: Changing state: Owner -> Unclaimed
> 2021-08-05T16:34:12.962748296Z condor_startd[859]: slot1_12: Changing state: Unclaimed -> Delete
> 2021-08-05T16:34:12.962880429Z condor_startd[859]: slot1_12: Resource no longer needed, deleting
>
> and then nothing. The slots are still there and claimed.
>
> Is this a bug? Is there a way to release these slots forcefully?
>
>
> Best regards,
> Stanislav V. Markevich
> _______________________________________________
> HTCondor-users mailing list
> To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
> subject: Unsubscribe
> You can also unsubscribe by visiting
> https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
> The archives can be found at:
> https://lists.cs.wisc.edu/archive/htcondor-users/