
Re: [HTCondor-users] HTcondor disk resource related queries



Hello Thomas,

Regarding the negative value, I think I have found the cause: in our case the condor scratch path is symlinked to a bigger disk, which is why a negative value is shown.

That leaves the following questions:

- Where is it picking up the default 4GB Disk size from?
- Why is it setting Disk to values different from what we ask for in the modify expression?

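On these two questions, one pattern can be read directly off the numbers quoted below: every slot Disk value is a whole multiple of 0.1% of TotalDisk, rounded up to the next KiB, and 0.1% of the ~4.27e9 KiB TotalDisk is about 4GB. That would be consistent with disk for dynamic slots being carved out in fixed fractions of the total rather than in the exact KiB requested, but that interpretation is an assumption and is not confirmed anywhere in this thread. A minimal Python sketch of the check (the variable and function names are made up for illustration):

```python
# (TotalDisk, Disk) pairs in KiB, copied from the condor_who and
# .machine.ad outputs quoted in this thread.
samples = [
    (4271296648, 4271297),
    (4271296648, 8542594),
    (4271296648, 12813890),
    (4271296648, 21356484),
    (4278312960, 4278313),
]

def steps_of_total(total_disk, disk):
    """Return (k, exact): k is the nearest whole number of 0.1% steps of
    total_disk, and exact says whether ceil(k * total_disk / 1000) == disk."""
    k = round(disk * 1000 / total_disk)
    rounded_up = -(-k * total_disk // 1000)  # integer ceiling division
    return k, rounded_up == disk

results = [steps_of_total(total, disk) for total, disk in samples]
for (total, disk), (k, exact) in zip(samples, results):
    print(f"Disk={disk:>9} is {k} x 0.1% of TotalDisk, exact match: {exact}")
```

The check only describes the observed values; it does not say which configuration knob, if any, controls the step size.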

Thanks & Regards,
Vikrant Aggarwal


On Tue, Jun 6, 2023 at 10:13 AM Thomas Hartmann <thomas.hartmann@xxxxxxx> wrote:
Hi Vikrant,

What does your storage setup look like?

My guess would be that
  18446744073692897281
is a bit large, so the partitionable parent slot may be hitting an
overflow or similar, while the partitioned slots are carved out properly.

Cheers,
  Thomas

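The overflow guess in Thomas's reply can be checked with plain arithmetic on the two `condor_status -server` outputs quoted below: summing the per-slot Disk values, including the negative parent slot, and reinterpreting the result as an unsigned 64-bit integer reproduces both reported totals exactly. The sketch only demonstrates that wraparound; how condor_status actually accumulates the column is an assumption here, and the helper name is made up.

```python
# Per-slot Disk values (KiB) from the two `condor_status -server` outputs.
slot_disk_may30 = [-25210961, 4278313, 4278313]
slot_disk_jun03 = [-57841021, 12813890, 8542594, 8542594, 4271297]

def as_uint64(value):
    """Reinterpret a (possibly negative) integer as unsigned 64-bit."""
    return value % 2**64

total_may30 = as_uint64(sum(slot_disk_may30))
total_jun03 = as_uint64(sum(slot_disk_jun03))

# Print the signed sum next to its unsigned 64-bit interpretation.
print(sum(slot_disk_may30), total_may30)
print(sum(slot_disk_jun03), total_jun03)
```

In other words, the very large total appears to be a side effect of the negative parent-slot Disk value rather than a separate fault.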
On 06/06/2023 14.53, Vikrant Aggarwal wrote:
> Hello Experts,
>
> Any input on disk issues?
>
> Thanks & Regards,
> Vikrant Aggarwal
>
>
> On Sat, Jun 3, 2023 at 6:28 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
>
>     Hello Tomer,
>
>     Thanks for sharing the configuration; it helps to put jobs on hold
>     when they exceed RequestDisk. We have a problem in our infrastructure
>     where people don't request disk in the job spec, so I want to modify
>     the request on the worker machine based on some logic related to
>     CPUs. I am seeing strange behavior.
>
>     RequestDisk stays at whatever we put in the job submit file (2GB
>     here), but I can't work out where the Disk attribute is being picked
>     up from. By default it is ~4GB.
>
>     # condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk
>
>     globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>     test.example.com#429.0#1685829846  4271297   27         4271296648  4271297.0      2097152
>
>     Attempt 1: Try to modify the RequestDisk to 4GB, but Disk becomes ~8GB
>     - Maybe the default 4GB is added on top
>
>     MODIFY_REQUEST_EXPR_REQUESTDISK = 4194304
>
>     globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>     test.example.com#430.0#1685830072  8542594   27         4271296648  8542594.0      2097152
>
>
>     Attempt 2: Try to modify the RequestDisk to 6GB, but Disk becomes ~8GB
>     - If we go by the 4GB-addition logic, it should have been 10GB
>
>     MODIFY_REQUEST_EXPR_REQUESTDISK = 6291456
>
>     globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>     test.example.com#431.0#1685830179  8542594   2          4271296648  8542594.0      2097152
>
>     Attempt 3: Try to modify the RequestDisk to 8GB; as expected (8GB
>     plus 4GB), Disk becomes ~12GB.
>
>     MODIFY_REQUEST_EXPR_REQUESTDISK = 8388608
>
>     globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>     test.example.com#428.0#1685829703  12813890  8192027    4271296648  12813890.0     2097152
>
>     Attempt 4: Try to modify the disk size to 1GB; Disk stays at ~4GB.
>
>     MODIFY_REQUEST_EXPR_REQUESTDISK = 1048576
>
>     globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>     test.example.com#432.0#1685830887  4271297   2          4271296648  4271297.0      2097152
>
>
>     Command used to grab the outputs:
>
>     condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk
>
>
>     Finally, more confusion: negative disk values in the following output:
>
>     # condor_status `hostname` -server
>     Name                                OpSys  Arch    LoadAv  Memory  Disk       Mips   KFlops
>
>     slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX  X86_64  0.000   172962  -57841021  22492  1705677
>     slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   12813890   22492  1705677
>     slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   8542594    22492  1705677
>     slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   8542594    22492  1705677
>     slot1_4@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4271297    22492  1705677
>
>                   Machines  Avail  Memory  Disk                  MIPS    KFLOPS
>
>     X86_64/LINUX  5         5      249834  18446744073685880970  112460  8528385
>
>     Total         5         5      249834  18446744073685880970  112460  8528385
>
>
>
>
>     Questions:
>
>     - Where is it picking up the default 4GB Disk size from?
>     - Why is it setting Disk to values different from what we ask for
>       in the modify expression?
>     - Why do we see a negative Disk value in the -server output?
>
>     HTCondor version: 9.0.17
>
>     Regards,
>     Vikrant Aggarwal
>
>     On Thu, 1 Jun, 2023, 09:38 Tomer Pearl, <tomerp@xxxxxxxxxxx> wrote:
>
>         Hi Vikrant,
>
>         The following configuration works for me. I'm not sure which
>         version I'm running; it should be 9+.
>
>         STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
>         DISK_USAGE_EXCEEDED = (JobUniverse !=13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
>         use POLICY: WANT_HOLD_IF = (DISK_USAGE_EXCEEDED, 105, my error string..).
>
>         I'm not sure whether "my error string.." should be surrounded by
>         quotation marks, as I'm templating the file with Jinja.
>
>         Tomer.
>
>         ------------------------------------------------------------------------
>         From: HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
>         Sent: Thursday, June 1, 2023 12:44 AM
>         To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
>         Subject: Re: [HTCondor-users] HTcondor disk resource related queries
>
>         Hello Experts,
>
>         I am testing this configuration to put jobs on hold when they
>         breach the disk limit.
>
>         STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
>         DISK_USAGE_EXCEEDED = (JobUniverse =!=13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
>         WANT_HOLD = $(DISK_USAGE_EXCEEDED)
>         WANT_HOLD_REASON = "Job exceeded disk usage limits"
>
>         I can clearly see jobs using more than their RequestDisk, yet
>         they are not being held.
>
>         # condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk
>
>         globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
>         test.example.com#412.0#1685567906  21356484  8192026    4271296648  21356484.0     16777216
>         test.example.com#413.0#1685567923  12813890  8192026    4271296648  12813890.0     8388608
>         test.example.com#414.0#1685567952  8542594   8192026    4271296648  8542594.0      3250000
>         test.example.com#415.0#1685568493  8542594   8192025    4271296648  8542594.0      3250000
>         test.example.com#416.0#1685568803  12813890  8192026    4271296648  12813890.0     10000000
>         test.example.com#417.0#1685568954  4271297   8192025    4271296648  4271297.0      1
>
>         I am using HTCondor version 9.0.17.
>
>         Thanks & Regards,
>         Vikrant Aggarwal
>
>
>         On Tue, May 30, 2023 at 1:09 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:
>
>             Hello Experts,
>
>             A couple of queries:
>
>             - Why is it showing a negative Disk value for the primary
>               partitionable slot?
>
>             # condor_status `hostname` -server
>             Name                                OpSys  Arch    LoadAv  Memory  Disk       Mips   KFlops
>
>             slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX  X86_64  0.000   211398  -25210961  25601  1764976
>             slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4278313    25601  1764976
>             slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4278313    25601  1764976
>
>                           Machines  Avail  Memory  Disk                  MIPS    KFLOPS
>
>             X86_64/LINUX  3         3      249834  18446744073692897281  76803   5294928
>
>             Total         3         3      249834  18446744073692897281  76803   5294928
>
>
>             # condor_status -compact `hostname` -af Disk
>             4269756335
>
>             - I have this in the worker node configuration to modify the
>               job's disk request to the value shown, but it never
>               worked. We use a similar expression for CPU and memory,
>               and it works fine.
>
>             # condor_config_val MODIFY_REQUEST_EXPR_REQUESTDISK
>             80000
>
>             I'm not sure where it is picking this value up from.
>
>             # grep -r 'Disk =' /spare/condor/dir_14*/.machine.ad
>             /spare/condor/dir_1417831/.machine.ad:Disk = 4278313
>             /spare/condor/dir_1417831/.machine.ad:TotalDisk = 4278312960
>             /spare/condor/dir_1417831/.machine.ad:TotalSlotDisk = 4278313.0
>             /spare/condor/dir_1425169/.machine.ad:Disk = 4278313
>             /spare/condor/dir_1425169/.machine.ad:TotalDisk = 4278312960
>             /spare/condor/dir_1425169/.machine.ad:TotalSlotDisk = 4278313.0
>
>
>             # du -sh /spare/condor/dir_1425169
>             3.0G    /spare/condor/dir_1425169
>
>             Thanks & Regards,
>             Vikrant Aggarwal
>
>
>         _______________________________________________
>         HTCondor-users mailing list
>         To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
>         subject: Unsubscribe
>         You can also unsubscribe by visiting
>         https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users
>
>         The archives can be found at:
>         https://lists.cs.wisc.edu/archive/htcondor-users/
>
>