Hi Vikrant,

what does your storage setup look like? My guess would be that 18446744073692897281 is a bit large, so the partitionable parent slot may have an overflow or something similar, while the partitioned slots are cut out properly.
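A small arithmetic check of the overflow guess, using the Disk values from the older `condor_status -server` output quoted further down in this thread. It is only a sketch: the helper name `as_uint64` is made up for illustration, and the assumption is that the totals line prints the signed per-slot sum as an unsigned 64-bit number. It says nothing about why the parent slot itself is negative.

```python
# Disk column values copied from the older `condor_status -server` output
# quoted later in this thread.
slot_disk = {
    "slot1": -25210961,   # partitionable parent slot
    "slot1_1": 4278313,
    "slot1_2": 4278313,
}

UINT64_MODULUS = 2 ** 64


def as_uint64(value: int) -> int:
    """Reinterpret a possibly negative integer as an unsigned 64-bit value.

    Hypothetical helper for illustration; not an HTCondor function.
    """
    return value % UINT64_MODULUS


signed_total = sum(slot_disk.values())
unsigned_total = as_uint64(signed_total)

print(signed_total)    # -16654335
print(unsigned_total)  # 18446744073692897281, the value in the totals line
```

The same check on the five-slot output from June 3 (sum of the Disk column is -23670646) gives 18446744073685880970, which also matches the totals line there.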
Cheers,
Thomas

On 06/06/2023 14.53, Vikrant Aggarwal wrote:
Hello Experts,

Any input on disk issues?

Thanks & Regards,
Vikrant Aggarwal

On Sat, Jun 3, 2023 at 6:28 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:

Hello Tomer,

Thanks for sharing the configuration. it helps to put the job on hold breaching the requestdisk.

We have a problem in our infra where people don't ask for the request disk in job spec hence I want to modify it on a worker machine based on some logic related to CPUs.

I am seeing strange behavior. RequestDisk will remain intact whatever we put in the job submit file 2GB but I couldn't understand where it's picking the Disk attribute. By default it's ~ 4GB

# condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk
globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#429.0#1685829846  4271297   27         4271296648  4271297.0      2097152

Attempt 1 : Try to modify the RequestDisk to 4GB but it becomes 8GB - May be addition of default 4GB

MODIFY_REQUEST_EXPR_REQUESTDISK = 4194304

globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#430.0#1685830072  8542594   27         4271296648  8542594.0      2097152

Attempt 2 : Try to modify the RequestDisk to 6GB but it becomes 8GB - If we go by 4GB addition logic it should have been 10GB

MODIFY_REQUEST_EXPR_REQUESTDISK = 6291456

globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#431.0#1685830179  8542594   2          4271296648  8542594.0      2097152

Attempt 3 : Try to modify the RequestDisk to 8GB as expected it becomes 12GB.

MODIFY_REQUEST_EXPR_REQUESTDISK = 8388608

globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#428.0#1685829703  12813890  8192027    4271296648  12813890.0     2097152

Attempt 4 : Try to modify the disk size to 1GB. it retains 4GB size.

MODIFY_REQUEST_EXPR_REQUESTDISK = 1048576

globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#432.0#1685830887  4271297   2          4271296648  4271297.0      2097152

Command used to grab outputs: condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk

Finally more confusion with negative disk values in following output:

# condor_status `hostname` -server
Name                                OpSys  Arch    LoadAv  Memory  Disk       Mips   KFlops
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX  X86_64  0.000   172962  -57841021  22492  1705677
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   12813890   22492  1705677
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   8542594    22492  1705677
slot1_3@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   8542594    22492  1705677
slot1_4@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4271297    22492  1705677

              Machines  Avail  Memory  Disk                  MIPS    KFLOPS
X86_64/LINUX  5         5      249834  18446744073685880970  112460  8528385
Total         5         5      249834  18446744073685880970  112460  8528385

Questions:

- From where it's picking the default 4GB Disk size?
- Why is it setting Disk size to different values than what we ask in the modify expression?
- Why in -server output we see negative disk value.
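
One pattern that can be read off the `condor_who` numbers above: every observed Disk value is a whole multiple of TotalDisk / 1000, rounded up. The sketch below only checks that arithmetic. The helper `units_of_total` and the `parts=1000` granularity are assumptions made for illustration, inferred from these outputs alone; this is not a description of how the startd actually computes Disk.

```python
TOTAL_DISK = 4271296648  # TotalDisk as reported by condor_who above

# Observed Disk per attempt, keyed by the MODIFY_REQUEST_EXPR_REQUESTDISK
# value in effect ("default" = the first output, before any override).
attempts = {
    "default": 4271297,
    "4194304": 8542594,
    "6291456": 8542594,
    "8388608": 12813890,
    "1048576": 4271297,
}


def units_of_total(disk: int, total: int = TOTAL_DISK, parts: int = 1000) -> int:
    """Return k such that disk == ceil(k * total / parts).

    Hypothetical helper: raises ValueError if the value does not fit
    the assumed "multiple of total/parts, rounded up" pattern.
    """
    k = round(disk * parts / total)
    if (k * total + parts - 1) // parts != disk:
        raise ValueError(f"{disk} is not ceil(k * {total} / {parts})")
    return k


multiples = {label: units_of_total(disk) for label, disk in attempts.items()}
print(multiples)
```

For the five outputs above this gives 1, 2, 2, 3 and 1 units respectively, so the "~4GB default" is one unit of about 0.1% of TotalDisk. Why a given request maps to a given number of units is not something these numbers settle.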

htcondor version : 9.0.17

Regards,
Vikrant Aggarwal

On Thu, 1 Jun, 2023, 09:38 Tomer Pearl, <tomerp@xxxxxxxxxxx> wrote:

Hi Vikrant,

The following configuration works for me. Not sure which version I'm running, should be 9+.

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
DISK_USAGE_EXCEEDED = (JobUniverse !=13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
use POLICY: WANT_HOLD_IF = (DISK_USAGE_EXCEEDED, 105, my error string..)

Not sure if /my error string../ should be surrounded by quotation marks, as I'm templating the file with Jinja.

Tomer.

------------------------------------------------------------------------
*From:* HTCondor-users <htcondor-users-bounces@xxxxxxxxxxx> on behalf of Vikrant Aggarwal <ervikrant06@xxxxxxxxx>
*Sent:* Thursday, June 1, 2023 12:44 AM
*To:* HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
*Subject:* Re: [HTCondor-users] HTcondor disk resource related queries

Hello Experts,

I am testing this configuration to put the jobs on hold breaching the disk limit.

STARTD_JOB_ATTRS = $(STARTD_JOB_ATTRS) RequestDisk
DISK_USAGE_EXCEEDED = (JobUniverse =!=13 && DiskUsage =!= UNDEFINED && DiskUsage > RequestDisk)
WANT_HOLD = $(DISK_USAGE_EXCEEDED)
WANT_HOLD_REASON = "Job exceeded disk usage limits"

I clearly see the jobs are using more than RequestDisk size still they are not getting held.

# condor_who -af:h globaljobid disk DiskUsage TotalDisk TotalSlotDisk RequestDisk
globaljobid                        disk      DiskUsage  TotalDisk   TotalSlotDisk  RequestDisk
test.example.com#412.0#1685567906  21356484  8192026    4271296648  21356484.0     16777216
test.example.com#413.0#1685567923  12813890  8192026    4271296648  12813890.0     8388608
test.example.com#414.0#1685567952  8542594   8192026    4271296648  8542594.0      3250000
test.example.com#415.0#1685568493  8542594   8192025    4271296648  8542594.0      3250000
test.example.com#416.0#1685568803  12813890  8192026    4271296648  12813890.0     10000000
test.example.com#417.0#1685568954  4271297   8192025    4271296648  4271297.0      1

9.0.17 is htcondor version I am using.

Thanks & Regards,
Vikrant Aggarwal

On Tue, May 30, 2023 at 1:09 PM Vikrant Aggarwal <ervikrant06@xxxxxxxxx> wrote:

Hello Experts,

Couple of queries:

- Why it's showing negative value for primary partitionable slot.

# condor_status `hostname` -server
Name                                OpSys  Arch    LoadAv  Memory  Disk       Mips   KFlops
slot1@xxxxxxxxxxxxxxxxxxxxxxxxxx    LINUX  X86_64  0.000   211398  -25210961  25601  1764976
slot1_1@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4278313    25601  1764976
slot1_2@xxxxxxxxxxxxxxxxxxxxxxxxxx  LINUX  X86_64  0.000   19218   4278313    25601  1764976

              Machines  Avail  Memory  Disk                  MIPS   KFLOPS
X86_64/LINUX  3         3      249834  18446744073692897281  76803  5294928
Total         3         3      249834  18446744073692897281  76803  5294928

# condor_status -compact `hostname` -af Disk
4269756335

- I have this on worker node conf to modify the job request disk to mentioned value but it never worked. We are using similar expression for cpu and memory, it works fine.

# condor_config_val MODIFY_REQUEST_EXPR_REQUESTDISK
80000

Not sure from where it's picking this value.

# grep -r 'Disk =' /spare/condor/dir_14*/.machine.ad
/spare/condor/dir_1417831/.machine.ad:Disk = 4278313
/spare/condor/dir_1417831/.machine.ad:TotalDisk = 4278312960
/spare/condor/dir_1417831/.machine.ad:TotalSlotDisk = 4278313.0
/spare/condor/dir_1425169/.machine.ad:Disk = 4278313
/spare/condor/dir_1425169/.machine.ad:TotalDisk = 4278312960
/spare/condor/dir_1425169/.machine.ad:TotalSlotDisk = 4278313.0

# du -sh /spare/condor/dir_1425169
3.0G    /spare/condor/dir_1425169

Thanks & Regards,
Vikrant Aggarwal
_______________________________________________
HTCondor-users mailing list
To unsubscribe, send a message to htcondor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/htcondor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/htcondor-users/