Mailing List Archives
Re: [HTCondor-users] dynamic slots
- Date: Tue, 27 Feb 2018 12:00:02 -0500
- From: Larry Martell <larry.martell@xxxxxxxxx>
- Subject: Re: [HTCondor-users] dynamic slots
I solved this particular issue - I had
'Requirements': '(Memory > 10000)'
When I changed it to
'request_memory': '10000'
the job ran. But then I ended up not using dynamic slots, as
they do not do what I need.
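For reference, the change described above can be sketched as two submit descriptions written as Python dicts. This is only an illustration: the executable path and the helper `to_submit_text` are hypothetical and not part of the original job or of any HTCondor API. One plausible reading, not confirmed in this thread, is that a partitionable slot carves out the dynamic slot from `request_memory`, so a `Requirements` clause alone leaves the request at its small default (the analysis output quoted below shows `RequestMemory` derived from an `ImageSize` of 0).

```python
# Hypothetical job descriptions; the executable path is a placeholder.

# Before: memory expressed only as a constraint on the slot.
before = {
    "executable": "/path/to/job.sh",
    "Requirements": "(Memory > 10000)",
}

# After: memory expressed as a resource request (units are MB).
after = {
    "executable": "/path/to/job.sh",
    "request_memory": "10000",
}


def to_submit_text(description):
    """Render a dict as 'key = value' lines, submit-description style."""
    return "\n".join(f"{key} = {value}" for key, value in description.items())


if __name__ == "__main__":
    print(to_submit_text(after))
```

The only difference between the two dicts is that the memory need moves from a match constraint to an explicit request.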
What I need is for condor to hold jobs while less than a given amount
of memory is available and to start them once that memory does become
available. I have not figured out how to do that. I have another thread
on the ML about that
(https://www-auth.cs.wisc.edu/lists/htcondor-users/2018-February/msg00102.shtml),
but it has not received any replies.
On Tue, Feb 27, 2018 at 9:36 AM, John M Knoeller <johnkn@xxxxxxxxxxx> wrote:
> The condor_q -analyze output below shows that the job matches the slot, but it also shows 0 machines for all of the counters in the last clause, and
>
> No successful match recorded.
> Last failed match: Fri Feb 23 14:38:52 2018
>
> That probably indicates that the slot doesn't match the job for some reason. Try running
>
> condor_q -better:reverse 38720 -machine slot1@chopin
>
> -tj
>
> -----Original Message-----
> From: HTCondor-users [mailto:htcondor-users-bounces@xxxxxxxxxxx] On Behalf Of Larry Martell
> Sent: Friday, February 23, 2018 1:47 PM
> To: HTCondor-Users Mail List <htcondor-users@xxxxxxxxxxx>
> Subject: [HTCondor-users] dynamic slots
>
> I am trying to use dynamic slots as documented here:
>
> http://research.cs.wisc.edu/htcondor/CondorWeek2012/presentations/thain-dynamic-slots.pdf
>
> I have configured 1 slot thusly:
>
> NUM_SLOTS = 1
> NUM_SLOTS_TYPE_1 = 1
> SLOT_TYPE_1 = cpus=75%
> SLOT_TYPE_1 = mem=64000
> SLOT_TYPE_1_PARTITIONABLE = true
>
> I submit a job that requires 10G of memory and it does not run:
>
> $ condor_q -better-analyze 38720
>
>
> -- Schedd: bach.elucid.local : <192.168.10.2:9618?...
> The Requirements expression for job 38720.000 is
>
> ( ( Memory >= 10000 ) ) && ( TARGET.Arch == "X86_64" ) &&
> ( TARGET.OpSys == "LINUX" ) && ( TARGET.Disk >= RequestDisk ) &&
> ( TARGET.Memory >= RequestMemory ) && ( TARGET.HasFileTransfer )
>
> Job 38720.000 defines the following attributes:
>
> DiskUsage = 0
> ImageSize = 0
> RequestDisk = DiskUsage
> RequestMemory = ifthenelse(MemoryUsage =!= undefined,MemoryUsage,( ImageSize + 1023 ) / 1024)
>
> slot1@chopin has the following attributes:
>
> TARGET.Memory = 64000
> TARGET.Arch = "X86_64"
> TARGET.Disk = 90191948
> TARGET.HasFileTransfer = true
> TARGET.OpSys = "LINUX"
>
> The Requirements expression for job 38720.000 reduces to these conditions:
>
>          Slots
> Step    Matched  Condition
> -----  --------  ---------
> [0]           1  Memory >= 10000
> [1]           1  TARGET.Arch == "X86_64"
> [3]           1  TARGET.OpSys == "LINUX"
> [5]           1  TARGET.Disk >= RequestDisk
> [7]           1  TARGET.Memory >= RequestMemory
> [9]           1  TARGET.HasFileTransfer
>
> No successful match recorded.
> Last failed match: Fri Feb 23 14:38:52 2018
>
> Reason for last match failure: no match found
>
> 38720.000: Run analysis summary ignoring user priority. Of 1 machines,
> 0 are rejected by your job's requirements
> 0 reject your job because of their own requirements
> 0 match and are already running your jobs
> 0 match but are serving other users
> 0 are available to run your job
>
> Can anyone tell me why it's not running?