Mailing List Archives
Re: [HTCondor-users] Limit Memory
- Date: Fri, 30 Aug 2013 09:46:34 -0500
- From: Brian Bockelman <bbockelm@xxxxxxxxxxx>
- Subject: Re: [HTCondor-users] Limit Memory
On Aug 28, 2013, at 7:18 AM, Romain <nuelromain@xxxxxxxxx> wrote:
> Romain <nuelromain@...> writes:
>
>>
>> Brian Bockelman <bbockelm <at> ...> writes:
>>
>>>
>>>
>>> On Aug 27, 2013, at 9:22 AM, Romain <nuelromain <at> ...> wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> I have some problems with the memory usage limits on my pool.
>>>>
>>>> I've installed cgroups and configured them like this in my
>>>> configuration file (condor_config):
>>>> BASE_CGROUP = htcondor
>>>> CGROUP_MEMORY_LIMIT_POLICY = hard
>>>>
>>>> I want to suspend a job if it stays at the limit for some time (1 min,
>>>> for example) and send it back to the queue if it stays there for a
>>>> further period (5 min, for example).
>>>>
>>>
>>> I don't understand the question. The memory limits are per-job. If you
>>> suspend the job, how is it going to decrease its memory usage?
>>>
>>> Brian
>>>
>>
>> I want to suspend the job for a while, and if it can't restart I want to
>> stop it and let it go back to the queue.
>>
>> If that isn't possible, I want it to go back to the queue directly.
>>
>> I allocate 2 CPUs and 1 GB of RAM on each user's machine. A job must not
>> take more than 1 GB because that can be a problem for the user.
>>
>> Sorry for my bad English :s
>>
>> Thank you and have a nice day
>>
>> --
>> Romain
>>
>>
>
> To explain my problem further:
> With htop I see that the cgroup limit is respected (for example, a job can
> use 500 MB at most).
> The "RES" column shows that the limit is respected, but the virtual memory
> keeps growing, and so does the "progress bar" (which shows all the memory
> in use on the machine). So my limit is 500 MB, but the job uses more than
> 1.3 GB without any problem, and that can crash the machine.
>
Hi Romain,
I think I understand now. Is it possible that the jobs are going into swap?
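
One way to check this, as a rough sketch only: the cgroup v1 memory controller exposes memory.usage_in_bytes (RAM) and memory.memsw.usage_in_bytes (RAM plus swap) for each cgroup, and the difference is what has gone to swap. This assumes cgroup v1 with swap accounting enabled. The function name is hypothetical, and the directory to pass in depends on where the memory controller is mounted and on BASE_CGROUP.

```python
from pathlib import Path


def swap_usage_bytes(cgroup_dir):
    """Return how many bytes of swap a cgroup v1 memory cgroup is using.

    Assumes swap accounting is enabled, so that the file
    memory.memsw.usage_in_bytes exists in cgroup_dir.
    """
    cg = Path(cgroup_dir)
    ram = int((cg / "memory.usage_in_bytes").read_text())
    ram_plus_swap = int((cg / "memory.memsw.usage_in_bytes").read_text())
    # memsw counts RAM + swap, so the difference is the swapped-out part.
    return max(ram_plus_swap - ram, 0)
```

If this returns a large value for a job whose RES stays at the limit, the job is being pushed into swap rather than being stopped.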
Options are:
1) Remove swap, or use the swappiness file in the /condor cgroup to remove condor's ability to use swap.
2) Set the max swap / memory usage for all of condor in the cgroup configuration.
Brian
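
A minimal sketch of what the two options above could look like at the cgroup level, assuming cgroup v1 with swap accounting enabled and root privileges. The function name, its parameters, and the example path are hypothetical; the file names memory.swappiness, memory.limit_in_bytes and memory.memsw.limit_in_bytes are the standard cgroup v1 memory controller files. The exact effect of a swappiness of 0 varies between kernel versions, so treat this as an illustration rather than a tested recipe.

```python
from pathlib import Path


def limit_swap(cgroup_dir, mem_limit_bytes, swap_allowance_bytes=0):
    """Write cgroup v1 memory settings that restrict swap for a whole cgroup.

    cgroup_dir is the memory-controller directory of the parent cgroup
    (hypothetical example: /sys/fs/cgroup/memory/htcondor).
    Returns the RAM + swap limit that was written.
    """
    cg = Path(cgroup_dir)
    # Option 1: ask the kernel not to swap out pages of this cgroup.
    (cg / "memory.swappiness").write_text("0\n")
    # Option 2: cap RAM first, then RAM + swap. The memsw value must be
    # at least as large as the RAM limit.
    (cg / "memory.limit_in_bytes").write_text(f"{mem_limit_bytes}\n")
    memsw = mem_limit_bytes + swap_allowance_bytes
    (cg / "memory.memsw.limit_in_bytes").write_text(f"{memsw}\n")
    return memsw
```

With swap_allowance_bytes left at 0, the RAM + swap limit equals the RAM limit, so processes in the cgroup cannot grow into swap beyond it.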
> I just want jobs that reach the limit to be put back in the queue.
>
> What I need is to find the parameter and the values to set in order to
> configure condor to do this.
>
> The priority is to protect the user, even if the job restarts from the
> beginning.
>
>
> Thank you
>
> --
> Romain