
Re: [HTCondor-users] Some jobs from same cluster won't run



On 4/16/2015 1:46 AM, Steffen Grunewald wrote:

> But the jobs apparently had a bigger memory footprint at the time
> of preemption, and no slot with 1221 (? typing this off my memory) MB
> is currently available (-better-analyze seems to suggest that the
> maximum currently is in the 900ish region).
Memory use involving copy-on-write pages (for one) tends to be
over-reported by the Linux kernel, and that over-reported figure is what
Condor sees (and will auto-insert as the memory request if you don't set
request_memory explicitly). If you're not using cgroups or immediate
allocation, the node should simply swap. If your numbers are correct, a
900ish MB slot would need about 300 MB of swap to run a 1221 MB job. You
may not want to hit swap, but that's a separate issue -- if you'd rather
have the job run, albeit slowly, request less than 1221 MB of memory.
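
Concretely, that means overriding the auto-inserted value in the submit
description. A minimal sketch (the executable name is illustrative; the
request_memory value is in MB and should be chosen to fit your slots):

    # Ask for 900 MB so the job can match the currently available
    # 900ish slots, accepting that it may swap if it really needs 1221 MB.
    executable     = my_job
    request_memory = 900
    queue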
Dimitri