[Condor-users] Memory issues when running condor jobs:



Hi all,

I am running a very memory-intensive job via Condor, and every time its virtual memory size grows beyond roughly 1.6 GB, the job is evicted and never picked up again (condor_q -analyze shows that all machines qualified to run the program reject it). I understand this may have something to do with each machine's local job policy: when the image size of my job exceeds a certain threshold, the job gets evicted.
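For illustration, here is a minimal sketch of how an explicit memory requirement could be expressed in the submit file, so matchmaking only considers machines advertising enough memory (the executable name and the numbers are placeholders, not my actual values):

    universe     = vanilla
    executable   = naive_approach
    # image_size is given in KiB; advise Condor the job may grow to ~3 GB
    image_size   = 3000000
    # Memory is advertised by each machine in MB; only match machines with >= 3 GB
    requirements = (Memory >= 3000)
    output       = naive.out
    error        = naive.err
    log          = naive.log
    queue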

One obvious solution on my end is to limit the memory footprint of the program. But this program is a "naive" approach that someone proposed in a paper, and we are proposing something else to beat it. We are trying to collect data points on a non-trivial dataset to show that we indeed beat it in both efficiency and quality. The problem is that the memory footprint of the "naive" approach can easily exceed 2-3 GB, which my 32-bit workstation cannot handle because of its address-space limit (allocation errors). I don't have access to a 64-bit machine, which is why I am using Condor in the first place.

I am just wondering: without rewriting the current implementation (e.g., by moving some data to disk and removing it from the in-memory structures), is there any workaround for this problem?

Any suggestions are highly appreciated!
-Yeye