Yeye,

Does the application finish on the machine if it is run manually by itself (with no other jobs)?

It is unlikely that the machine is preempting the job (though we could look at your PREEMPT expression to double-check). It is more likely that the job is failing because it needs more memory than the 32-bit address space allows, and then never re-matching. We've seen jobs fail when their memory footprint gets too large; because the ImageSize for the job has by then been updated to the larger 2-3GB number, the job never reschedules, since no slot has that much memory available.

To test this, manually run the job on the machine in question while nothing else is running, and see whether it completes successfully.

If the job does run when nothing else is running on the machine, you might decrease the number of slots on the machine, so each slot has more RAM. If the job doesn't run because it runs out of memory in the 32-bit address space, Condor can't change that, because it merely schedules jobs. Otherwise, you might find that the job fails for a reason other than memory.

Hope this helps!

Best,
Doug

--
===================================
Douglas Clayton
main: 888.292.5320
Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools

On Oct 15, 2008, at 1:22 AM, Yeye He wrote:
--
===================================
Douglas Clayton
phone: 919.647.9648
Cycle Computing, LLC
Leader in Condor Grid Solutions
Enterprise Condor Support and Management Tools
http://www.cyclecomputing.com
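
For reference, the slot-memory reasoning above can be sketched in a few lines of Python. This is only an illustration, not Condor code: it assumes Condor's usual units (ImageSize in KiB, slot Memory in MB) and an even split of RAM across slots, and the 4096 MB machine and the helper names are hypothetical.

```python
# Rough sketch of why a job with a grown ImageSize may stop matching.
# Assumptions (not taken from the thread): ImageSize is in KiB, slot
# Memory is in MB, and RAM is divided evenly among the slots.

def slot_memory_mb(total_ram_mb: int, num_slots: int) -> int:
    """Memory each slot advertises when RAM is split evenly."""
    return total_ram_mb // num_slots

def can_match(image_size_kib: int, slot_mb: int) -> bool:
    """True if a slot with slot_mb of memory is large enough for the job."""
    return slot_mb * 1024 >= image_size_kib

# Hypothetical job whose image has grown to 2.5 GB.
job_image_kib = int(2.5 * 1024 * 1024)

for slots in (4, 2, 1):
    per_slot = slot_memory_mb(4096, slots)
    print(slots, "slot(s):", per_slot, "MB each, job fits:",
          can_match(job_image_kib, per_slot))
```

On this hypothetical machine the job only fits once the slot count drops to one. Fitting in a slot still does not help if the process itself cannot address that much memory on a 32-bit system.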