Mailing List Archives
[HTCondor-users] Fwd: Job not completing when run in Out-of-Core
- Date: Fri, 03 Jan 2014 09:18:48 -0600
- From: Michael McInerny Murphy <michael.murphy@xxxxxxxxxxxxx>
- Subject: [HTCondor-users] Fwd: Job not completing when run in Out-of-Core
Hello all,
We are using Condor version 7.8.8 at our office. We mainly use our compute
nodes to solve CEM problems in large batches (usually one simulation per
frequency). Condor works great most of the time when the simulations run
in-core (in local RAM). However, when really large problems are run in
out-of-core mode (on the local hard disk), the simulations get "stuck" in
Condor. When we use condor_ssh_to_job to check on such a job, the simulation
output file states that the simulation completed normally, yet the job remains
active in Condor. The node logs show that the job's process (PID) never exits.
These out-of-core files are saved when the simulation completes (the impedance
matrix can be reused) and are usually over 100 GB in size. Is there a limit on
the size of files Condor can return? Could it cause problems that the JOB_SIZE
is so much larger than what Condor predicts?
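The post does not say whether these jobs use HTCondor's file transfer mechanism. If they do, and `transfer_output_files` is not set, HTCondor by default sends back every file the job created or modified in its scratch directory, so the 100 GB out-of-core files would have to be copied to the submit machine before the job is marked complete. The Python sketch below is a hypothetical diagnostic, not an HTCondor tool: the helpers `large_files` and `transfer_seconds` and the 100 MB/s link speed are assumptions for illustration. It lists large files under a directory and gives a rough copy-time estimate.

```python
import os
import tempfile

GIB = 1024 ** 3


def large_files(scratch_dir, min_bytes=1 * GIB):
    """Return (path, size) for regular files under scratch_dir of at least
    min_bytes, largest first. Symlinks are skipped."""
    found = []
    for root, _dirs, files in os.walk(scratch_dir):
        for name in files:
            path = os.path.join(root, name)
            if os.path.isfile(path) and not os.path.islink(path):
                size = os.path.getsize(path)
                if size >= min_bytes:
                    found.append((path, size))
    return sorted(found, key=lambda item: item[1], reverse=True)


def transfer_seconds(size_bytes, bandwidth_mb_per_s=100.0):
    """Rough copy time at an ASSUMED sustained bandwidth (MB/s, 10**6 bytes).
    Ignores protocol overhead and disk contention."""
    return size_bytes / (bandwidth_mb_per_s * 1e6)


if __name__ == "__main__":
    # Demo on a throwaway directory; point scratch_dir at a real job
    # sandbox (e.g. reached via condor_ssh_to_job) to inspect actual output.
    with tempfile.TemporaryDirectory() as scratch_dir:
        with open(os.path.join(scratch_dir, "matrix.bin"), "wb") as f:
            f.write(b"\0" * 4096)
        for path, size in large_files(scratch_dir, min_bytes=1024):
            print(f"{path}: {size} bytes, ~{transfer_seconds(size):.6f} s")
    # A 100 GB (10**11 byte) file at 100 MB/s takes about 1000 s (~17 min).
    print(transfer_seconds(100 * 10**9))
```

Under these assumptions, a single 100 GB output file adds roughly a quarter hour of transfer per job, more if several jobs return output to the same submit machine at once.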
Thanks for your time,
Michael Murphy
Engineer
IERUS Technologies, Inc.
2904 Westcorp Blvd, Ste 210
Huntsville, AL 35805
(256) 319-2026 ext 007