I've been running numerous Java jobs under Condor. Recently I ran
into a bit of a snag. A power outage required that most of
our dedicated compute nodes be shut down. After the power and
Condor came back up, I noticed that most of my Java jobs would not
start. The reason reported by condor_q -analyze is:
WARNING: Be advised:
No resources matched request's constraints
Check the Requirements expression below:
Requirements = (HasJava) && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)
The Memory clause seems to be what prevents the job from running.
The image size for this job grew to 1.8 GB, and most of our
compute nodes have only 1 GB of memory.
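To make the arithmetic behind that clause concrete, here is a small sketch that evaluates `(Memory * 1024) >= ImageSize` for a 1.8 GB image. It assumes Memory is reported in megabytes and ImageSize in kilobytes, which is what the `* 1024` factor in the expression suggests; the function name `memory_clause_ok` is hypothetical and only for illustration.

```python
# Sketch of the memory clause from the Requirements expression:
#   (Memory * 1024) >= ImageSize
# Assumption: Memory is in megabytes, ImageSize is in kilobytes.

def memory_clause_ok(machine_memory_mb: int, job_image_size_kb: int) -> bool:
    """Return True if the machine's Memory satisfies the job's ImageSize."""
    return machine_memory_mb * 1024 >= job_image_size_kb

# The 1.8 GB image size, expressed in kilobytes (truncated to an int).
image_size_kb = int(1.8 * 1024 * 1024)

# A typical 1 GB compute node versus a hypothetical 2 GB node.
print(memory_clause_ok(1024, image_size_kb))  # False: 1048576 < 1887436
print(memory_clause_ok(2048, image_size_kb))  # True: 2097152 >= 1887436
```

Under these assumptions, no 1 GB node can satisfy the clause once ImageSize has grown past roughly 1,048,576 KB.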
Is there any way I can get the jobs in the queue to restart,
even if it means losing the current image? I don't want to simply
remove the jobs currently in the queue, because then I'd have to
figure out which jobs finished and which need to be restarted. I'd
rather just remove the ImageSize requirement and have the jobs
restart from scratch.
A second issue: I have many other Java jobs in the queue that have
not yet run and therefore are not constrained by the Memory
requirement. Yet for some reason these jobs will not run either.
Here's the output from condor_q -analyze:
5913.167: Run analysis summary. Of 354 machines,
20 are rejected by your job's requirements
14 reject your job because of their own requirements
2 match, but are serving users with a better priority in the pool
26 match, but prefer another specific job despite its worse user-priority
238 match, but will not currently preempt their existing job
54 are available to run your job
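As a quick sanity check on that summary, the six counts add up exactly to 354, which suggests each machine is counted in exactly one category, so the 54 "available" machines are not an accounting artifact. The sketch below simply tallies the numbers quoted above; the dictionary keys are abbreviated paraphrases of the analyze lines, not Condor identifiers.

```python
# Tally of the condor_q -analyze summary for job 5913.167.
# Keys paraphrase the analyze output; values are copied from it.
summary = {
    "rejected by the job's requirements": 20,
    "reject the job because of their own requirements": 14,
    "match, but serving users with a better priority": 2,
    "match, but prefer another specific job": 26,
    "match, but will not currently preempt their existing job": 238,
    "available to run the job": 54,
}

total = sum(summary.values())
print(total)  # 354, matching "Of 354 machines"
print(summary["available to run the job"])  # 54 idle machines that match
```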
Any idea why these jobs are not being picked up?
Thanks,
Jim