I've got a dedicated Condor 6.8.4 pool using a Windows 2003 Server as central manager and a bunch of Windows XP boxes as execute nodes. I noticed that one of my jobs got evicted (for some reason) and won't restart. The reason given is:
"No resources matched request's constraints:
Check the Requirements expression below:
Requirements = [...] && (Arch == "INTEL") && (Disk >= DiskUsage) && ((Memory * 1024) >= ImageSize) && (HasFileTransfer)"
This seems to be due to the automatically inserted job requirement of "((Memory * 1024) >= ImageSize)".
However, I'm not sure I understand this - my job has an
ImageSize = 530000
while all machines' classads say
Memory = 511
Apparently the machine memory is given in megabytes (instead of kilobytes, as stated in section 7.3 of the 6.8.4 manual), while the image size of the job seems to be given in bytes. At least I can't see how my job could ever have an image size of 530 MB.
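To spell out the arithmetic, here is a small sketch that evaluates just the memory clause with the two values quoted above. It treats both attributes as plain integers and makes no assumption about their units; the function name memory_clause is only made up for this illustration and is not part of Condor.

```python
# Values copied from the ClassAds quoted above.
MEMORY = 511          # machine ClassAd attribute "Memory"
IMAGE_SIZE = 530000   # job ClassAd attribute "ImageSize"


def memory_clause(memory, image_size):
    """Evaluate ((Memory * 1024) >= ImageSize) on plain integers."""
    return memory * 1024 >= image_size


print(MEMORY * 1024)                      # left-hand side: 523264
print(memory_clause(MEMORY, IMAGE_SIZE))  # False, so no machine matches
print(IMAGE_SIZE - MEMORY * 1024)         # shortfall: 6736
```

So with these numbers the clause fails by a fairly small margin, which would explain why none of the machines match.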
And by the way, why does an evicted vanilla job on Windows have an image size > 0 at all?
Since there is no checkpointing on Windows, the job would start from scratch once it is rescheduled, wouldn't it?
And the final question: how do I get Condor to restart my job? I need the job to be restarted rather than removed and resubmitted, because we have built a little GUI that checks on the pool using the cluster IDs, and removing and resubmitting would make the GUI lose track of the job.
Thanks for any help or clarification,
Thorsten