Hi all,
After the 7.8.3 update, I have 4 jobs out of a DAG of 8000+ stuck with:
---------------------------------------------
The Requirements expression for your job is:
( ( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 ) ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )
    Condition                    Machines Matched   Suggestion
    ---------                    ----------------   ----------
1   (
[
].Memory > 0 )                   0                  REMOVE
2   ( TARGET.Memory >= 7325 )    0                  MODIFY TO 1968
3   ( TARGET.Memory > 0 )        32
...
---------------------------------------------
The last time Condor pulled this TARGET.Memory requirement out of the
ether, I added "( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 )" to the
job's submit file. That worked until now.
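For reference, the workaround amounts to a submit-file line along these
lines. This is only a sketch: the universe and executable lines are
placeholders I made up for illustration, not my actual file, and only the
requirements expression is the part quoted above. As far as I can tell,
the remaining clauses in the analysis (Arch, OpSys, Disk, Memory >=
RequestMemory, FileSystemDomain) are appended by condor_submit rather
than written by me.

```
# Sketch of a submit description file fragment.
# "run_blast.sh" is a hypothetical name used only for illustration.
universe     = vanilla
executable   = run_blast.sh

# The clause added by hand as a workaround:
requirements = ( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 )

queue
```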
The other change is that I added another machine to the pool in the
middle of the run (a 2x2 AMD), but the stuck jobs are not on it.
What's curious is that this time all 4 jobs are stuck on one node, and
before they got stuck, a whole lot of jobs ran to completion on that
node.
The jobs are BLAST sequence searches. The execute nodes are all CentOS 6.3
x86_64 AMDs (2 to 8 cores), and the whole setup has been running weekly
for years.
Any suggestions?
--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu