Hi all,

after the 7.8.3 update I have 4 jobs out of a DAG of 8000+ stuck with:

---------------------------------------------
The Requirements expression for your job is:

( ( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 ) ) &&
( TARGET.Arch == "X86_64" ) && ( TARGET.OpSys == "LINUX" ) &&
( TARGET.Disk >= RequestDisk ) && ( TARGET.Memory >= RequestMemory ) &&
( TARGET.FileSystemDomain == MY.FileSystemDomain )

    Condition                     Machines Matched    Suggestion
    ---------                     ----------------    ----------
1   ( [ ].Memory > 0 )            0                   REMOVE
2   ( TARGET.Memory >= 7325 )     0                   MODIFY TO 1968
3   ( TARGET.Memory > 0 )         32
...
---------------------------------------------

Last time condor pulled this TARGET.Memory requirement out of the ether, I added "( TARGET.Memory > 0 ) && ( .RIGHT.Memory > 0 )" to the job's submit file. That worked until now.

The other change is that I added another machine to the pool in the middle of the run -- a 2x2 AMD, but the stuck jobs are not on it.

What's curious is that this time all 4 jobs are stuck on one node, and before they got stuck a whole lot of jobs ran to completion on that node.

The jobs are BLAST sequence searches, the execute nodes are all CentOS 6.3 x86_64 AMDs (2..8-core), and the whole setup has been running weekly for years.

Any suggestions?

--
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu