Ian Chesal wrote:
On Thu, Dec 2, 2010 at 1:13 PM, Xenia Fave <xfave2008@xxxxxxxxxx> wrote:

Do you mean just rebooting the one node or the entire cluster?

Just the one node where Condor won't start.

See the other email from James Burnash about fsck'ing the file system -- in order to do this you'll have to unmount it from *all* your machines.
If it's mounted on other machines: looks like everyone has a local /scratch.

As I recall (haven't seen it in a while), this error can happen when the disk develops too many bad sectors too fast. Then the filesystem is made read-only at a lower level than mtab, so mount still shows it as "rw". If that is the case, smartctl and/or dmesg (or /var/log/messages) should have something to say about it. Also, if this is the cause of the problem, don't bother with fsck; replace the disk.
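One way to check for the "mount says rw but the filesystem is really ro" situation is to compare what mtab claims with the kernel's own mount table. The following is only a rough Python sketch, not something from this thread: the function names are made up for illustration, and it assumes a Linux box where /etc/mtab is a regular file maintained by mount and /proc/mounts reflects the kernel's view. On systems where /etc/mtab is just a symlink to /proc/mounts the two can never disagree, so the check is pointless there.

```python
def mount_options(mounts_text, mount_point):
    """Return the set of mount options listed for mount_point, or None.

    mounts_text is the content of an fstab-style table such as /etc/mtab
    or /proc/mounts: device, mount point, fs type, options, dump, pass.
    """
    options = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # Later entries shadow earlier ones, so keep the last match.
            options = set(fields[3].split(","))
    return options


def is_silently_read_only(mtab_text, proc_mounts_text, mount_point):
    """True when mtab still claims rw but the kernel table says ro."""
    claimed = mount_options(mtab_text, mount_point)
    actual = mount_options(proc_mounts_text, mount_point)
    if claimed is None or actual is None:
        return False  # not mounted according to one of the tables
    return "rw" in claimed and "ro" in actual


def check_live(mount_point="/scratch"):
    """Hypothetical helper: run the comparison against the live system."""
    with open("/etc/mtab") as f:
        mtab_text = f.read()
    with open("/proc/mounts") as f:
        proc_text = f.read()
    return is_silently_read_only(mtab_text, proc_text, mount_point)
```

Calling check_live("/scratch") on the affected node would return True if the two tables disagree in the way described above. It says nothing about why; smartctl and dmesg are still where to look for the cause.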
Dimitri
-- 
Dimitri Maziuk
Programmer/sysadmin
BioMagResBank, UW-Madison -- http://www.bmrb.wisc.edu