On Feb 18, 2011, at 11:26, Peter Doherty wrote:
I upgraded to v7.5.5 and there's one thing I'm scratching my head over.
I used to have a SPOOL directory filled with directories with names
like:
cluster15093481.proc0.subproc0.tmp/
According to the changelog I should now have dirs in the format of:
$(SPOOL)/<#>/<#>/cluster<#>.proc<#>.subproc<#>
But the thing is, I don't have anything like that.
My SPOOL just has:
job_queue.log
local_univ_execute
spool_version
I've got a few thousand jobs in the queue right now.
Where are the spool files? I'm sure I'm looking in the correct
directory. I've searched for them, but I can't find them anywhere.
I do see a lot of lock files in $(TMP_DIR).
I believe the constant I/O of all the spool files was one of the
bottlenecks of our Schedd, so if that's really been improved upon,
I'm eager to see the effect, but from reading the changelog, the only
difference should have been subdirs for the spool to keep from hitting
ext3 limits.
Hmm, okay. Jobs seem to be running okay, but I see a lot of these
errors in the Shadow Log:
02/18/11 12:09:25 (pid:649) (15101845.0) (649):
Directory::setOwnerPriv() -- failed to find owner of
/raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp
02/18/11 12:09:25 (pid:649) (15101845.0) (649): Directory::Rewind():
failed to find owner of
"/raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp"
I guess that's part of the problem. I checked the perms on the spool
directory, and then I set it to 777 and verified regular users can
write to it, but that didn't stop the errors, or cause files to be
created there.
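
A quick way to double-check ownership and permissions on every directory
leading to the path in the error is sketched below. It is only an
illustration: describe_ancestors is a made-up helper name, and it reports
what the filesystem says, not what the shadow actually looks up.

```python
import os
import stat


def describe_ancestors(path):
    """Return (path, owner uid, mode string) for path and each parent.

    Hypothetical helper for illustration only. A component that does not
    exist is reported with uid None and the OS error text.
    """
    rows = []
    current = os.path.abspath(path)
    while True:
        try:
            st = os.lstat(current)
            rows.append((current, st.st_uid, stat.filemode(st.st_mode)))
        except OSError as err:
            rows.append((current, None, "missing: %s" % err.strerror))
        parent = os.path.dirname(current)
        if parent == current:  # reached the filesystem root
            break
        current = parent
    return rows


if __name__ == "__main__":
    # Path copied from the ShadowLog error above.
    target = "/raid0/gwms_schedd/spool/1845/0/cluster15101845.proc0.subproc0.tmp"
    for row in describe_ancestors(target):
        print(row)
```

If the deepest entries come back as missing, the directory was never
created, which would be a different problem from a wrong owner.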
So I'm not really clear what's going on.
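
For reference, here is how the new layout appears to map a job to a
directory, judging from the single path in the error above (cluster
15101845, proc 0 ending up under 1845/0). The modulus of 10000 and the
function name spool_path are assumptions drawn from that one example, not
something the changelog states.

```python
import posixpath


def spool_path(spool, cluster, proc, subproc=0, modulus=10000):
    """Guess the per-job spool directory under the new layout.

    ASSUMPTION: the two numeric subdirectories are cluster % 10000 and
    proc % 10000. This is inferred from one log line only.
    """
    leaf = "cluster%d.proc%d.subproc%d" % (cluster, proc, subproc)
    return posixpath.join(
        spool,
        str(cluster % modulus),
        str(proc % modulus),
        leaf,
    )


if __name__ == "__main__":
    # Spool root and job id taken from the ShadowLog error above.
    print(spool_path("/raid0/gwms_schedd/spool", 15101845, 0) + ".tmp")
```

Under that assumption, the result matches the .tmp path the shadow is
complaining about.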