
Re: [Condor-users] Condor Webservice spool cleanup



Joshua.Chartier@xxxxxxxxxxxx wrote:
Tessella Ref: NPD/5350/PR/ICSC/2008FEB07/15:11:05

Hello Condor Users,

We have a Condor setup about to go into production on a customer site. We've been in testing for a couple of months and everything seems to be going well, except that 100,000+ files have built up in the spool directories of our Condor host machines. Our Condor jobs are submitted via the web service into the Java universe and are running properly. After the jobs have completed (and sometimes a little before), we call the birdbath libraries with the following commands:

            Schedd schedd = new Schedd(new URL(IP));
            Transaction xact = schedd.createTransaction();
            xact.begin(30);  // open a transaction, 30-second duration
            // Clean up job: release its spool files, then remove it
            xact.closeSpool(cluster, job);
            xact.removeJob(cluster, job, "Finished with job");
            xact.commit();

The jobs are actually being removed from the queue, but their files are not being deleted from the spool. Not only that, but I can't delete any of the files/folders in the spool via condor_preen or by hand unless I restart Condor first. Has anyone seen this behavior before? Or have any ideas about how we can clean up these files? Our Condor version is 6.8.6, and we are running on Windows Server 2003.

Thanks,
Josh

Josh,

On Windows, while a file is held open by a process it cannot be deleted from the filesystem. This suggests the Schedd is leaking open file handles. Can you use a tool like FileMon [1] to see which files, if any, your Schedd is holding open? Check after you've submitted some jobs and find you cannot remove the spool directories.
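For anyone unfamiliar with the Windows behavior described above, here is a minimal, self-contained Java sketch (not Condor-specific; the class and file names are made up for illustration) showing why a leaked handle blocks cleanup:

    import java.io.File;
    import java.io.FileOutputStream;
    import java.io.IOException;

    public class SpoolDeleteDemo {
        // Returns true once the file can be deleted after its handle is closed.
        static boolean deleteAfterClose() throws IOException {
            File f = File.createTempFile("spool", ".tmp");
            try (FileOutputStream out = new FileOutputStream(f)) {
                out.write(42);
                // While 'out' is still open, f.delete() would return false on
                // Windows: the OS refuses to unlink a file with an open handle.
                // (On POSIX systems the unlink would succeed immediately.)
            }
            // With every handle closed, the delete succeeds on all platforms.
            return f.delete();
        }

        public static void main(String[] args) throws IOException {
            System.out.println(deleteAfterClose());
        }
    }

That is why restarting Condor makes the spool files deletable: the restart closes whatever handles the Schedd was leaking.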

Also, when you are interacting with the Schedd via the WS interface, do you receive any exceptions during calls to DeclareFile or GetFile?

Best,


matt

[1] http://technet.microsoft.com/en-us/sysinternals/bb896642.aspx