
Re: [Condor-users] BLAST jobs go to 0% CPU; condor thinks they're running



Michael Rusch wrote:

I don't know what you mean by your question: are the jobs still alive when
the CPU drops to 0%.  The processes still exist, as I can see them using
Task Manager (I'm in Windows XP--no ps command), but they never get any CPU
time.

But, the good news is that it's working now.  Why?  I have no idea.  After
having these problems, I switched a couple of the machines to the UWCS
default settings for starting, suspending, preempting jobs, etc.  I screwed
up one of them pretty badly, which made startd crash constantly, so that
that node disappeared from the pool.  After that, running the BLAST worked
fine.  When I fixed the config script and the node came back, it still
worked fine.  I had not modified the config on that node at all when it
wasn't working...it was the same as the rest, but after breaking it and
fixing it, it worked.

Go figure.

Michael.



I don't know anything about how the Windows version of Condor detects user activity on the machine, or about its ability to suspend jobs temporarily.
My guess is that Windows somehow detected that the machine was not idle (maybe you logged in to administer the machine?) and
suspended the job.


With UNIX, you can typically run ps and see the job in the T state (stopped) when condor has suspended it. Microsoft makes "Services for UNIX" and distributes it on their web site. It includes a win32 ps command which may tell you whether the job has been stopped.
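On the UNIX side, the "T state" check can be scripted rather than eyeballed. The following is a minimal sketch, not part of Condor. It assumes a ps that accepts `-A` and `-o pid=,stat=` (procps on Linux does). The helper names `parse_stopped` and `stopped_pids` are hypothetical. It lists every process whose state field begins with `T`, which is how ps reports a process stopped by a signal such as SIGSTOP.

```python
import subprocess


def parse_stopped(ps_output: str) -> list[int]:
    """Return the PIDs whose ps STAT field starts with 'T' (stopped).

    Expects lines of the form '<pid> <stat>', as produced by
    `ps -A -o pid=,stat=` (the trailing '=' suppresses the header).
    """
    stopped = []
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        pid, stat = fields[0], fields[1]
        # STAT may carry extra flag characters, e.g. 'T', 'Tl', 'T+'
        if pid.isdigit() and stat.startswith("T"):
            stopped.append(int(pid))
    return stopped


def stopped_pids() -> list[int]:
    # Assumption: ps supports -A (all processes) and -o pid=,stat=
    out = subprocess.run(
        ["ps", "-A", "-o", "pid=,stat="],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stopped(out)


if __name__ == "__main__":
    print(stopped_pids())
```

If a job's PID appears in this list while Condor still reports the job as running, the job has most likely been suspended rather than hung. This is only a UNIX check; it does not apply to the Windows XP machines discussed above.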

Dave