
Re: [Condor-users] HELP: condor_schedd memory leak???

Date: Mon, 7 Apr 2008 12:48:42 -0400
From: "Robert E. Parrott" <parrott@xxxxxxxxxxxxxxxx>
Subject: Re: [Condor-users] HELP: condor_schedd memory leak???

I have stopped and restarted the condor_schedd multiple times, and each time I get the same behavior. Essentially, it goes through its cycle, then the memory use sky-rockets and, once it hits swap, brings the host to its knees. There is no error reported in the logs.

It looks like behavior that would result from a corrupt data file, or similar, since it happens in a consistent fashion.

BTW, the version is 7.0.1 on this node, but the compute nodes are 6.8.5 (I think).
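
For anyone chasing the same symptom, the memory growth described above can be recorded over time and lined up with the schedd's log entries. The following is only a minimal sketch, assuming a Linux host where /proc/<pid>/status exposes a VmRSS line; the function names (parse_vm_rss_kib, read_rss_kib, watch) are hypothetical and not part of Condor.

```python
import time
from pathlib import Path
from typing import Optional


def parse_vm_rss_kib(status_text: str) -> Optional[int]:
    """Return the VmRSS value (in KiB) from /proc/<pid>/status text, or None."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like: "VmRSS:     123456 kB"
            return int(line.split()[1])
    return None


def read_rss_kib(pid: int) -> Optional[int]:
    """Read the resident set size of a live process (Linux only)."""
    return parse_vm_rss_kib(Path(f"/proc/{pid}/status").read_text())


def watch(pid: int, interval_s: float = 10.0, samples: int = 6) -> None:
    """Print timestamped RSS samples so growth can be compared with the logs."""
    for _ in range(samples):
        print(time.strftime("%H:%M:%S"), read_rss_kib(pid), "KiB")
        time.sleep(interval_s)
```

Calling watch() with the PID of the running condor_schedd prints one sample per interval; the timestamps make it easier to see which point in the schedd's cycle the growth starts at.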


rob


On Apr 7, 2008, at 12:41 PM, Dan Bradley wrote:

If you stop and restart condor, the existing contents of your job queue
are preserved.  For vanilla universe jobs with sufficient lease times
(20 minutes by default), the schedd should be able to reconnect to
running jobs after it restarts without the jobs being interrupted by the
restart.
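
The lease reasoning above comes down to a simple comparison, sketched here as a rough illustration. This is a simplified model, not how the schedd itself decides: it assumes the full lease is still available at the moment the schedd stops, so it gives an optimistic answer. The name restart_fits_in_lease and the constant are hypothetical; only the 20-minute default comes from the message above.

```python
DEFAULT_LEASE_S = 20 * 60  # the 20-minute default mentioned above, in seconds


def restart_fits_in_lease(downtime_s: float, lease_s: float = DEFAULT_LEASE_S) -> bool:
    """Optimistic check: does the schedd come back before the job lease runs out?

    Assumption: the whole lease remains when the schedd goes down. In practice
    some of it may already have elapsed, so treat True as "possibly", not "surely".
    """
    return downtime_s < lease_s
```

For example, a five-minute restart fits inside the default lease, while a thirty-minute outage does not unless the jobs were submitted with a longer lease.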

Is your schedd logging anything strange? Is it responding to condor_q?


--Dan

Robert E. Parrott wrote:

Hi Folks,

I have somewhat of an emergency situation.

After a DOS attempt on ssh on our login node, and subsequent system
unresponsiveness and then reboot, the condor_schedd process now grows
to exhaust all of physical memory on the host (6 GB + at present).
This causes swapping issues etc.  ... it ain't pretty.

Is there some approach I can take to try to resolve this issue,
without losing the previous queue of submitted jobs?

thanks in advance for the quick response,
rob


==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
      Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin  211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045

_______________________________________________
Condor-users mailing list

To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx with a
subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users

The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/

