If you stop and restart condor, the existing contents of your job queue
are preserved. For vanilla universe jobs with sufficient lease times
(20 minutes by default), the schedd should be able to reconnect to
running jobs after it restarts without the jobs being interrupted by the
restart.
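
As a rough sketch of where the lease time is set: it can be given per
job in the submit description file with the job_lease_duration command,
in seconds. The executable name below is a placeholder, and 1200 is
only the 20-minute default written out, not a recommended value.

```
# Hypothetical submit description file; my_job is a placeholder name.
universe           = vanilla
executable         = my_job
# Lease in seconds; 1200 s = 20 minutes, the default mentioned above.
job_lease_duration = 1200
queue
```

A longer lease gives the schedd more time to come back and reconnect
before the execute side gives up on the job.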
Is your schedd logging anything strange? Is it responding to condor_q?
--Dan
Robert E. Parrott wrote:
Hi Folks,
I have somewhat of an emergency situation.
After a DoS attempt on ssh on our login node, and subsequent system
unresponsiveness and then a reboot, the condor_schedd process now grows
to exhaust all of the physical memory on the host (6 GB+ at present).
This causes swapping issues etc. ... it ain't pretty.
Is there some approach I can take to try to resolve this issue,
without losing the previous queue of submitted jobs?
thanks in advance for the quick response,
rob
==========================
Robert E. Parrott, Ph.D. (Phys. '06)
Associate Director, Grid and
Supercomputing Platforms
Project Manager, CrimsonGrid Initiative
Harvard University Sch. of Eng. and App. Sci.
Maxwell-Dworkin 211,
33 Oxford St.
Cambridge, MA 02138
(617)-495-5045
_______________________________________________
Condor-users mailing list
To unsubscribe, send a message to condor-users-request@xxxxxxxxxxx
with a subject: Unsubscribe
You can also unsubscribe by visiting
https://lists.cs.wisc.edu/mailman/listinfo/condor-users
The archives can be found at:
https://lists.cs.wisc.edu/archive/condor-users/