[Condor-users] Hawkeye module and condor_q problems in condor-6.6.6
- Date: Mon, 06 Jun 2005 15:13:50 -0500 (CDT)
- From: Chris Green <greenc@xxxxxxxx>
- Subject: [Condor-users] Hawkeye module and condor_q problems in condor-6.6.6
Hi,
I'm having problems with hawkeye modules under condor v6.6.6: sometimes I
get a continuous stream of "Cron: Job 'blah' is still running!" messages,
even though I can no longer find the processes in the process list. Is
there any way to fix this short of bouncing the startd?
Second: we deal with some very large-footprint condor jobs in the vanilla
universe. Most of the footprint (static FORTRAN array space) gets swapped
out, but if a job gets killed on one machine, it will then never run on
another machine because its ImageSize is greater than the (per-vm) memory
available there. I have been running the command:
    condor_qedit -name lawrence -constraint \
        'JobStatus == 1 && ImageSize > 0.0' \
        ImageSize 0.0
which works, but condor_q then says:
-- Schedd: rockwell.fnal.gov : <131.225.52.131:32774>
ID      OWNER    SUBMITTED  RUN_TIME   ST PRI SIZE   CMD
 --- ???? ---
3034.0  jocelyn  6/6 11:23  0+03:45:52 R  0   1389.5 AnalysisFramework_
 --- ???? ---
 --- ???? ---
3037.0  jocelyn  6/6 11:23  0+03:45:29 R  0   1389.5 AnalysisFramework_
3040.0  jocelyn  6/6 11:23  0+03:45:22 R  0   1385.5 AnalysisFramework_
3041.0  jocelyn  6/6 11:23  0+03:45:18 R  0   1106.5 AnalysisFramework_
 --- ???? ---
3043.0  jocelyn  6/6 11:24  0+03:44:51 R  0   1108.5 AnalysisFramework_
3044.0  jocelyn  6/6 11:24  0+03:44:51 R  0   1106.5 AnalysisFramework_
 --- ???? ---
3048.0  jocelyn  6/6 11:24  0+03:45:03 R  0   1396.0 AnalysisFramework_
where the " --- ???? --- " lines represent the jobs that were edited (job
numbers 3033.0, 3035.0, 3036.0, 3042.0 and 3047.0 here). Is this a bug, or
did I do something wrong? Either way, how do I fix or work around the
problem?
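
For anyone who wants to check how many queue entries are affected, here is a rough sketch that counts the placeholder rows in saved condor_q text output. It is not part of any Condor tool: the function name summarize_queue and the two regular expressions are my own assumptions, based only on the listing shown above.

```python
import re

# Placeholder row, as printed for the edited jobs in the listing above.
PLACEHOLDER = re.compile(r"^\s*-+\s*\?+\s*-+\s*$")
# A readable job row starts with a "cluster.proc" ID such as 3034.0.
JOB_ROW = re.compile(r"^\s*(\d+\.\d+)\s+\S+")


def summarize_queue(listing: str):
    """Return (readable_job_ids, placeholder_count) for condor_q text output."""
    job_ids = []
    placeholders = 0
    for line in listing.splitlines():
        if PLACEHOLDER.match(line):
            placeholders += 1
            continue
        match = JOB_ROW.match(line)
        if match:
            job_ids.append(match.group(1))
    return job_ids, placeholders


# A few rows copied from the listing above, used as sample input.
SAMPLE = """\
 --- ???? ---
3034.0 jocelyn 6/6 11:23 0+03:45:52 R 0 1389.5 AnalysisFramework_
 --- ???? ---
 --- ???? ---
3037.0 jocelyn 6/6 11:23 0+03:45:29 R 0 1389.5 AnalysisFramework_
"""

if __name__ == "__main__":
    ids, unknown = summarize_queue(SAMPLE)
    print(f"{len(ids)} readable rows, {unknown} placeholder rows")
```

Header lines such as the "-- Schedd:" banner and the column titles match neither pattern, so they are ignored.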
Thanks,
Chris.
--
Chris Green, MiniBooNE / LANL. Email greenc@xxxxxxxx
Tel: (630) 840-2167. Fax: (630) 840-3867