Mailing List Archives
[condor-users] Job cluster management?
- Date: Mon, 8 Dec 2003 10:03:43 -0800
- From: "Michael S. Root" <mike@xxxxxxxxxxxxxx>
- Subject: [condor-users] Job cluster management?
Hi all. I work for a small visual effects company in San Francisco, and
we're trying out Condor as a means of distributing renders across our
available machines. Heretofore, we have used an extremely minimal tool I
wrote in Python to accomplish this. It worked OK for what it was, but we
have now outgrown it. I've now gotten Condor installed and working
without too much fuss.
My question is this: How do other Condor users manage their job clusters?
Unless I'm missing something, it seems difficult to get a handle on the
status of a job cluster as a whole using Condor's tools out-of-the-box.
For example, say we have a 150-frame shot, with three shadow maps plus the
final render to be generated for each frame. That comes out to 600 jobs
(150 frames x 4 jobs), each of which may take anywhere from a minute to an
hour or more. If I have that
and a couple of other things in the queue, the output from condor_q
quickly becomes difficult to make sense of. It'd be great if many of the
Condor tools had an option (say, -cluster) that reported information at the
cluster level. For example:
    ID      OWNER  SUBMITTED     RUN_TIME    ST PRI SIZE CMD
    18.1    mike   12/8 09:22    0+00:00:01  R  0   0.0  shake
    18.2    mike   12/8 09:22    0+00:00:00  R  0   0.0  shake
    18.3    mike   12/8 09:22    0+00:00:00  R  0   0.0  shake
    ...
becomes:
    ID      OWNER  SUBMITTED     TOTAL_RUN_TIME  ST CMPLTE CMD
    18.*    mike   12/8 08:32    0+00:41:01      R  52/100 shake
    19.*    mike   12/8 09:22    0+00:00:00      I  0/50   maya
    20.*    mike   12/8 09:44    0+00:00:00      I  0/600  prman
Clearly, I could write a wrapper script for condor_q that would do
something like this, but it'd be nice if it were built-in.
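A wrapper like that could look roughly like the following. This is only a sketch under some assumptions: the input rows follow the column layout in the example above (ID, OWNER, date, time, RUN_TIME, ST, PRI, SIZE, CMD), and the number of jobs originally queued per cluster was recorded somewhere at submit time (the `totals` dict is hypothetical), since finished jobs no longer show up in condor_q's output. The R-over-I-over-H status rule is also just an assumed convention.

```python
# Hypothetical wrapper sketch: collapse condor_q-style rows into one
# summary line per cluster. Assumes the column layout shown above
# (ID OWNER DATE TIME RUN_TIME ST PRI SIZE CMD) and that the number of
# jobs originally queued per cluster was recorded at submit time.
from collections import OrderedDict


def parse_runtime(text):
    """Convert a 'D+HH:MM:SS' run time into seconds."""
    days, clock = text.split("+")
    hours, minutes, seconds = (int(part) for part in clock.split(":"))
    return ((int(days) * 24 + hours) * 60 + minutes) * 60 + seconds


def format_runtime(total):
    """Convert seconds back into 'D+HH:MM:SS'."""
    days, rest = divmod(total, 86400)
    hours, rest = divmod(rest, 3600)
    minutes, seconds = divmod(rest, 60)
    return "%d+%02d:%02d:%02d" % (days, hours, minutes, seconds)


def summarize(rows, totals):
    """Return one summary line per cluster.

    rows   -- lines of condor_q-style output
    totals -- hypothetical dict: cluster id -> number of jobs submitted
    """
    clusters = OrderedDict()
    for row in rows:
        fields = row.split()
        if len(fields) < 9 or fields[0] == "ID":
            continue  # skip blank lines, the header, and the summary line
        job_id, owner, date, clock, runtime, state = fields[:6]
        cluster = job_id.split(".")[0]
        info = clusters.setdefault(cluster, {
            "owner": owner, "submitted": date + " " + clock,
            "runtime": 0, "states": set(), "queued": 0, "cmd": fields[8]})
        info["runtime"] += parse_runtime(runtime)
        info["states"].add(state)
        info["queued"] += 1

    summary = []
    for cluster, info in clusters.items():
        # Assumed precedence: running beats idle beats held; anything else is '?'.
        state = next((s for s in "RIH" if s in info["states"]), "?")
        total = totals.get(cluster, info["queued"])
        done = total - info["queued"]  # finished jobs have left the queue
        summary.append("%s.* %s %s %s %s %d/%d %s" % (
            cluster, info["owner"], info["submitted"],
            format_runtime(info["runtime"]), state, done, total, info["cmd"]))
    return summary
```

Note that the summed run time only covers jobs still in the queue, and removed jobs would be counted as "complete" here; a real tool would want to consult the job history as well.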
My biggest problem is with email notification. Obviously, getting hundreds
of emails a day is too much. On the other hand, getting AN email when a
cluster finishes is really handy. I know I could do this with a DAG, with a
final job that emails the submitter once all the other jobs in the cluster
have finished. But it's a pain to write a DAG just for that, and we would
lose some of the nice runtime stats that the built-in email
notification gives. Hey developers, any chance of getting a
"Notification=Cluster" setting for condor_submit?
Are there any other Condor users out there who have had similar issues?
Anybody come up with any good solutions?
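One stopgap for the one-email-per-cluster problem, short of writing a DAG, is a small poller run alongside condor_submit. This is a hypothetical sketch, not a built-in Condor feature: it assumes `condor_q <cluster>` prints one line per queued job beginning with "<cluster>.<proc>", that an SMTP server is listening on localhost, and the addresses are placeholders. The function names are made up for illustration.

```python
# Hypothetical sketch: poll the queue for one cluster and send a single
# email once none of its jobs remain. Assumes `condor_q <cluster>` lists one
# line per queued job starting with "<cluster>.<proc>", and that an SMTP
# server is reachable on localhost. All addresses are placeholders.
import smtplib
import subprocess
import time
from email.message import EmailMessage


def jobs_remaining(condor_q_output, cluster):
    """Count job lines such as '18.3 mike ...' belonging to `cluster`."""
    prefix = "%s." % cluster
    count = 0
    for line in condor_q_output.splitlines():
        fields = line.split()
        if fields and fields[0].startswith(prefix):
            count += 1
    return count


def build_notice(cluster, to_addr, from_addr):
    """Build the single completion email for a cluster."""
    msg = EmailMessage()
    msg["Subject"] = "Condor cluster %s has finished" % cluster
    msg["From"] = from_addr
    msg["To"] = to_addr
    msg.set_content("All jobs in cluster %s have left the queue.\n" % cluster)
    return msg


def wait_and_notify(cluster, to_addr, from_addr, interval=300):
    """Poll condor_q every `interval` seconds, then send one email."""
    while True:
        result = subprocess.run(["condor_q", str(cluster)],
                                capture_output=True, text=True, check=True)
        if jobs_remaining(result.stdout, cluster) == 0:
            break
        time.sleep(interval)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(build_notice(cluster, to_addr, from_addr))
```

For example, `wait_and_notify(20, "artist@example.com", "renders@example.com")` could be started in the background right after submitting cluster 20. It shares the DAG approach's drawback of not carrying the per-job runtime stats, and a held job keeps the cluster in the queue, so the email never goes out until that job is released or removed.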
Also for the developers: Any chance of getting a Python module?
Cheers.
-Mike
mike@xxxxxxxxxxxxxx
Condor Support Information:
http://www.cs.wisc.edu/condor/condor-support/
To Unsubscribe, send mail to majordomo@xxxxxxxxxxx with
unsubscribe condor-users <your_email_address>