Re: [Condor-users] Greetings, and rtgen over Condor



On 7/1/05, Miguel Dilaj <mdilaj@xxxxxxxxxxxxx> wrote:
> Hi all,
> 
> I'm an absolutely new starter with Condor (5 hours!).

fresh meat :)

> I had Condor 6.6.10 for Win up and running in a few pilot boxes in no time,
> and run rtgen using a .sub adapted from the rc5.sub in the examples.

This is a Good Idea - I think anyone with a problem on an initial pool
setup should try to submit printname and cpusoak before asking for
help :)

> I'm particularly interested in knowing if there're any
> drawbacks of using Condor in this scenario, where biiiiig files are
> generated on each node.

There are a few. For starters, make sure that any large files which
aren't needed back on the submitter are either deleted before the job
exits or created in a subdirectory of the initial dir, since files in
subdirectories are not transferred back automatically.
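
As a rough sketch of that idea (nothing Condor ships, just something a
job wrapper could call before exiting), the helper below moves big
top-level files into a subdirectory. The function name, the "scratch"
subdirectory and the 100 MB threshold are all made up for illustration.

```python
import os
import shutil

def stash_large_files(job_dir, keep=(), subdir="scratch",
                      min_bytes=100 * 1024 * 1024):
    """Move big top-level files not listed in `keep` into job_dir/subdir.

    Hypothetical helper: names and the size threshold are assumptions.
    Returns the names of the files that were moved.
    """
    target = os.path.join(job_dir, subdir)
    moved = []
    for name in sorted(os.listdir(job_dir)):
        path = os.path.join(job_dir, name)
        # Skip directories and anything the submitter wants back.
        if not os.path.isfile(path) or name in keep:
            continue
        if os.path.getsize(path) >= min_bytes:
            os.makedirs(target, exist_ok=True)
            shutil.move(path, os.path.join(target, name))
            moved.append(name)
    return moved
```

Anything listed in `keep` stays at the top level and so still comes
back with the job; everything else over the threshold stays on the node.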

Second, if you are supporting manual checkpointing, it is quite possible
to blitz your schedd with a mass eviction (I have seen a dual Xeon
machine with SCSI disks and a 100M network annihilated by 40 evictions,
each with about 1/2 Gig of data).
There are mechanisms to slow the starting of jobs but not the pace of
eviction. If you want the evictions to succeed with significant
amounts of data, I strongly suggest increasing the KILL setting to
allow up to 30 minutes!
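
A back-of-the-envelope check on that figure, assuming "100M" means a
100 Mbit/s link, that all the evicted data funnels through that one link
to the submit machine, and that the link runs at full line rate (it
won't, once disk and protocol overhead are counted). The function name
is just for illustration.

```python
def transfer_minutes(jobs, gb_per_job, link_mbit_per_s):
    """Best-case minutes to move jobs * gb_per_job over a single link.

    Assumes decimal gigabytes and an ideal, fully used link.
    """
    total_bits = jobs * gb_per_job * 8e9          # GB -> bits
    seconds = total_bits / (link_mbit_per_s * 1e6)
    return seconds / 60.0

# 40 evictions of about half a gigabyte each over 100 Mbit/s
print(round(transfer_minutes(40, 0.5, 100), 1))   # 26.7
```

So even in the ideal case the last job in the queue waits nearly half an
hour, which is why a short kill timeout loses data.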

The space for the execute directory would ideally be allocated on a
sizable disk separate from the main OS install. The significant extra
load on the disk may well increase the chance of it failing, and a disk
that fills up is likely to annoy whoever is providing the machine.
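
If you want to sanity-check a node before pointing jobs at it, something
along these lines would do. It is only a sketch: the function name, the
20% headroom and the idea of sizing by slots times output size are my
assumptions, not anything Condor checks for you.

```python
import shutil

def execute_dir_has_room(path, slots, gb_per_job, headroom=0.2):
    """Return (ok, free_bytes, needed_bytes) for the disk holding `path`.

    `slots` is how many jobs may run on the node at once; `headroom`
    is an arbitrary safety margin on top of the expected output size.
    """
    needed = slots * gb_per_job * 1e9 * (1 + headroom)
    free = shutil.disk_usage(path).free
    return free >= needed, free, needed
```

Call it with the path of the execute directory, for example
`execute_dir_has_room(".", 2, 0.5)` for two slots writing about half a
gigabyte each.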

Best of luck!

Matt