HTCondor Project List Archives


Re: [Condor-devel] Paper on scalability of write-ahead-logs

Date: Sun, 02 Jan 2011 10:25:05 -0500
From: Matthew Farrellee <matt@xxxxxxxxxx>
Subject: Re: [Condor-devel] Paper on scalability of write-ahead-logs

On 12/30/2010 01:43 PM, Brian Bockelman wrote:

Hi folks,

During the break, I've been reading an interesting paper from VLDB 2010:

http://infoscience.epfl.ch/record/149436/files/vldb10aether.pdf

It talks about scalability issues of write-ahead-logs in DBs, but I
found the topics relevant to the I/O scalability issues in the Condor
schedd. Particularly, there are two concepts which may be applicable
(without knowing enough about the Condor code to determine if they are
or not):

1) Early-lock-release (ELR) and flush pipelining. ELR is simply
releasing the I/O resources prior to the I/O completing, but not
returning back to the calling routine until the I/O is finished; flush
pipelining is taking several commits and flushing them as one I/O
operation. Together, they can offer the same data guarantees while
greatly decreasing the number of small I/O operations that are performed.
Note: the kernel community calls this I/O plugging. This technique works
when there are many independent transactions occurring; it presumably
helps less if there are too many dependent transactions (transactions
which can't be started until the previous one finishes).
- Obviously, these techniques were designed for heavily-threaded
environments. It's not known to me whether the schedd can continue on
other work while it waits for a transaction to finish.
2) Asynchronous commit - i.e., lie to the user about their changes being
safely on disk. The paper uses this as an anti-pattern, and works to
show techniques such as (1) can have equivalent performance. However,
there's a reason that databases (even Oracle) allow this mode to be used
- they give the power to the user to consider the value of data
integrity and recoverability versus the value of scalability. There are
certainly times when the extra factor of 2 in scalability (making the
numbers up) is worth the cost of being able to lose 30s of status
changes. I'm not advocating that the Condor team should change their
opinions about the relative merits, or that the default should be
changed - but this should be something the site should be allowed to
decide on.
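
To make the flush-pipelining idea in (1) concrete, here is a minimal Python sketch of batching several commits into one flush. It is an illustration only, not code from the paper or from Condor: the class `GroupCommitLog`, the helper `run_demo`, and the record format are all made up for this example, and early-lock-release is not modelled. Each caller blocks until its own record has been fsync'd, but one fsync can cover many records.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical append-only log that batches concurrent commits into one fsync."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._cond = threading.Condition()
        self._pending = []      # records accepted but not yet on disk
        self._next_seq = 0      # sequence number of the newest accepted record
        self._durable_seq = 0   # every record with seq <= this has been fsync'd
        self._flushing = False  # True while some thread is doing the I/O
        self.fsync_count = 0

    def commit(self, record: bytes) -> None:
        """Return only once `record` is on disk, sharing an fsync where possible."""
        with self._cond:
            self._next_seq += 1
            my_seq = self._next_seq
            self._pending.append(record)
            while self._durable_seq < my_seq:
                if self._flushing:
                    # Another thread is flushing; our record goes out with
                    # that batch or the next one.
                    self._cond.wait()
                    continue
                # Become the flusher and take everything queued so far.
                self._flushing = True
                batch, self._pending = self._pending, []
                batch_end = self._next_seq
                self._cond.release()  # let other threads queue records meanwhile
                try:
                    # Sketch only: ignores short writes and failed flushes.
                    os.write(self._fd, b"".join(batch))
                    os.fsync(self._fd)
                finally:
                    self._cond.acquire()
                    self._flushing = False
                    self._cond.notify_all()
                self._durable_seq = batch_end
                self.fsync_count += 1

    def close(self):
        os.close(self._fd)


def run_demo(n_threads=8, commits_per_thread=20):
    """Commit from several threads; return (records on disk, fsync calls)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.log")
        log = GroupCommitLog(path)

        def worker(tid):
            for i in range(commits_per_thread):
                log.commit(f"thread {tid} commit {i}\n".encode())

        threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        log.close()
        with open(path, "rb") as f:
            return len(f.readlines()), log.fsync_count
```

Calling `run_demo()` returns the number of records written and the number of fsync calls issued. The fsync count can never exceed the record count, and how far below it falls depends on how many commits happen to overlap. With a single committing thread there is nothing to batch, which is the caveat raised above about heavily-threaded environments.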

At any rate, the paper is a good read; hope others enjoy it.

Brian

I've not read the paper, but I have read the xact code in Condor and your email. 8o)

The xact code is structured in such a way that there are no partial/progressive writes to disk for a transaction. The transaction exists in memory and is flushed to disk on commit[1]. Also, transactions do not nest, commonly exist one at a time[2], and contain no data dependencies. The structure being maintained in the log is quite simple in comparison to general DB tables.

[1] Great work has been done in this space, actually allowing for two types of transactions - those that must be flush()d (durable) and those that can wait (nondurable). Operations such as updating JobStatus are marked as durable, while a stat update from the starter may be marked as nondurable. This cuts down on sync operations and has been a good win in the past.

[2] The Schedd is single-threaded and transactions are naturally serialized. The only place where multiple concurrent transactions exist is in SOAP calls. Theoretically an application could expose itself to isolation issues here, but AFAIK this has not happened in practice since the SOAP introduction in 2005.
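
As a rough illustration of the scheme described above and in [1], here is a minimal Python sketch of a transaction that lives in memory, is written to the log in one piece on commit, and is only synced when marked durable. This is not the Schedd's actual code or log format: the class `XactLog`, its method names, the `demo` helper, the attribute `ImageSize`, and the text record layout are all assumptions made for this example.

```python
import os
import tempfile


class XactLog:
    """Hypothetical transaction log: buffer in memory, write once on commit."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o600)
        self._ops = None      # None means no transaction is open
        self.sync_count = 0

    def begin(self):
        if self._ops is not None:
            raise RuntimeError("transactions do not nest")
        self._ops = []

    def set_attribute(self, job_id, name, value):
        if self._ops is None:
            raise RuntimeError("no open transaction")
        # Nothing touches the disk here; the operation is only remembered.
        self._ops.append(f"SetAttribute {job_id} {name} {value}\n")

    def commit(self, durable=True):
        if self._ops is None:
            raise RuntimeError("no open transaction")
        data = "".join(self._ops).encode()
        self._ops = None
        os.write(self._fd, data)   # the whole transaction in one write
        if durable:
            os.fsync(self._fd)     # only durable commits pay for the sync
            self.sync_count += 1

    def close(self):
        os.close(self._fd)


def demo():
    """One durable and one nondurable commit; return (sync count, log lines)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo_xact.log")
        log = XactLog(path)

        log.begin()
        log.set_attribute("1.0", "JobStatus", 2)
        log.commit(durable=True)       # status change: must reach the disk

        log.begin()
        log.set_attribute("1.0", "ImageSize", 4096)
        log.commit(durable=False)      # stat update: can wait for a later sync

        log.close()
        with open(path) as f:
            return log.sync_count, f.read().splitlines()
```

In this sketch a nondurable commit is still written to the log file; it just skips the sync, so it would presumably reach the disk with the next durable commit or whenever the OS writes it back.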

Thanks for the pointer. I've always wanted to see a perf analysis of a full-blown RDBMS in place of the Schedd's xact log, but I don't hold out much hope in finding benefits there.


Best,


matt
