HTCondor Project List Archives




[Condor-devel] Paper on scalability of write-ahead-logs



Hi folks,

During the break, I've been reading an interesting paper from VLDB 2010:

http://infoscience.epfl.ch/record/149436/files/vldb10aether.pdf

It talks about scalability issues of write-ahead logs in DBs, but I found the topics relevant to the I/O scalability issues in the Condor schedd.  In particular, there are two concepts which may be applicable (I don't know enough about the Condor code to determine whether they are or not):

1) Early-lock-release (ELR) and flush pipelining.  ELR simply releases the resources (locks) a transaction holds before its I/O completes, but does not return to the calling routine until the I/O is finished; flush pipelining takes several commits and flushes them as one I/O operation.  Together, they can offer the same data guarantees while greatly decreasing the number of small I/O operations that are performed.  Note: the kernel community calls this I/O plugging.  This technique works when there are many independent transactions occurring; it helps much less if there are too many dependent transactions (transactions which can't be started until the previous one finishes).
  - Obviously, these techniques were designed for heavily-threaded environments.  I don't know whether the schedd can continue on other work while it waits for a transaction to finish.
2) Asynchronous commit - i.e., lie to the user about their changes being safely on disk.  The paper treats this as an anti-pattern, and works to show that techniques such as (1) can have equivalent performance.  However, there's a reason that databases (even Oracle) allow this mode to be used - they give the user the power to weigh the value of data integrity and recoverability against the value of scalability.  There are certainly times when an extra factor of 2 in scalability (I'm making the numbers up) is worth the cost of possibly losing 30s of status changes.  I'm not advocating that the Condor team should change their opinions about the relative merits, or that the default should be changed - but this is something each site should be allowed to decide.
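
To make the two ideas above concrete, here is a minimal Python sketch, not anything taken from the paper or from the Condor code.  The class GroupCommitLog, its wait_durable flag, and the demo() helper are all hypothetical names invented for illustration.  Committers append a record and then wait without holding the lock, a single flusher thread writes whatever has accumulated with one fsync, and wait_durable=False stands in for asynchronous commit.

```python
import os
import tempfile
import threading


class GroupCommitLog:
    """Hypothetical write-ahead log that batches commits into one fsync."""

    def __init__(self, path):
        self._file = open(path, "ab")
        self._cond = threading.Condition()
        self._pending = []       # records queued but not yet on disk
        self._next_seq = 0       # sequence number for the next commit
        self._durable_seq = -1   # highest sequence number already fsynced
        self._closed = False
        self.fsync_count = 0     # how many fsync calls were actually issued
        self._flusher = threading.Thread(target=self._flush_loop)
        self._flusher.start()

    def commit(self, record, wait_durable=True):
        """Queue a record; by default block until it has been fsynced."""
        with self._cond:
            seq = self._next_seq
            self._next_seq += 1
            self._pending.append(record)
            self._cond.notify_all()
            if not wait_durable:
                return  # "asynchronous commit": caller is told early
            # Condition.wait() drops the lock, so other committers can
            # queue their records while this one waits for the flush.
            while self._durable_seq < seq:
                self._cond.wait()

    def _flush_loop(self):
        while True:
            with self._cond:
                while not self._pending and not self._closed:
                    self._cond.wait()
                if not self._pending:
                    return  # closed and nothing left to flush
                batch, self._pending = self._pending, []
                last_seq = self._next_seq - 1
            # The I/O happens outside the lock: one write and one fsync
            # cover every commit that accumulated in the batch.
            self._file.write(b"".join(batch))
            self._file.flush()
            os.fsync(self._file.fileno())
            with self._cond:
                self.fsync_count += 1
                self._durable_seq = last_seq
                self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()
        self._flusher.join()
        self._file.close()


def demo(n_threads=8):
    """Commit from n_threads threads; return (fsync count, lines on disk)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "job_queue.log")
        log = GroupCommitLog(path)
        workers = [
            threading.Thread(target=log.commit, args=(f"txn-{i}\n".encode(),))
            for i in range(n_threads)
        ]
        for w in workers:
            w.start()
        for w in workers:
            w.join()
        log.close()
        with open(path, "rb") as f:
            lines = f.read().decode().split()
        return log.fsync_count, lines
```

Calling demo(8) returns the number of fsync calls and the records found on disk.  The fsync count is at most 8, and is lower whenever several commits arrive while a flush is in progress.  How much batching happens depends entirely on timing, which is the same caveat as above: this only pays off when many independent transactions are in flight.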

At any rate, the paper is a good read; hope others enjoy it.

Brian
