SSTable and Log Structured Storage: LevelDB


If Protocol Buffers is the lingua franca of individual data records at Google, then the Sorted String Table (SSTable) is one of the most popular outputs for storing, processing, and exchanging datasets. As the name itself implies, an SSTable is a simple abstraction to efficiently store large numbers of key-value pairs while optimizing for high-throughput, sequential read/write workloads.

Unfortunately, the SSTable name itself has also been overloaded by the industry to refer to services that go well beyond just the sorted table, which has only added unnecessary confusion to what is a very simple and useful data structure on its own. Let's take a closer look under the hood of an SSTable and how LevelDB makes use of it.

SSTable: Sorted String Table

Imagine we need to process a large workload where the input is gigabytes or terabytes in size. Additionally, we need to run multiple steps on it, which must be performed by different binaries. In other words, imagine we are running a sequence of MapReduce jobs! Due to the size of the input, reading and writing data can dominate the running time. Hence, random reads and writes are not an option; instead we will want to stream the data in and, once we're done, flush it back to disk as a streaming operation. This way, we can amortize the disk I/O costs. Nothing revolutionary, moving right along.

A "Sorted String Table" then is exactly what it sounds like: a file which contains a set of arbitrary, sorted key-value pairs. Duplicate keys are fine, there is no need for "padding" for keys or values, and keys and values are arbitrary blobs. Read in the entire file sequentially and you have a sorted index. Optionally, if the file is very large, we can also prepend, or create a standalone, key:offset index for fast access. That's all an SSTable is: very simple, but also a very useful way to exchange large, sorted data segments.
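To make the file layout concrete, here is a minimal sketch in Ruby. The record layout (two 4-byte length prefixes followed by the key and value bytes) and the helper names `write_sstable`, `read_record`, `lookup`, and `scan` are assumptions made for illustration; this is not the on-disk format used by LevelDB or BigTable. A `StringIO` stands in for a file so the example stays self-contained, but any seekable IO should work the same way.

```ruby
require 'stringio'

# Assumed record layout (illustration only, not LevelDB's real format):
#   [key length: 4 bytes][value length: 4 bytes][key bytes][value bytes]

# Writes the pairs in sorted key order and returns a key => offset index.
def write_sstable(io, pairs)
  index = {}
  pairs.sort_by { |key, _value| key }.each do |key, value|
    index[key] = io.pos # offset where this key's record starts
    io.write([key.bytesize, value.bytesize].pack('NN'))
    io.write(key)
    io.write(value)
  end
  index
end

# Reads one record at the current position, or returns nil at end of file.
def read_record(io)
  header = io.read(8)
  return nil if header.nil?

  key_len, value_len = header.unpack('NN')
  [io.read(key_len), io.read(value_len)]
end

# Random read: one seek to the indexed offset, then one record read.
def lookup(io, index, key)
  offset = index[key]
  return nil if offset.nil?

  io.seek(offset)
  read_record(io).last
end

# Sequential read: stream the whole table back in sorted order.
def scan(io)
  io.rewind
  records = []
  while (record = read_record(io))
    records << record
  end
  records
end

io = StringIO.new(String.new) # binary in-memory buffer standing in for a file
index = write_sstable(io, [%w[b bar], %w[a foo], %w[c baz]])
puts lookup(io, index, 'c') # prints baz
```

Because the records carry their own lengths, no padding is needed, and the index can be kept in memory or written out separately.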

SSTable and BigTable: Fast random access?

Once an SSTable is on disk it is effectively immutable, because an insert or delete would require a large I/O rewrite of the file. Having said that, for static indexes it is a great solution: read in the index, and you are always one disk seek away, or simply memory-map (mmap) the entire file. Random reads are fast and easy.

Random writes are much harder and more expensive, that is, unless the entire table is in memory, in which case we're back to simple pointer manipulation. Turns out, this is the very problem that Google's BigTable set out to solve: fast read/write access for petabyte-sized datasets, backed by SSTables underneath. How did they do it?

SSTables and Log Structured Merge Trees

We want to preserve the fast read access which SSTables give us, but we also want to support fast random writes. Turns out, we already have all the necessary pieces: random writes are fast when the SSTable is in memory (let's call it the MemTable), and if the table is immutable then an on-disk SSTable is also fast to read from. Now let's introduce the following conventions:

    1. On-disk SSTable indexes are always loaded into memory
    2. All writes go directly to the MemTable index
    3. Reads check the MemTable first and then the SSTable indexes
    4. Periodically, the MemTable is flushed to disk as an SSTable
    5. Periodically, on-disk SSTables are "collapsed together"

What have we done here? Writes are always done in memory and hence are always fast. Once the MemTable reaches a certain size, it is flushed to disk as an immutable SSTable. However, we will maintain all the SSTable indexes in memory, which means that for any read we can check the MemTable first, and then walk the sequence of SSTable indexes to find our data. Turns out, we have just reinvented "The Log-Structured Merge-Tree" (LSM Tree), described by Patrick O'Neil, and this is also the very mechanism behind "BigTable Tablets".
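The write and read conventions above can be sketched in a few lines of Ruby. `TinyLSM` and its `memtable_limit` parameter are hypothetical names used only for illustration, and the "on-disk" SSTables are modelled as frozen, sorted in-memory arrays so the example stays self-contained; a real engine would write them to files and keep only their indexes in memory.

```ruby
# TinyLSM is a hypothetical, in-memory illustration of the conventions
# described above; it is not LevelDB's API.
class TinyLSM
  attr_reader :sstables

  def initialize(memtable_limit: 2)
    @memtable_limit = memtable_limit
    @memtable = {} # mutable, absorbs every write
    @sstables = [] # immutable sorted [key, value] arrays, newest first
  end

  # All writes go to the MemTable; a full MemTable is flushed.
  def put(key, value)
    @memtable[key] = value
    flush if @memtable.size >= @memtable_limit
  end

  # Reads check the MemTable first, then each SSTable from newest to oldest.
  def get(key)
    return @memtable[key] if @memtable.key?(key)

    @sstables.each do |table|
      # Binary search works because every table is sorted by key.
      pair = table.bsearch { |k, _v| key <=> k }
      return pair[1] if pair
    end
    nil
  end

  # Turns the MemTable into a sorted, immutable table and starts a fresh one.
  def flush
    return if @memtable.empty?

    @sstables.unshift(@memtable.sort.freeze)
    @memtable = {}
  end
end

db = TinyLSM.new(memtable_limit: 2)
db.put('b', 'bar')
db.put('a', 'foo') # MemTable is full: flushed as the first SSTable
db.put('a', 'FOO') # newer value lives in the MemTable
puts db.get('a')   # prints FOO
puts db.get('b')   # prints bar
```

Note that the older value for "a" still exists in the flushed table; the read path simply never reaches it.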

LSM & SSTables: Updates, deletes and maintenance

This "LSM" architecture provides a number of interesting behaviors: writes are always fast regardless of the size of the dataset (append-only), and random reads are either served from memory or require a quick disk seek. However, what about updates and deletes?

Once the SSTable is on disk, it is immutable, hence updates and deletes can't touch the data. Instead, a more recent value is simply stored in the MemTable in case of an update, and a "tombstone" record is appended for deletes. Because we check the indexes in sequence, future reads will find the updated or the tombstone record without ever reaching the older values! Finally, having hundreds of on-disk SSTables is also not a great idea, hence periodically we will run a process to merge the on-disk SSTables, at which time the update and delete records will overwrite and remove the older data.
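As a rough illustration of tombstones and merging, the sketch below models each SSTable as a sorted array of pairs, newest table first. The `TOMBSTONE` marker and the `lookup` and `compact` helpers are hypothetical names, and the merge uses a plain Hash for brevity, where a real engine would more likely stream a k-way merge over the sorted files.

```ruby
# Hypothetical marker standing in for a delete record.
TOMBSTONE = :tombstone

# tables: sorted [key, value] arrays, ordered newest first.
# The first table that mentions the key decides the answer.
def lookup(tables, key)
  tables.each do |table|
    pair = table.find { |k, _v| k == key }
    next unless pair

    return pair[1] == TOMBSTONE ? nil : pair[1]
  end
  nil
end

# Merges all tables into one. Older tables are applied first so newer
# records overwrite them. Dropping tombstones is only safe here because
# every table takes part in the merge.
def compact(tables)
  merged = {}
  tables.reverse_each do |table|
    table.each { |key, value| merged[key] = value }
  end
  merged.reject { |_key, value| value == TOMBSTONE }.sort
end

older = [%w[a foo], %w[b bar], %w[c baz]]
newer = [%w[a FOO], ['b', TOMBSTONE]] # update "a", delete "b"
tables = [newer, older]

p lookup(tables, 'a') # "FOO"
p lookup(tables, 'b') # nil
p compact(tables)     # [["a", "FOO"], ["c", "baz"]]
```

After compaction a single table remains, and both the stale value for "a" and the deleted "b" record are gone.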

SSTables and LevelDB

Take an SSTable, add a MemTable, and apply a set of processing conventions, and what you get is a nice database engine for certain types of workloads. In fact, Google's BigTable, Hadoop's HBase, and Cassandra, amongst others, all use a variant or a direct copy of this very architecture.

Simple on the surface, but as usual, implementation details matter a great deal. Thankfully, Jeff Dean and Sanjay Ghemawat, the original contributors to the SSTable and BigTable infrastructure at Google, released LevelDB earlier last year, which is more or less an exact replica of the architecture we've described above:

    • SSTable under the hood, MemTable for writes
    • Keys and values are arbitrary byte arrays
    • Support for Put, Get, Delete operations
    • Forward and backward iteration over data
    • Built-in Snappy compression

Designed to be the engine for IndexedDB in WebKit (aka, embedded in your browser), it is easy to embed, fast, and best of all, takes care of the SSTable and MemTable flushing, merging, and other gnarly details.

Working with LevelDB: Ruby

LevelDB is a library, not a standalone server or service, although you could easily implement one on top. To get started, grab your favorite language bindings (Ruby), and let's see what we can do:

    require 'leveldb' # gem install leveldb-ruby

    db = LevelDB::DB.new "/tmp/db"
    db.put "b", "bar"
    db.put "a", "foo"
    db.put "c", "baz"

    puts db.get "a" # => foo

    db.each do |k, v|
      p [k, v] # => ["a", "foo"], ["b", "bar"], ["c", "baz"]
    end

    db.to_a # => [["a", "foo"], ["b", "bar"], ["c", "baz"]]

We can store keys, retrieve them, and perform a range scan, all with a few lines of code. The mechanics of maintaining the MemTables, merging the SSTables, and the rest are taken care of for us by LevelDB: nice and simple.

LevelDB in WebKit and Beyond

An SSTable is a very simple and useful data structure: a great bulk input/output format. However, what makes the SSTable fast (sorted and immutable) is also what exposes its very limitations. To address this, we've introduced the idea of a MemTable, and a set of "log structured" processing conventions for managing the many SSTables.

All simple rules, but as always, implementation details matter, which is why LevelDB is such a nice addition to the open-source database engine stack. Chances are, you will soon find LevelDB embedded in your browser, on your phone, and in many other places. Check out the LevelDB source, scan the docs, and take it for a spin.
