BoltDB: a simple, pure Go key/value store


Bolt

Bolt is a pure Go key/value database inspired by Howard Chu's LMDB project. The goal of the project is to provide a simple, fast, and reliable database for projects that don't require a full database server such as Postgres or MySQL.

Since Bolt is used at such a low level, simplicity is key. The API is small, focusing only on getting and setting values.

Project status

Bolt is stable, its API is fixed, and its file format is fixed. Full unit test coverage and randomized black-box testing are used to ensure database consistency and thread safety. Bolt is currently used in high-load production environments with databases as large as 1TB. Many companies, such as Shopify and Heroku, use Bolt-backed services every day.

A Message from the author

The initial goal of Bolt was to provide a simple, pure Go key/value store without bloating the code with extraneous features. To that end, the project has been a success. However, this limited scope also means that the project is complete.

Maintaining an open source database requires a lot of time and effort. Changes to the code can have unexpected and sometimes catastrophic effects, so even simple changes require hours of careful testing and validation.

Unfortunately, I no longer have the time or energy to continue this work. Bolt is in a stable state and has years of successful production use behind it. So I feel that leaving it in its current state is the most prudent course of action.

If you are interested in using a more featureful version of Bolt, I suggest you look at the CoreOS fork named bbolt.

Getting Started

Installation

To use Bolt, first install the Go environment and then execute the following command:

$ go get github.com/boltdb/bolt/...

This command retrieves the library and installs the bolt command-line executable into the $GOBIN path.

Opening a Bolt database

The top-level object in Bolt is a DB. It is represented as a single file on disk, representing a consistent snapshot of your data.

To open a database, simply use the bolt.Open() function:

package main

import (
    "log"

    "github.com/boltdb/bolt"
)

func main() {
    // Open the my.db data file in your current directory.
    // It will be created if it doesn't exist.
    db, err := bolt.Open("my.db", 0600, nil)
    if err != nil {
        log.Fatal(err)
    }
    defer db.Close()

    ...
}

Please note:
Bolt obtains a file lock on the data file, so multiple processes cannot open the same database at the same time. Opening an already-open Bolt database will cause it to hang until the other process closes it. To prevent waiting indefinitely, you can pass a timeout option to the Open() function:

db, err := bolt.Open("my.db", 0600, &bolt.Options{Timeout: 1 * time.Second})

Transactions

Bolt allows only one read-write transaction at a time, but allows many read-only transactions at a time. Each transaction has a consistent view of the data.

Individual transactions, and all objects created from them (e.g. buckets, keys), are not thread-safe. To work with data in multiple goroutines, you must start a transaction for each goroutine, or use locking to ensure that only one goroutine accesses a transaction at a time. Creating a transaction from the DB is thread-safe.

Read-only transactions and read-write transactions should not depend on one another, and generally should not be opened simultaneously in the same goroutine. This can cause a deadlock, because the read-write transaction needs to periodically re-map the data file, but it cannot do so while a read-only transaction is open.

Read-write transactions

To start a read-write transaction, you can use the DB.Update() function:

err := db.Update(func(tx *bolt.Tx) error {
    ...
    return nil
})

Inside the closure, you have a consistent view of the database. You commit the transaction by returning nil. You can also roll back the transaction at any point by returning an error. All database operations are allowed inside a read-write transaction.

Always check the returned error, as it will report any disk failures that could cause your transaction not to complete. If you return an error within the closure, it will be passed through.

Read-Only transactions

In order to start a read-only transaction, you can use the DB.View() function:

err := db.View(func(tx *bolt.Tx) error {
    ...
    return nil
})

You also get a consistent view of the database within this closure, but no mutating operations are allowed in a read-only transaction. You can only retrieve buckets, retrieve values, or copy the database within a read-only transaction.

Batch read-write transactions

Each DB.Update() waits for the disk to commit the write. This overhead can be minimized by combining multiple updates with the DB.Batch() function:

err := db.Batch(func(tx *bolt.Tx) error {
    ...
    return nil
})

Concurrent Batch calls are opportunistically combined into larger transactions. Batch is only useful when there are multiple goroutines calling it.

Batch can call the given function multiple times if parts of the transaction fail. The function must be idempotent, and its side effects must take effect only after DB.Batch() returns successfully.

For example, instead of displaying a message inside the function, set a variable in the enclosing scope:

var id uint64
err := db.Batch(func(tx *bolt.Tx) error {
    // Find last key in bucket, decode as bigendian uint64, increment
    // by one, encode back to []byte, and add new key.
    ...
    id = newValue
    return nil
})
if err != nil {
    return ...
}
fmt.Printf("Allocated ID %d\n", id)

Manage transactions manually

The DB.View() and DB.Update() functions are wrappers around the DB.Begin() function. These helper functions will start a transaction, execute a function, and then safely close the transaction if an error is returned. This is the recommended way to use Bolt transactions.

However, sometimes you may need to start and end transactions manually. You can use the DB.Begin() function directly, but be sure to close the transaction.

// Start a writable transaction.
tx, err := db.Begin(true)
if err != nil {
    return err
}
defer tx.Rollback()

// Use the transaction...
_, err = tx.CreateBucket([]byte("MyBucket"))
if err != nil {
    return err
}

// Commit the transaction and check for error.
if err := tx.Commit(); err != nil {
    return err
}

The first parameter to DB.Begin() is a boolean indicating whether the transaction should be writable.

Using buckets

A bucket is a collection of key/value pairs within the database. All keys in a bucket must be unique. You can create a bucket using the Tx.CreateBucket() function:

db.Update(func(tx *bolt.Tx) error {
    _, err := tx.CreateBucket([]byte("MyBucket"))
    if err != nil {
        return fmt.Errorf("create bucket: %s", err)
    }
    return nil
})

You can also create a bucket only if it does not already exist by using the Tx.CreateBucketIfNotExists() function. It's a common pattern to call this function for all your top-level buckets after opening the database, so you can guarantee they exist for future transactions.

To delete a bucket, simply call the Tx.DeleteBucket() function.

Using key/value pairs

To save a key/value pair to a bucket, use the Bucket.Put() function:

db.Update(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))
    err := b.Put([]byte("answer"), []byte("42"))
    return err
})

This will set the value of the "answer" key to "42" in the MyBucket bucket. To retrieve this value, we can use the Bucket.Get() function:

db.View(func(tx *bolt.Tx) error {
    b := tx.Bucket([]byte("MyBucket"))
    v := b.Get([]byte("answer"))
    fmt.Printf("The answer is: %s\n", v)
    return nil
})

The Get() function does not return an error because its operation is guaranteed to work (unless there is some kind of system failure). If the key exists, it will return its byte-slice value. If it does not exist, it will return nil. Note that you can set a zero-length value on a key, which is different from the key not existing.

Use the Bucket.Delete() function to remove a key from the bucket.

Note that values returned from Get() are only valid while the transaction is open. If you need to use a value outside of the transaction, you must use copy() to copy it to another byte slice.

Autoincrementing integers for a bucket

By using the NextSequence() function, you can let Bolt determine a sequence that can be used as the unique identifier for your key/value pairs. See the example below.

// CreateUser saves u to the store. The new user ID is set on u once the data is persisted.
func (s *Store) CreateUser(u *User) error {
    return s.db.Update(func(tx *bolt.Tx) error {
        // Retrieve the users bucket.
        // This should be created when the DB is first opened.
        b := tx.Bucket([]byte("users"))

        // Generate ID for the user.
        // This returns an error only if the Tx is closed or not writeable.
        // That can't happen in an Update() call so I ignore the error check.
        id, _ := b.NextSequence()
        u.ID = int(id)

        // Marshal user data into bytes.
        buf, err := json.Marshal(u)
        if err != nil {
            return err
        }

        // Persist bytes to users bucket.
        return b.Put(itob(u.ID), buf)
    })
}

// itob returns an 8-byte big endian representation of v.
func itob(v int) []byte {
    b := make([]byte, 8)
    binary.BigEndian.PutUint64(b, uint64(v))
    return b
}

type User struct {
    ID int
    ...
}

Iterating over keys

Bolt stores its keys in byte-sorted order within a bucket. This makes sequential iteration over those keys extremely fast. To iterate over keys, we'll use a Cursor:

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    b := tx.Bucket([]byte("MyBucket"))

    c := b.Cursor()
    for k, v := c.First(); k != nil; k, v = c.Next() {
        fmt.Printf("key=%s, value=%s\n", k, v)
    }
    return nil
})

The cursor allows you to move to a specific point in the list of keys and move one key forward or backward at a time.

The cursor has the following features:

First()  Move to the first key.
Last()   Move to the last key.
Seek()   Move to a specific key.
Next()   Move to the next key.
Prev()   Move to the previous key.

Each of these functions has a return signature of (key []byte, value []byte). When you have iterated to the end of the cursor, Next() will return a nil key. You must seek to a position using First(), Last(), or Seek() before calling Next() or Prev(). If you do not seek to a position, these functions will return a nil key.

During iteration, if the key is non-nil but the value is nil, it means the key refers to a bucket rather than a value. Use Bucket.Bucket() to access the sub-bucket.

Prefix scan

To iterate over a key prefix, you can combine Seek() and bytes.HasPrefix():

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    c := tx.Bucket([]byte("MyBucket")).Cursor()

    prefix := []byte("1234")
    for k, v := c.Seek(prefix); k != nil && bytes.HasPrefix(k, prefix); k, v = c.Next() {
        fmt.Printf("key=%s, value=%s\n", k, v)
    }
    return nil
})

Range Scan

Another common use case is to scan a range, such as a time range. If you use sortable time encodings, such as RFC3339, you can query a specific date range as follows:

db.View(func(tx *bolt.Tx) error {
    // Assume our events bucket exists and has RFC3339 encoded time keys.
    c := tx.Bucket([]byte("Events")).Cursor()

    // Our time range spans the 90's decade.
    min := []byte("1990-01-01T00:00:00Z")
    max := []byte("2000-01-01T00:00:00Z")

    // Iterate over the 90's.
    for k, v := c.Seek(min); k != nil && bytes.Compare(k, max) <= 0; k, v = c.Next() {
        fmt.Printf("%s: %s\n", k, v)
    }
    return nil
})

Note that, while RFC3339 is sortable, the Golang implementation of RFC3339Nano does not use a fixed number of digits after the decimal point and is therefore not sortable.

ForEach()

You can also use the ForEach() function if you know you will iterate over all keys in the bucket:

db.View(func(tx *bolt.Tx) error {
    // Assume bucket exists and has keys
    b := tx.Bucket([]byte("MyBucket"))

    b.ForEach(func(k, v []byte) error {
        fmt.Printf("key=%s, value=%s\n", k, v)
        return nil
    })
    return nil
})

Note that keys and values in ForEach() are only valid while the transaction is open. If you need to use a key or value outside of the transaction, you must use copy() to copy it to another byte slice.

Nested buckets

You can also store a bucket under a key to create nested buckets. The API is the same as the bucket management API on the DB object:

func (*Bucket) CreateBucket(key []byte) (*Bucket, error)
func (*Bucket) CreateBucketIfNotExists(key []byte) (*Bucket, error)
func (*Bucket) DeleteBucket(key []byte) error

Suppose you had a multi-tenant application where the root-level bucket is the account bucket. Inside this bucket is a sequence of accounts, which are themselves buckets. Within the sequence bucket, you could have many buckets pertaining to the account itself (Users, Notes, etc.), isolating the information into logical groupings.

// CreateUser creates a new user in the given account.
func CreateUser(accountID int, u *User) error {
    // Start the transaction.
    tx, err := db.Begin(true)
    if err != nil {
        return err
    }
    defer tx.Rollback()

    // Retrieve the root bucket for the account.
    // Assume this has already been created when the account was set up.
    root := tx.Bucket([]byte(strconv.FormatUint(uint64(accountID), 10)))

    // Setup the users bucket.
    bkt, err := root.CreateBucketIfNotExists([]byte("USERS"))
    if err != nil {
        return err
    }

    // Generate an ID for the new user.
    userID, err := bkt.NextSequence()
    if err != nil {
        return err
    }
    u.ID = userID

    // Marshal and save the encoded user.
    if buf, err := json.Marshal(u); err != nil {
        return err
    } else if err := bkt.Put([]byte(strconv.FormatUint(u.ID, 10)), buf); err != nil {
        return err
    }

    // Commit the transaction.
    if err := tx.Commit(); err != nil {
        return err
    }

    return nil
}

Database backup

Bolt is a single file, so it's easy to back up. You can use the Tx.WriteTo() function to write a consistent view of the database to a writer. If you call this from a read-only transaction, it will perform a hot backup without blocking your other database reads and writes.

By default, it will use a regular file handle, which takes advantage of the operating system's page cache. See the Tx documentation for information about optimizing for larger-than-RAM datasets.

A common use case is to back up via HTTP, so you can use a tool like curl to make a database backup:

func BackupHandleFunc(w http.ResponseWriter, req *http.Request) {
    err := db.View(func(tx *bolt.Tx) error {
        w.Header().Set("Content-Type", "application/octet-stream")
        w.Header().Set("Content-Disposition", `attachment; filename="my.db"`)
        w.Header().Set("Content-Length", strconv.Itoa(int(tx.Size())))
        _, err := tx.WriteTo(w)
        return err
    })
    if err != nil {
        http.Error(w, err.Error(), http.StatusInternalServerError)
    }
}

Then you can use this command to back up:

$ curl http://localhost/backup > my.db

Or you can open your browser to http://localhost/backup and it will download automatically.
If you want to back up to another file, you can use the Tx.CopyFile() helper function.

Comparison with other databases

Postgres, MySQL, & other relational databases

Relational databases structure data into rows and are only accessible through SQL. This approach provides flexibility in how you store and query data, but it also incurs overhead in parsing and planning SQL statements. Bolt accesses all data through a byte-slice key. This makes Bolt fast at reading and writing data, but it provides no built-in support for joining values together.

Most relational databases (with the exception of SQLite) are standalone servers that run separately from your application. This gives your systems the flexibility to connect multiple application servers to a single database server, but it also adds overhead for serializing and transporting data over the network. Bolt runs as a library included in your application, so all data access has to go through your application's process. This brings the data closer to your application, but limits multi-process access to the data.

LevelDB, RocksDB

LevelDB and its derivatives (RocksDB, HyperLevelDB) are similar to Bolt in that they are libraries bound into an application; however, their underlying structure is a log-structured merge-tree (LSM tree). An LSM tree optimizes random writes by using a write-ahead log and multi-tiered sorted files called SSTables. Bolt uses a B+tree internally and has only a single file. There are trade-offs to both approaches.

LevelDB can be a good choice if you require high random-write throughput (>10,000 w/sec) or if you need to use spinning disks. If your application is read-heavy or does a lot of range scans, Bolt could be a good choice.

Another important consideration is that LevelDB does not have transactions. It supports batch writing of key/value pairs and it supports read snapshots, but it will not let you do a compare-and-swap operation safely. Bolt supports fully serializable ACID transactions.

LMDB

Bolt was originally a port of LMDB, so it is architecturally similar. Both use a B+tree, have ACID semantics with fully serializable transactions, and support lock-free MVCC using a single writer and multiple readers.

There are some differences between the two projects. LMDB focuses primarily on raw performance, while Bolt focuses on simplicity and ease of use. For example, LMDB allows several unsafe actions, such as direct writes, for the sake of performance. Bolt opts to disallow actions that could leave the database in a corrupted state. The only exception to this in Bolt is DB.NoSync.

There are also some differences in the API. LMDB requires a maximum mmap size when opening an mdb_env, whereas Bolt handles incremental mmap resizing automatically. LMDB overloads the getter and setter functions with multiple flags, whereas Bolt splits these special cases into their own functions.

Caveats and limitations

Choosing the right tool for the job is important, and Bolt is no exception. Here are a few things to note when evaluating and using Bolt:

  • Bolt is good for read-intensive workloads. Sequential write performance is also fast, but random writes can be slow. You can use DB.Batch() or add a write-ahead log to help mitigate this issue.
  • Bolt uses a B+tree internally, so there can be a lot of random page access. SSDs provide a significant performance boost over spinning disks.
  • Try to avoid long-running read transactions. Bolt uses copy-on-write, so old pages cannot be reclaimed while an old transaction is still using them.
  • Byte slices returned from Bolt are only valid during a transaction. Once a transaction has been committed or rolled back, the memory they point to can be reused by a new page, or it can be unmapped from virtual memory, and you will see an unexpected fault address panic when accessing it.
  • Bolt uses an exclusive write lock on the database file, so it cannot be shared by multiple processes.
  • Be careful when using Bucket.FillPercent. Setting a high fill percentage for buckets with random inserts can cause the database's pages to have poor utilization.
  • Use larger buckets in general. Smaller buckets cause poor page utilization once they become larger than the page size (typically 4KB).
  • Bulk loading a lot of random writes into a new bucket can be slow, because the pages do not split until the transaction is committed. Randomly inserting more than 100,000 key/value pairs into a single new bucket in a single transaction is not advised.
  • Bolt uses a memory-mapped file, so the underlying operating system handles caching of the data. Typically, the OS will cache as much of the file as it can in memory and will release memory to other processes as needed. This means that Bolt can show very high memory usage when working with large databases; however, this is expected, and the OS will release memory as needed. Bolt can handle databases much larger than the available physical RAM, provided its memory map fits in the process's virtual address space. This may be problematic on 32-bit systems.
  • The data structures in the Bolt database are memory-mapped, so the data file will be endian specific. This means that you cannot copy a Bolt file from a little-endian machine to a big-endian machine and have it work. For most users this is not a concern, since most modern CPUs are little-endian.
  • Because of the way pages are laid out on disk, Bolt cannot truncate data files and return free pages back to the disk. Instead, Bolt maintains a free list of unused pages within its data file. These free pages can be reused by later transactions. Since databases usually grow, this works well for many use cases; however, it's important to note that deleting large chunks of data will not allow you to reclaim that space on disk.

This article was translated by Cong of the Copernicus team. Do not reproduce without authorization.
