Using Oracle Berkeley DB as a NoSQL data store
Shashank Tiwari
Released February 2011
"NoSQL" is a new buzzword among developers, architects, and even technology managers. Although the term has recently been popular, it is surprising that it does not have a universally accepted definition.
Typically, any database that is non-RDBMS and follows a modeless structure generally does not fully support ACID transactions, and is generally categorized as a "NoSQL data store" due to high availability commitments and support for large datasets in a scaled-out environment. Given these common features (in stark contrast to the characteristics of traditional RDBMS), it has been suggested that non-relationship (or simply Nonrel) is a more appropriate term than NoSQL.
Although disagreements over the definition persist, many people are aware of the benefits of adding NoSQL to their application architecture. Others are keeping a close watch, evaluating whether NoSQL is right for them.
The rise of NoSQL as a category has also led to the emergence of a large number of new data stores. Some of these new NoSQL products are good at persisting documents such as JSON, some store data in sorted column families, and others persist distributed key-value pairs. While the newer products are exciting and offer many useful features, some existing products are also evolving to meet these new demands.
Oracle Berkeley DB is one such data store. In this article, I'll explain why Berkeley DB qualifies as a NoSQL solution and how to use it as one. The article focuses on the NoSQL-related aspects of Berkeley DB and therefore does not cover all of its features and capabilities in detail.
Berkeley DB Features
Berkeley DB comes in three distinct flavors of key-value store:
- Berkeley DB - the key-value store written in C. (The official Berkeley DB documentation uses the term key-data instead of key-value.) This is the "classic" flavor.
- Berkeley DB Java Edition (JE) - the key-value store rewritten in Java. It can easily be embedded in a Java stack.
- Berkeley DB XML - written in C; it wraps the key-value store so that it behaves like an indexed, optimized XML storage system.
(Note: Although this article does not explicitly cover Berkeley DB JE or Berkeley DB XML, it includes some examples that use the Java API and the Java-based persistence framework to illustrate Berkeley DB functionality.)
The core of Berkeley DB is simple, but it can be configured to provide concurrent, non-blocking access, to support transactions, to scale out as a highly available cluster of master-slave replicas, or to scale horizontally in a number of other ways.
Berkeley DB is a pure storage engine that does not assume any implicit pattern or structure for key-value pairs. As a result, Berkeley DB easily allows for higher level API, query, and modeling abstractions on the underlying key-value store. This helps to store application-specific data quickly and efficiently without the overhead of converting it to an abstract data format. This simple yet sophisticated design provides the flexibility to store both structured and semi-structured data in Berkeley DB.
Berkeley DB can run as an in-memory store holding small amounts of data, or it can be configured as a large data store fronted by a fast in-memory cache. With the help of a higher-level abstraction (called an environment), you can configure multiple databases within a single physical installation. An environment can contain multiple databases. You open an environment and then a database within it, and write data to or read data from that database. It is recommended that you close the database and the environment once you are done interacting with them, to use resources optimally.
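As a minimal sketch of this open-use-close lifecycle using the Berkeley DB JE API (the directory path and database name below are arbitrary choices for illustration):

```java
import java.io.File;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseConfig;
import com.sleepycat.je.Environment;
import com.sleepycat.je.EnvironmentConfig;

public class LifecycleSketch {
    public static void main(String[] args) throws Exception {
        // Open (creating if necessary) an environment rooted at a directory.
        EnvironmentConfig envConfig = new EnvironmentConfig();
        envConfig.setAllowCreate(true);
        Environment env = new Environment(new File("/tmp/bdb-env"), envConfig);

        // Open a database inside that environment.
        DatabaseConfig dbConfig = new DatabaseConfig();
        dbConfig.setAllowCreate(true);
        Database db = env.openDatabase(null, "sampleDb", dbConfig);

        // ... read and write key-value pairs here ...

        // Close the database first, then the environment.
        db.close();
        env.close();
    }
}
```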
Each item in the database is a key-value pair. Keys are usually unique, although duplicates are allowed. A value is accessed by its key. Retrieved values can be updated and saved back to the database. Multiple values are accessed and iterated over with a cursor. Cursors let you loop through a collection of values and manipulate the whole collection at once. Transactional and concurrent access are also supported.
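A cursor-based scan might look like the following sketch (JE API; it assumes an already-open Database, and printing keys as UTF-8 strings is an assumption about how they were stored):

```java
import java.nio.charset.StandardCharsets;
import com.sleepycat.je.Cursor;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class CursorSketch {
    // Print every key-value pair in the database, in key order.
    static void scan(Database db) {
        Cursor cursor = db.openCursor(null, null);
        try {
            DatabaseEntry key = new DatabaseEntry();
            DatabaseEntry value = new DatabaseEntry();
            while (cursor.getNext(key, value, LockMode.DEFAULT)
                    == OperationStatus.SUCCESS) {
                String k = new String(key.getData(), StandardCharsets.UTF_8);
                System.out.println(k + " -> " + value.getSize() + " bytes");
            }
        } finally {
            cursor.close();   // always release the cursor
        }
    }
}
```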
The key of a key-value pair almost always serves as the primary key for indexing. Other attributes within the value can act as secondary indexes. Secondary indexes are maintained separately in secondary databases, so the database holding the key-value pairs themselves is sometimes referred to as the primary database.
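A secondary database is populated through a key creator that extracts the indexed attribute from each record. A sketch (JE API; the environment and primary database are assumed to be open, and the extraction logic here is purely illustrative):

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.SecondaryConfig;
import com.sleepycat.je.SecondaryDatabase;
import com.sleepycat.je.SecondaryKeyCreator;

public class SecondaryIndexSketch {
    static SecondaryDatabase openByAttribute(Environment env, Database primaryDb) {
        SecondaryConfig secConfig = new SecondaryConfig();
        secConfig.setAllowCreate(true);
        secConfig.setSortedDuplicates(true);  // several records may share a value
        secConfig.setKeyCreator(new SecondaryKeyCreator() {
            public boolean createSecondaryKey(SecondaryDatabase secondary,
                                              DatabaseEntry key,
                                              DatabaseEntry value,
                                              DatabaseEntry result) {
                // Illustrative only: index on the first byte of the value.
                result.setData(new byte[] { value.getData()[0] });
                return true;  // returning false skips this record in the index
            }
        });
        return env.openSecondaryDatabase(null, "byAttribute", primaryDb, secConfig);
    }
}
```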
Berkeley DB runs as an in-process data store, so you link to it statically or dynamically when accessing it from a program using the C, C++, C#, Java, or scripting-language APIs.
With that brief introduction done, let's look at the features of Berkeley DB through a NoSQL lens.
Flexible schemas
The first advantage of NoSQL stores is their relaxed attitude toward well-defined database schemas. Let's look at how Berkeley DB delivers this.
To understand what Berkeley DB can do, it is best to try it out. I therefore recommend downloading and installing Berkeley DB and Berkeley DB JE on your computer so that you can try some examples yourself and follow the rest of the illustrations in this article. Download links and installation instructions are available online here. (For this article, I compiled Berkeley DB with --enable-java, --enable-sql, and --prefix=/usr/local.) The basic concepts around storage, access mechanisms, and APIs do not differ much between Berkeley DB and Berkeley DB JE, so much of what follows applies to both.
Beyond requiring that data items be key-value pairs, Berkeley DB itself imposes very few restrictions on them. This gives applications the flexibility to use Berkeley DB to manage data in a variety of formats, including SQL, XML, and Java objects. Data in Berkeley DB can be accessed through the underlying API, the SQL API, the Java Collections API, and the Java Direct Persistence Layer (DPL). Berkeley DB also allows several different storage configurations: B-tree, hash, queue, and Recno. (The Berkeley DB documentation calls these storage mechanisms "access methods." The hash, queue, and Recno access methods are available only in Berkeley DB, not in Berkeley DB JE or Berkeley DB XML.)
You can select an access mechanism and storage configuration based on your specific use case. Choosing a particular access method and storage configuration affects the schema, so to understand the impact of your choices you need to know what you are choosing. Next I'll discuss the access methods and storage configurations.
Using the underlying API
The underlying API is a low-level API for storing, retrieving, and updating data (that is, key-value pairs). It is similar across the different language bindings; the underlying APIs for C, C++, and Java are nearly identical. DPL and the Java Collections API, on the other hand, are provided only as abstractions in the Java API.
The underlying API can put, get, and delete key-value pairs. Both keys and values are byte arrays: all keys and data values are serialized to byte arrays before being stored. You can use Java's built-in serialization or Berkeley DB's BIND API to serialize the various data types into byte arrays. Java's built-in serialization is typically slower, so you should prefer the BIND API. (The jvm-serializers project benchmarks a variety of alternative serializers and is a good reference point for comparing the relative performance of different JVM serialization mechanisms.) The BIND API avoids redundantly storing class information with each serialized instance by putting that information in a separate database. You can potentially go faster still by writing your own custom tuple bindings that outperform the standard BIND serialization.
As a basic example, you could define the following data value class:
import java.io.Serializable;

public class DataValue implements Serializable {
    private long prop1;
    private double prop2;

    DataValue() {
        prop1 = 0;
        prop2 = 0.0;
    }

    public void setProp1(long data) {
        prop1 = data;
    }

    public long getProp1() {
        return prop1;
    }

    public void setProp2(double data) {
        prop2 = data;
    }

    public double getProp2() {
        return prop2;
    }
}
You can now use two databases to store this data value: one to store the values with their keys, and another to store the class information.
Storing the data involves four steps:
- First, configure a database, in addition to the one used to store the key-value pairs, to hold the class data, as follows:
Database aClassDb = new Database("classDb", null, aDbConfig);
- Then, instantiate a class catalog, as follows:
StoredClassCatalog storedClassCatalog = new StoredClassCatalog(aClassDb);
- Next, create a serial entry binding, as follows:
EntryBinding binding = new SerialBinding(storedClassCatalog, DataValue.class);
- Finally, instantiate the DataValue object, as follows:
DataValue val = new DataValue();
val.setProp1(123456789L);
val.setProp2(1234.56789);
Then use the binding you just created to map the object to a Berkeley DB DatabaseEntry (the wrapper used for both keys and values), as follows:
DatabaseEntry valueEntry = new DatabaseEntry();
binding.objectToEntry(val, valueEntry);
You can now put the key-value pairs in Berkeley DB.
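Assuming the database, binding, and val from the steps above (the variable name aDatabase and the key string are illustrative), a put and subsequent get might look like this sketch:

```java
import java.nio.charset.StandardCharsets;
import com.sleepycat.bind.EntryBinding;
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.LockMode;
import com.sleepycat.je.OperationStatus;

public class PutGetSketch {
    // DataValue is the class defined earlier in this article.
    static void putAndGet(Database aDatabase, EntryBinding binding, DataValue val)
            throws Exception {
        // Wrap the key bytes and the serialized value in DatabaseEntry objects.
        DatabaseEntry keyEntry =
            new DatabaseEntry("key1".getBytes(StandardCharsets.UTF_8));
        DatabaseEntry valueEntry = new DatabaseEntry();
        binding.objectToEntry(val, valueEntry);
        aDatabase.put(null, keyEntry, valueEntry);

        // Read the record back and deserialize it through the same binding.
        DatabaseEntry readEntry = new DatabaseEntry();
        if (aDatabase.get(null, keyEntry, readEntry, LockMode.DEFAULT)
                == OperationStatus.SUCCESS) {
            DataValue stored = (DataValue) binding.entryToObject(readEntry);
            System.out.println(stored.getProp1() + ", " + stored.getProp2());
        }
    }
}
```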
The underlying API supports several variants of the put and get methods to allow or disallow duplicates and overwrites. (Neither the example nor this article is meant to teach the detailed syntax and semantics of the underlying API, so I won't go into more detail; see the documentation here.) The important point is that the underlying API supports low-level operations and custom serialization of key-value pairs for storage, retrieval, and deletion.
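As a point of comparison with the SerialBinding used above, a custom tuple binding for the DataValue class (defined earlier) might look like the sketch below; because the fields are written directly, no separate class-catalog database is needed:

```java
import com.sleepycat.bind.tuple.TupleBinding;
import com.sleepycat.bind.tuple.TupleInput;
import com.sleepycat.bind.tuple.TupleOutput;

// Serializes DataValue field by field, storing no per-class
// metadata alongside the record.
public class DataValueBinding extends TupleBinding<DataValue> {
    public DataValue entryToObject(TupleInput in) {
        DataValue v = new DataValue();
        v.setProp1(in.readLong());
        v.setProp2(in.readDouble());
        return v;
    }

    public void objectToEntry(DataValue v, TupleOutput out) {
        out.writeLong(v.getProp1());
        out.writeDouble(v.getProp2());
    }
}
```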
If you prefer a higher-level API for interacting with Berkeley DB, you should use DPL.
Using DPL
The Direct Persistence Layer (DPL) provides the familiar semantics of a Java persistence framework for manipulating objects. You can treat Berkeley DB as an entity store in which objects are persisted and from which they can be retrieved, updated, and deleted. DPL uses annotations: a class is marked as an @Entity, associated classes stored with the entity are annotated @Persistent, and specific attributes or fields can be annotated @PrimaryKey and @SecondaryKey. A simple entity might look like this:
@Entity
public class AnEntity {
@PrimaryKey
private int myPrimaryKey;
@SecondaryKey(relate=ONE_TO_ONE)
private String mySecondaryKey;
...
}
DPL treats the class definition as a well-defined schema. From the underlying API, we know that Berkeley DB itself does not require conformance to a schema. For some use cases, however, formal entity definitions are helpful and provide a structured approach to data modeling.
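Working with such entities goes through an EntityStore and its type-safe indexes. A sketch (JE DPL; env is an open Environment, and the store name and lookup value are illustrative):

```java
import com.sleepycat.je.Environment;
import com.sleepycat.persist.EntityStore;
import com.sleepycat.persist.PrimaryIndex;
import com.sleepycat.persist.SecondaryIndex;
import com.sleepycat.persist.StoreConfig;

public class DplSketch {
    // AnEntity is the annotated class shown above.
    static void useStore(Environment env) {
        StoreConfig storeConfig = new StoreConfig();
        storeConfig.setAllowCreate(true);
        EntityStore store = new EntityStore(env, "entityStore", storeConfig);

        // Indexes are looked up by key class and the annotated field name.
        PrimaryIndex<Integer, AnEntity> byPrimary =
            store.getPrimaryIndex(Integer.class, AnEntity.class);
        SecondaryIndex<String, Integer, AnEntity> bySecondary =
            store.getSecondaryIndex(byPrimary, String.class, "mySecondaryKey");

        AnEntity e = new AnEntity();
        // ... populate the entity's fields (accessors not shown above) ...
        byPrimary.put(e);                               // insert or update
        AnEntity found = bySecondary.get("someValue");  // lookup by secondary key
        store.close();
    }
}
```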
Storage configuration
As mentioned earlier, key-value pairs can be stored in four different types of data structures: B-tree, hash, queue, and Recno. Let's see how each works.
- B-tree. The B-tree needs little introduction, but if you want a definition, see the Wikipedia page at http://en.wikipedia.org/wiki/B-tree. It is a balanced tree data structure that keeps its elements sorted and allows fast sequential access, insertion, and deletion. Keys and values can be of any data type. In Berkeley DB, the B-tree access method allows duplicates. It is a good choice if you need complex data types as keys, and also if your data access patterns tend to touch adjacent records. The B-tree maintains a fair amount of metadata in order to perform efficiently. Most Berkeley DB applications use the B-tree storage configuration.
- Hash. Like the B-tree, the hash access method also allows complex types as keys. The hash has a more linear structure than the B-tree, and the Berkeley DB hash structure allows duplicates.
Although both the B-tree and the hash support complex keys, a hash database usually performs better than a B-tree when the dataset is much larger than available memory. This is because a B-tree holds more metadata than a hash, and with a larger dataset the B-tree metadata may no longer fit in the in-memory cache. In that extreme case, the B-tree metadata, and often the requested data record itself, must be fetched from the file, resulting in multiple I/Os per operation. The hash access method is designed to minimize the number of I/Os required to access a data record, so in these extreme cases it can perform better than the B-tree.
- Queue. A queue is a set of fixed-length records stored sequentially. Keys are restricted to integer logical record numbers. Records are appended sequentially, allowing extremely fast writes. If you were impressed by Apache Cassandra's fast writes via log appends, try Berkeley DB with the queue access method; you will not be disappointed. The queue access method also allows efficient reads and updates from the head of the queue, and it supports row-level locking, which preserves transactional integrity even under heavy concurrency.
- Recno. Recno is similar to a queue but allows variable-length records. As with queues, Recno keys are restricted to integers.
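In the C-based Berkeley DB's Java binding (JE supports only the B-tree), the access method is chosen when the database is created. A sketch, with an arbitrary file name and record length:

```java
import com.sleepycat.db.Database;
import com.sleepycat.db.DatabaseConfig;
import com.sleepycat.db.DatabaseType;

public class AccessMethodSketch {
    static Database openQueueDb() throws Exception {
        DatabaseConfig config = new DatabaseConfig();
        config.setAllowCreate(true);
        config.setType(DatabaseType.QUEUE);  // or BTREE, HASH, RECNO
        config.setRecordLength(64);          // queue records are fixed-length
        return new Database("queue.db", null, config);
    }
}
```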
The different configurations allow you to store any type of data in the collection. As in other NoSQL stores, there is no fixed schema (other than whatever schema your model imposes). In the extreme case, you could store values of two different types under two keys in the same collection. Value types can be complex classes, representing JSON documents, complex data structures, or even structured datasets. The only real limitation is that a value must be serializable to a byte array. A single key or a single value can be up to 4 GB in size.
Secondary indexes allow filtering on value attributes. The primary database does not store data in a tabular format, so for sparse datasets non-existent attributes simply are not stored. If a key-value pair lacks the attribute on which a secondary index is built, the index skips that pair. In general, this storage scheme is compact and efficient.
Transaction support
Berkeley DB is a very flexible database in which many features can be turned on and off. It can run without transaction support, or it can be compiled to support full ACID transactional integrity. This plasticity makes Berkeley DB a suitable data store for many situations. In a typical NoSQL data store, support for transactional integrity is the weakest spot. For highly available systems that do not expect ACID transaction compliance, Berkeley DB can turn transactions off and behave like a typical NoSQL product; for other systems, it can flexibly support transactional integrity.
Although I don't intend to cover transactions in detail, it is worth noting that Berkeley DB with transactions enabled allows transaction boundaries to be defined, much as a traditional RDBMS does. Once committed, data is persisted to disk. To improve performance, you can use non-durable commits, which write the commit record to an in-memory log buffer and synchronize it with the underlying file system later. Isolation levels and locking mechanisms are also supported.
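A transactional write under the JE API might look like the following sketch (it assumes the environment and database were opened with setTransactional(true); the commitNoSync variant in the comment is the non-durable commit mentioned above):

```java
import com.sleepycat.je.Database;
import com.sleepycat.je.DatabaseEntry;
import com.sleepycat.je.Environment;
import com.sleepycat.je.Transaction;

public class TxnSketch {
    static void writeAtomically(Environment env, Database db,
                                DatabaseEntry key, DatabaseEntry value)
            throws Exception {
        Transaction txn = env.beginTransaction(null, null);
        try {
            db.put(txn, key, value);
            txn.commit();          // durable: the commit record is synced to disk
            // txn.commitNoSync(); // faster alternative: skips the fsync
        } catch (Exception e) {
            txn.abort();           // roll back the write on failure
            throw e;
        }
    }
}
```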
The synchronization operation ensures that the persistent file copy has the latest in-memory information in the system before the database shuts down. This synchronization is combined with the transaction recovery subsystem of Berkeley DB (assuming you have enabled transactions) to ensure that the database always returns to a consistent transaction state, even in the event of an application or system failure.
Large data sets
Theoretically, a Berkeley DB database has an upper limit of 256TB, but in practice it is usually limited by the capacity of the machine running Berkeley DB. As of this writing, Berkeley DB has not been proven to support large files spanning multiple machines with the help of a distributed file system. (Distributed file systems, such as the Hadoop Distributed File System (HDFS), can help manage files larger than a single node can hold.) Berkeley DB performs better on a local file system than on a network file system. More precisely, Berkeley DB relies on POSIX-compliant behavior from the file system: for example, when Berkeley DB calls fsync() and the file system returns, Berkeley DB assumes the data has been written to persistent media. For performance reasons, distributed file systems generally do not guarantee that writes to persistent media have completed before returning.
The maximum supported B-tree depth is 255. The length of the key and value is usually limited by the available memory.
Horizontal Scaling
Berkeley DB replication follows a master-slave pattern: there is one master node and multiple slave nodes (or replicas). However, the choice of master is not static, and manual selection is not recommended. All participating nodes in a replication cluster run an election to choose the master; the node with the most recent log records wins, and ties are broken by priority. The election process is based on the industry-standard Paxos algorithm.
Replication has many benefits, including:
- Improved read performance - reading from multiple replica nodes can greatly improve read throughput.
- Increased reliability - replica instances provide better failover options in the event of node failure or data corruption.
- Increased durability - you can relax durability on the master node to avoid excessive disk writes, which typically require expensive I/O. In a clustered environment, durability is enhanced because a write is committed to multiple nodes, even if it is not immediately written to disk.
- Increased availability - because there are multiple nodes and writes to disk are asynchronous, replica nodes can continue to serve requests even when the master is under heavy load.
Summary
There is no doubt that Berkeley DB qualifies as a robust, scalable NoSQL key-value store; the fact that Amazon's Dynamo, Project Voldemort, MemcacheDB, and GenieDB use Berkeley DB as their underlying storage is further evidence in support of this view. There have been some concerns around Berkeley DB's performance, particularly around two benchmark tests published online.
However, many production systems demonstrate the power of Berkeley DB. Many of these systems have undergone careful tuning and application-level improvements, and they have achieved excellent scalability, throughput, and reliability. Following their example, Berkeley DB can undoubtedly be used as a scalable NoSQL solution.
Shashank Tiwari is the founder and CEO of Treasury of Ideas, a technology-driven innovation and value-optimization company. An experienced software developer and architect, he is proficient in a wide range of technologies. He is an internationally recognized speaker, author, and mentor. As a member of the expert groups of several JCP (Java Community Process) specifications, he has been actively involved in shaping the future of Java. He is also an active voice in NoSQL and cloud computing and is recognized as an expert in the RIA community.