Oracle recently released its NoSQL Database. Currently only the Enterprise Edition is available; the open-source Community Edition has not been released yet, so the source code cannot be inspected. Still, the documentation gives a good general picture of the database. I skimmed it and summarize it below.
I. Data Model
A key consists of one or more major key components and zero or more minor key components; together they uniquely identify a record. Each key component is a Java String, and keys are sorted according to the corresponding encoding. The value is an opaque byte array. There is no strict limit on the size of a key or a value.
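To make the key structure concrete, here is a minimal in-memory sketch in Python. It is not the Oracle NoSQL client API: the `Key` class and its field names are hypothetical, and ordering by tuple comparison only approximates the encoding-based sort order the store uses.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True, order=True)
class Key:
    """Hypothetical model of a key: a major path plus an optional minor path."""
    major: Tuple[str, ...]
    minor: Tuple[str, ...] = ()

    def __post_init__(self):
        # At least one major component is required; minor components are optional.
        if not self.major:
            raise ValueError("a key needs at least one major key component")

    def full_path(self) -> Tuple[str, ...]:
        # Major and minor components together identify exactly one record.
        return self.major + self.minor


# Values are opaque bytes as far as the store is concerned.
store = {
    Key(("users", "bob")): b"profile-blob",
    Key(("users", "alice"), ("phone",)): b"555-0100",
    Key(("users", "alice"), ("email",)): b"alice@example.com",
}

# Keys compare by (major, minor), so records sharing a major key sort together.
sorted_keys = sorted(store)
```

In this sketch the two records under `("users", "alice")` end up adjacent after sorting, which is the property the later multi-record operations rely on.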
Each record also carries a version number, and every update generates a new, unique version. You can pass a version number to put, delete, and get operations. On a get, it specifies the version to be read. On a put or delete, the update is applied only if the record's latest version matches the specified one, which gives atomic compare-and-swap semantics. Version numbers need to be unique at least within a single partition.
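The compare-and-swap behavior of versions can be sketched with a toy, single-threaded store. This is an illustration only, not the Oracle NoSQL API: `VersionedStore` and its method names are hypothetical, versions are modeled as plain integers, and a real store would have to make the check-and-write step atomic under concurrency.

```python
import itertools


class VersionedStore:
    """Toy single-partition store; NOT the Oracle NoSQL API."""

    def __init__(self):
        self._data = {}                          # key -> (value, version)
        self._next_version = itertools.count(1)  # unique within this "partition"

    def get(self, key):
        # Returns (value, version), or None if the key is absent.
        return self._data.get(key)

    def put(self, key, value):
        # Unconditional write: every update gets a fresh version number.
        version = next(self._next_version)
        self._data[key] = (value, version)
        return version

    def put_if_version(self, key, value, expected_version):
        # Compare-and-swap: write only if the stored version is the expected one.
        current = self._data.get(key)
        if current is None or current[1] != expected_version:
            return None
        return self.put(key, value)


kv = VersionedStore()
v1 = kv.put("counter", b"1")
v2 = kv.put_if_version("counter", b"2", v1)      # succeeds: v1 is still current
stale = kv.put_if_version("counter", b"99", v1)  # rejected: v1 is now stale
```

A caller whose conditional write is rejected would normally re-read the record, recompute its change, and retry with the new version.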
II. Partitioning and Architecture
The architecture has two tiers: clients connect directly to the storage nodes. The core concepts are the Replication Node and the Replication Group. A Replication Group contains one writable master Replication Node and multiple read-only replicas. If the master fails, the group fails over to one of the replicas. The currently released version does not let you change the number of storage nodes dynamically.
Records are assigned to partitions by hashing the major key. Records that share a major key but differ in their minor keys are therefore guaranteed to live in the same partition, which makes multi-record operations efficient and lets the system support atomic operations across multiple records. A Replication Group is generally responsible for multiple partitions, and a storage node generally hosts one Replication Node. When the number of storage nodes changes, data is moved in units of whole partitions, so to make future scale-out easier you should configure a generous number of partitions at the start.
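The placement rule can be sketched as follows. The documentation summarized here does not say which hash function is used, so MD5, the partition count, the group count, and the round-robin partition-to-group mapping below are all assumptions made for illustration.

```python
import hashlib

N_PARTITIONS = 12   # assumed; chosen larger than the group count to leave room to scale
N_GROUPS = 3        # assumed number of replication groups


def partition_of(major, minor=()):
    # Only the major path is hashed; the minor path is deliberately ignored,
    # so all records under one major key land in the same partition.
    digest = hashlib.md5("/".join(major).encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % N_PARTITIONS


def group_of(partition):
    # Each replication group owns several partitions (simple round-robin here).
    return partition % N_GROUPS


p_email = partition_of(("users", "alice"), ("email",))
p_phone = partition_of(("users", "alice"), ("phone",))
same_partition = p_email == p_phone
```

Because the record-to-partition mapping never changes, scaling out in this model only means reassigning whole partitions to different groups, which is why starting with more partitions than groups is useful.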
The storage layer is Berkeley DB Java Edition, which uses a B-tree. Caching happens at two levels, the Berkeley DB cache and the file system cache, so DIRECT_IO is not needed. The documentation recommends sizing the Berkeley DB cache to hold the internal nodes of the B-tree and leaving the leaf nodes to the file system cache. There is also a single-node version called KVLite.
III. Operations
Oracle NoSQL provides a wide range of operations, including:
1. put operations insert or update a record and require a complete key. They include put/putIfAbsent/putIfPresent/putIfVersion. The names are self-explanatory. One point worth noting is that putIfVersion provides compare-and-swap, which is useful for handling concurrency.
2. delete operations remove records and include delete/deleteIfVersion/multiDelete. The first two require a complete key, and their names are self-explanatory. multiDelete takes up to three parameters: first, a complete major key; second, a KeyRange giving the lower and upper bounds of the first minor key component; third, a Depth mode, which selects children only, descendants only, parent and children, or parent and descendants.
3. get operations read records and include get/multiGet/multiGetIterator/storeIterator. multiGet, like multiDelete, accepts a KeyRange and a Depth. multiGetIterator fetches a large number of records under a complete major key in batches, to avoid using too much memory. You can specify the traversal direction, but the results are not guaranteed to be a consistent view as of a single point in time. storeIterator traverses a large number of records under a partial major key, or even all records in the store.
4. execute performs an atomic batch update of multiple records. The system guarantees that the operations are applied atomically. The restrictions are that all records involved must share the same major key and that the same record cannot appear more than once in the batch.
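The multi-record operations above can be illustrated with a small in-memory model. This is a sketch under stated assumptions, not the Oracle NoSQL client: keys are flat tuples of components, `multi_get` and `execute` are hypothetical helper functions, the KeyRange parameter is omitted, and the caller passes `major_len` to say how many leading components form the major key.

```python
from enum import Enum


class Depth(Enum):
    CHILDREN_ONLY = 1
    PARENT_AND_CHILDREN = 2
    DESCENDANTS_ONLY = 3
    PARENT_AND_DESCENDANTS = 4


def multi_get(store, parent, depth):
    """Return the records under `parent` (a tuple of key components) per `depth`."""
    include_parent = depth in (Depth.PARENT_AND_CHILDREN, Depth.PARENT_AND_DESCENDANTS)
    children_only = depth in (Depth.CHILDREN_ONLY, Depth.PARENT_AND_CHILDREN)
    result = {}
    for key, value in store.items():
        if key == parent:
            if include_parent:
                result[key] = value
        elif key[:len(parent)] == parent:
            extra = len(key) - len(parent)   # 1 = direct child, >1 = deeper descendant
            if extra == 1 or not children_only:
                result[key] = value
    return result


def execute(store, major_len, operations):
    """Apply ("put", key, value) / ("delete", key) operations all-or-nothing."""
    keys = [op[1] for op in operations]
    # Restriction 1: every record must share the same major key.
    if len({key[:major_len] for key in keys}) > 1:
        raise ValueError("all operations must share the same major key")
    # Restriction 2: a record may appear only once in the batch.
    if len(set(keys)) != len(keys):
        raise ValueError("a record may appear only once in a batch")
    staged = dict(store)             # work on a copy so a failure changes nothing
    for op in operations:
        if op[0] == "put":
            staged[op[1]] = op[2]
        elif op[0] == "delete":
            staged.pop(op[1], None)
        else:
            raise ValueError(f"unknown operation: {op[0]}")
    store.clear()
    store.update(staged)             # publish all changes together


store = {
    ("users", "alice"): b"profile",
    ("users", "alice", "email"): b"a@example.com",
    ("users", "alice", "email", "verified"): b"yes",
}
children = multi_get(store, ("users", "alice"), Depth.CHILDREN_ONLY)
execute(store, 2, [
    ("put", ("users", "alice", "phone"), b"555-0100"),
    ("delete", ("users", "alice", "email", "verified")),
])
```

The same-major-key restriction is what makes the batch cheap to apply atomically: all affected records sit in one partition, so no coordination across partitions is needed.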
IV. Data Consistency
Oracle NoSQL provides flexible, fine-grained control over consistency. For reads, there are four options: read only from the master; read from any node, no matter how far a replica lags behind; read from a replica only if its lag behind the master is within a given time threshold; or read from a replica only if its version is not less than a specified version. Version-based read consistency can be used to implement read-your-own-writes, that is, to guarantee that you can read the data you just wrote.
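The four read options can be sketched as a filter over the nodes of one replication group. Everything here is hypothetical and for illustration only: the `Node` fields, the policy names, and the use of integer versions are assumptions, not the product's API.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    is_master: bool
    lag_ms: int    # how far this node is behind the master (0 for the master)
    version: int   # highest version this node has applied


def eligible_nodes(nodes, policy, max_lag_ms=None, min_version=None):
    """Return the names of the nodes allowed to serve a read under `policy`."""
    if policy == "master_only":
        return [n.name for n in nodes if n.is_master]
    if policy == "any":
        return [n.name for n in nodes]
    if policy == "time":
        return [n.name for n in nodes if n.is_master or n.lag_ms <= max_lag_ms]
    if policy == "version":
        return [n.name for n in nodes if n.version >= min_version]
    raise ValueError(f"unknown policy: {policy}")


group = [
    Node("master", True, 0, 10),
    Node("replica-1", False, 50, 10),
    Node("replica-2", False, 900, 7),
]

# Read-your-own-writes: a client that just wrote version 10 asks for at least that.
fresh_enough = eligible_nodes(group, "version", min_version=10)
recent_enough = eligible_nodes(group, "time", max_lag_ms=100)
```

In this model a client that passes the version returned by its last write as `min_version` can never be served by a node that has not yet seen that write.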
For updates, you can specify two policies. The first is whether the master waits for acknowledgements from the replicas: you can require acknowledgements from all replicas, from a majority of replicas, or from none. The second is whether the data is persisted to disk: you can choose not to write to disk (update memory only), to write to disk without a SYNC, or to write to disk with a SYNC. The persistence policy can be set separately for the master and for the replicas. Judging from the documentation, two-phase commit (2PC) does not appear to be used.
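The two update policies can be modeled as a small settings object. This is a sketch, not the product's API: the `Durability` class here, its field names, and the policy strings are hypothetical. It also assumes that "majority" means more than half of the whole group counting the master itself, which the summary above does not spell out.

```python
from dataclasses import dataclass

ACK_POLICIES = ("all", "majority", "none")
SYNC_POLICIES = ("no_sync", "write_no_sync", "sync")


@dataclass(frozen=True)
class Durability:
    master_sync: str    # persistence policy applied on the master
    replica_sync: str   # persistence policy applied on each replica
    replica_ack: str    # how many replicas must acknowledge the update

    def __post_init__(self):
        if self.master_sync not in SYNC_POLICIES or self.replica_sync not in SYNC_POLICIES:
            raise ValueError("unknown sync policy")
        if self.replica_ack not in ACK_POLICIES:
            raise ValueError("unknown ack policy")

    def replica_acks_needed(self, n_replicas):
        """Replica acknowledgements the master waits for before replying."""
        if self.replica_ack == "all":
            return n_replicas
        if self.replica_ack == "majority":
            group_size = n_replicas + 1   # replicas plus the master
            # Assumption: the master counts toward the majority, so it needs
            # (group_size // 2 + 1) - 1 acknowledgements from replicas.
            return group_size // 2
        return 0


# Strong on the master, lighter on replicas, majority acknowledgement.
policy = Durability(master_sync="sync", replica_sync="write_no_sync", replica_ack="majority")
acks = policy.replica_acks_needed(2)   # group of 3: one master, two replicas
```

Weaker settings return to the client sooner but leave a window in which an acknowledged update exists only in memory or only on the master.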
V. System Management and Others
The system provides command-line and web-based management tools. You can create snapshots; a snapshot is consistent only within a partition and does not guarantee global consistency. Data can be restored from a snapshot. Data in NoSQL Database can also be imported into Hadoop. The client driver is a jar file.
VI. Summary and Evaluation
Advantages:
1. Powerful data model and operations. Oracle NoSQL is not a flat key-value model; in many cases it behaves like a tree model. A key is made up of multiple key components, and the operations are designed around this: a record whose key has one more component as a suffix can be regarded as a child node, and the system provides many features for operating on a subtree in batches. Compared with the relational model, this addresses some JOIN problems and improves development efficiency.
2. Flexible, fine-grained consistency. Both reads and updates offer several options for trading off performance against consistency. In addition, version numbers support compare-and-swap, read-your-own-writes, and similar semantics, which makes it easier to write correct concurrent code.
3. Support for atomic multi-record operations.
The main problem in the current version is that storage nodes cannot be added, but I believe this problem will be solved soon.
Source: http://wangyuanzju.blog.163.com/blog/static/130292011919114541710/