NewSQL database VoltDB features
VoltDB is a new kind of database product known as a NewSQL database. It is based on H-Store, claims throughput 45 times higher than that of current database products, and is highly scalable. It has the following features:
Ø High throughput and low latency: achieved through in-memory computing, stored procedures, and serialized data access.
Ø Scalability: automatic partitioning and replication ensure performance and scalability.
Ø High availability: synchronous multi-master replication (called K-safety in VoltDB).
Ø Persistence: an innovative combination of database snapshots and command logs.
1. High throughput and low latency
VoltDB provides high-throughput, low-latency SQL operations. In general, it uses in-memory computing to avoid disk stalls, stored procedures to avoid user stalls, and serialized data access within each cluster node to avoid the locking and buffer-management overhead of traditional databases. In addition, VoltDB is not written purely in Java: its SQL execution engine is written in C++, so it is not affected by GC pauses.
Ø In-memory computing: VoltDB never waits for data to be loaded from disk during transaction execution, which avoids disk I/O overhead. It takes full advantage of the large memory on modern servers to maximize throughput.
Ø Stored procedures: these avoid multiple round trips between the application and the database. Each transaction is defined as a stored procedure, so a transaction needs only one round trip. However, stored procedures are not the only interface VoltDB supports. Since version 1.1 it has also supported SQL queries from JDBC, the SQL command line, HTTP/JSON, and native C++, PHP, C#, Node.js, and other clients. The only restriction is that VoltDB always runs in auto-commit mode and does not support manual transaction control.
Ø Serialized data access: when a transaction blocks in either of the two situations above, a traditional database switches to another transaction, which incurs heavy latching and locking overhead. A VoltDB database consists of many in-memory execution engines, called partitions. Each partition combines a portion of the data with the processing for that data. VoltDB automatically distributes data across the cluster to create the partitions, and each partition is single-threaded, which avoids the concurrency-control overhead of traditional databases.
Ø C++ execution engine: VoltDB uses native C++ code to allocate memory for table data and to execute SQL statements. The core does not use Java so that long-lived table data stays off the JVM heap and memory use can be controlled at a finer granularity. In addition, although static deployment and schema-related data is managed in Java, VoltDB also uses DirectByteBuffer to allocate data in off-heap memory. In practice, the JVM heap is used only for short-lived, transaction-related data, which is a workload that suits the GC well.
If a transaction involves data in only a single partition, it is routed to the node that hosts that partition and executed there, without coordinating with other partitions.
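The single-threaded partition idea described above can be sketched in plain Java. This is only an illustration under stated assumptions, not VoltDB code: the names `PartitionSketch`, `Partition`, and `runDeposits` are hypothetical, and a single-thread `ExecutorService` stands in for the partition's execution thread. Because every transaction for the partition runs on that one thread, the data map needs no locks.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

public class PartitionSketch {
    /** One partition: a private slice of the data plus the single thread that touches it. */
    static final class Partition {
        private final Map<String, Long> balances = new HashMap<>(); // plain map, no locks
        private final ExecutorService site = Executors.newSingleThreadExecutor();

        /** Queue a whole transaction; it runs to completion before the next one starts. */
        <T> Future<T> submit(Function<Map<String, Long>, T> procedure) {
            Callable<T> task = () -> procedure.apply(balances);
            return site.submit(task);
        }

        void shutdown() {
            site.shutdown();
        }
    }

    /** Several client threads each queue one-unit deposits; returns the final balance. */
    static long runDeposits(int clients, int depositsPerClient) {
        Partition partition = new Partition();
        try {
            Thread[] threads = new Thread[clients];
            for (int c = 0; c < clients; c++) {
                threads[c] = new Thread(() -> {
                    for (int i = 0; i < depositsPerClient; i++) {
                        partition.submit(data -> data.merge("acct-1", 1L, Long::sum));
                    }
                });
                threads[c].start();
            }
            for (Thread t : threads) {
                t.join();
            }
            // Queued after every deposit, so it observes all of them.
            return partition.submit(data -> data.getOrDefault("acct-1", 0L)).get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            partition.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDeposits(4, 1000)); // 4 clients x 1000 deposits
    }
}
```

Even though four client threads submit work concurrently, no update is lost: the queue in front of the single thread is the only point of synchronization, which is the trade that replaces per-row locks in this model.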
2. Scalable Architecture
Architecturally, VoltDB is a shared-nothing system, so it is easy to scale. It can be scaled vertically by increasing the capacity and performance of existing nodes, and horizontally by dynamically adding new nodes. Neither requires changes to the database schema or the application code.
VoltDB supports both table partitioning and table replication. Large tables can be partitioned to improve performance, and small, frequently read tables can be replicated to reduce join overhead.
This is similar to mirrored regions and partitioned regions in the distributed cache GemFire. In GemFire, a mirrored region contains the full data set, while a partitioned region contains only one partition of the data. The difference is that VoltDB chooses replication or partitioning according to the characteristics of each table, whereas a GemFire mirrored region pulls data from other partitions to assemble a full image of the data.
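The choice between partitioning and replicating a table can be illustrated with a small Java sketch. Everything here is hypothetical (the class `TableLayoutSketch`, the tables, and the hash-modulo routing rule are assumptions made for illustration, not VoltDB's actual hashing scheme). A large orders table is split by customer ID, while a small country lookup table is copied to every partition so that a join never has to leave the partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TableLayoutSketch {
    final int partitionCount;
    // Partitioned table: each partition holds only the orders whose key hashes to it.
    final List<Map<Integer, String>> ordersByPartition = new ArrayList<>();
    // Replicated table: every partition holds a full copy of the small lookup table.
    final List<Map<String, String>> countriesByPartition = new ArrayList<>();

    TableLayoutSketch(int partitionCount) {
        this.partitionCount = partitionCount;
        for (int i = 0; i < partitionCount; i++) {
            ordersByPartition.add(new HashMap<>());
            countriesByPartition.add(new HashMap<>());
        }
    }

    /** Assumed routing rule: hash of the partitioning column modulo the partition count. */
    int partitionFor(int customerId) {
        return Math.floorMod(Integer.hashCode(customerId), partitionCount);
    }

    void insertOrder(int customerId, String countryCode) {
        ordersByPartition.get(partitionFor(customerId)).put(customerId, countryCode);
    }

    void insertCountry(String code, String name) {
        for (Map<String, String> copy : countriesByPartition) {
            copy.put(code, name); // a write to a replicated table goes to every copy
        }
    }

    /** Join an order with its country name using only data local to one partition. */
    String countryNameForCustomer(int customerId) {
        int p = partitionFor(customerId);
        String code = ordersByPartition.get(p).get(customerId);
        return code == null ? null : countriesByPartition.get(p).get(code);
    }

    public static void main(String[] args) {
        TableLayoutSketch db = new TableLayoutSketch(4);
        db.insertCountry("DE", "Germany");
        db.insertOrder(42, "DE");
        System.out.println(db.partitionFor(42) + " " + db.countryNameForCustomer(42));
    }
}
```

The sketch also shows the cost side of replication: reads are local everywhere, but each write to the replicated table touches every partition, which is why it suits small, read-mostly tables.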
If a transaction accesses data in multiple partitions, one node acts as a coordinator: it distributes the work to the other nodes, collects their results, and completes the transaction.
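The coordinator pattern for a multi-partition transaction amounts to a scatter step followed by a gather step. The following minimal sketch assumes a read-only aggregate (a sum) and uses hypothetical names (`CoordinatorSketch`, `localSum`, `sumAcrossPartitions`). It runs in a single thread for simplicity and leaves out everything a real coordinator must also handle, such as ordering against other transactions and failures.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CoordinatorSketch {
    /** Work fragment that each partition runs against its own rows only. */
    static long localSum(List<Long> partitionRows) {
        long sum = 0;
        for (long value : partitionRows) {
            sum += value;
        }
        return sum;
    }

    /** Coordinator: send the fragment to every partition, then combine the partial results. */
    static long sumAcrossPartitions(List<List<Long>> partitions) {
        List<Long> partials = new ArrayList<>();
        for (List<Long> rows : partitions) {
            partials.add(localSum(rows)); // scatter: one fragment per partition
        }
        long total = 0;
        for (long partial : partials) {
            total += partial;             // gather: merge the partial results
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<Long>> partitions = Arrays.asList(
                Arrays.asList(1L, 2L), Arrays.asList(3L), new ArrayList<Long>());
        System.out.println(sumAcrossPartitions(partitions));
    }
}
```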
3. High Availability
Unlike traditional RDBMS products, which rely on third-party HA solutions, VoltDB provides three HA capabilities of its own: K-safety, network fault detection, and live node rejoin.
3.1 K-safety
When K-safety is configured, VoltDB automatically replicates database partitions. K is the number of additional replicas. If K = 0, there are no replicas, so the failure of any node stops the whole database cluster from providing service. If K = 1, there is one replica, that is, two copies of each partition in total. Note that all replicas in VoltDB can be read and written; this is not a traditional master-slave replication relationship.
As for data synchronization, every operation on a replicated partition is sent to each node that holds a copy and is executed there, which keeps the copies consistent. If one of those nodes fails, the database continues to send operations to the surviving copies. In this respect VoltDB is very different from traditional databases: there are no data synchronization conflicts even though there are multiple masters. For this reason, K-safety is also called synchronous multi-master replication.
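The replication behavior described above can be sketched for a single partition as follows. This is a simplified model under stated assumptions: `KSafetySketch`, `Replica`, `put`, `get`, and `fail` are hypothetical names, and node failure is simulated with a flag. The same operation is applied to every live copy, so any live copy can serve a read, and the partition becomes unavailable only when all K + 1 copies are gone.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class KSafetySketch {
    static final class Replica {
        final Map<String, Integer> data = new HashMap<>();
        boolean alive = true;
    }

    final List<Replica> replicas = new ArrayList<>();

    /** K additional replicas means K + 1 copies of the partition in total. */
    KSafetySketch(int k) {
        for (int i = 0; i <= k; i++) {
            replicas.add(new Replica());
        }
    }

    /** Apply the same operation on every live copy before returning. */
    void put(String key, int value) {
        boolean applied = false;
        for (Replica r : replicas) {
            if (r.alive) {
                r.data.put(key, value);
                applied = true;
            }
        }
        if (!applied) {
            throw new IllegalStateException("no live copy of the partition");
        }
    }

    /** Any live copy can answer, because all live copies hold the same data. */
    Integer get(String key) {
        for (Replica r : replicas) {
            if (r.alive) {
                return r.data.get(key);
            }
        }
        throw new IllegalStateException("no live copy of the partition");
    }

    /** Simulate the failure of the node hosting one copy. */
    void fail(int index) {
        replicas.get(index).alive = false;
    }

    public static void main(String[] args) {
        KSafetySketch partition = new KSafetySketch(1); // K = 1: two copies
        partition.put("a", 1);
        partition.fail(0);                              // one node is lost
        partition.put("b", 2);                          // service continues
        System.out.println(partition.get("a") + " " + partition.get("b"));
    }
}
```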
3.2 Network Fault Detection
When a network failure occurs, VoltDB nodes are cut off from each other, and each side considers the other to have failed. K-safety would then let the nodes on both sides keep providing service independently. If this split-brain condition is not detected in time, it causes serious data synchronization problems. VoltDB therefore detects network faults automatically, immediately determines which side should continue to serve, and takes a snapshot of the data on the other side's nodes before stopping them. Once the network fault is resolved, the stopped nodes can be added back to the cluster with the live node rejoin feature described below.
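The text above does not say how the surviving side is chosen, so the sketch below assumes a simple rule purely for illustration: the side that can still see more than half of the cluster keeps running, and an exact tie goes to the side that contains the lowest node ID. The class `SplitBrainSketch` and the method `shouldKeepRunning` are hypothetical, and the rule should not be read as VoltDB's documented algorithm.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.Set;
import java.util.TreeSet;

public class SplitBrainSketch {
    /**
     * Decide whether the nodes on this side of a network partition keep serving.
     * Assumed rule: strict majority wins; a tie goes to the side with the lowest node ID.
     */
    static boolean shouldKeepRunning(Set<Integer> reachable, Set<Integer> cluster) {
        int visible = reachable.size();
        int total = cluster.size();
        if (2 * visible > total) {
            return true;   // this side holds a strict majority
        }
        if (2 * visible < total) {
            return false;  // the other side holds the majority
        }
        return reachable.contains(Collections.min(cluster)); // tie-break
    }

    public static void main(String[] args) {
        Set<Integer> cluster = new TreeSet<>(Arrays.asList(0, 1, 2, 3, 4));
        Set<Integer> sideA = new TreeSet<>(Arrays.asList(0, 1, 2));
        Set<Integer> sideB = new TreeSet<>(Arrays.asList(3, 4));
        System.out.println(shouldKeepRunning(sideA, cluster));
        System.out.println(shouldKeepRunning(sideB, cluster));
    }
}
```

Whatever the exact rule, the property that matters is that the two sides reach opposite decisions from purely local information, so that at most one of them continues to accept writes.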
3.3 Live node rejoin
An offline VoltDB node can be returned to the cluster through the rejoin operation. The process is as follows: the rejoining node first obtains a copy of the data from a sibling node. Once it has caught up with the sibling, it returns to normal operation and accepts work.
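The rejoin steps (copy, catch up, go live) can be sketched as below. The model is deliberately simplified and all names (`RejoinSketch`, `Sibling`, `Write`, `rejoin`) are hypothetical: the sibling hands over a snapshot, keeps serving writes while the copy is in progress, and the rejoining node then applies the writes it missed.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RejoinSketch {
    static final class Write {
        final String key;
        final int value;

        Write(String key, int value) {
            this.key = key;
            this.value = value;
        }
    }

    /** Sibling node that keeps serving while another node rejoins. */
    static final class Sibling {
        final Map<String, Integer> data = new HashMap<>();
        final List<Write> sinceSnapshot = new ArrayList<>();

        void apply(Write w) {
            data.put(w.key, w.value);
            sinceSnapshot.add(w); // remembered so a rejoining node can catch up
        }

        Map<String, Integer> snapshot() {
            sinceSnapshot.clear();
            return new HashMap<>(data);
        }
    }

    /** Returns the rejoining node's data once it has caught up with the sibling. */
    static Map<String, Integer> rejoin(Sibling sibling, List<Write> arrivingDuringCopy) {
        Map<String, Integer> local = sibling.snapshot();   // 1. copy data from the sibling
        for (Write w : arrivingDuringCopy) {
            sibling.apply(w);                               // the cluster keeps working meanwhile
        }
        for (Write w : sibling.sinceSnapshot) {
            local.put(w.key, w.value);                      // 2. catch up on missed writes
        }
        return local;                                       // 3. in step with the sibling
    }

    public static void main(String[] args) {
        Sibling sibling = new Sibling();
        sibling.apply(new Write("a", 1));
        List<Write> during = new ArrayList<>();
        during.add(new Write("b", 2));
        during.add(new Write("a", 3));
        System.out.println(rejoin(sibling, during).equals(sibling.data));
    }
}
```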
4. Persistence
Although VoltDB's HA features reduce the probability of downtime, failures still occur occasionally, and DBAs sometimes need to stop the database for scheduled maintenance. VoltDB therefore provides high-performance snapshots and command logging to support a range of persistence requirements. Command logging can be synchronous or asynchronous, and the interval at which the log is flushed to disk is configurable.
What is the difference between the command log and a traditional write-ahead log (WAL)? (To be studied in depth.)
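How a snapshot and a command log combine during recovery can be sketched as follows. The sketch assumes that each log entry records a command (here a deposit with its parameters) and that recovery restores the latest snapshot and then replays the commands logged after it. All names (`CommandLogSketch`, `Command`, `deposit`, `flush`, `takeSnapshot`, `recover`) are hypothetical, and in-memory lists stand in for the files on disk. The `synchronous` flag models the two logging modes: in synchronous mode a command is made durable before it is executed, while in asynchronous mode it becomes durable only at the next flush.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CommandLogSketch {
    static final class Command {
        final String account;
        final long amount;

        Command(String account, long amount) {
            this.account = account;
            this.amount = amount;
        }
    }

    final Map<String, Long> table = new HashMap<>();     // live in-memory data
    Map<String, Long> snapshot = new HashMap<>();        // stands in for the snapshot on disk
    final List<Command> durableLog = new ArrayList<>();  // stands in for the log on disk
    final List<Command> unflushed = new ArrayList<>();   // logged but not yet flushed
    final boolean synchronous;

    CommandLogSketch(boolean synchronous) {
        this.synchronous = synchronous;
    }

    void deposit(String account, long amount) {
        unflushed.add(new Command(account, amount));
        if (synchronous) {
            flush(); // durable before the transaction runs
        }
        table.merge(account, amount, Long::sum);
    }

    /** In asynchronous mode this would be called at the configured interval. */
    void flush() {
        durableLog.addAll(unflushed);
        unflushed.clear();
    }

    /** A snapshot captures all executed work, so earlier log entries are no longer needed. */
    void takeSnapshot() {
        snapshot = new HashMap<>(table);
        durableLog.clear();
        unflushed.clear();
    }

    /** Rebuild the data from durable state only: the snapshot plus the commands logged after it. */
    Map<String, Long> recover() {
        Map<String, Long> restored = new HashMap<>(snapshot);
        for (Command c : durableLog) {
            restored.merge(c.account, c.amount, Long::sum);
        }
        return restored;
    }

    public static void main(String[] args) {
        CommandLogSketch db = new CommandLogSketch(false); // asynchronous logging
        db.deposit("a", 10);
        db.takeSnapshot();
        db.deposit("a", 5);               // not yet flushed
        System.out.println(db.recover()); // a crash here loses the last deposit
        db.flush();
        System.out.println(db.recover()); // after the flush nothing is lost
    }
}
```

The asynchronous run makes the trade-off visible: a longer flush interval means less disk traffic per transaction but a larger window of acknowledged work that a crash can lose.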
Summary
None of this means that VoltDB is suited to everything. Its design and features determine where it applies. VoltDB is suitable for applications with high request rates and short transactions, such as finance, retail, and Web 2.0, and for streaming data applications, such as recommendation engines, real-time advertising platforms, clickstream processing, and fraud detection.