Kristóf Kovács, a software architect and consultant who has worked at a number of major companies, has published a comprehensive comparison of mainstream NoSQL databases (Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase) on his blog.
Although SQL databases are very useful tools, their monopoly of roughly 15 years is coming to an end. It was only a matter of time: many things were forced into relational databases without ever really fitting them.
However, the differences between NoSQL databases are far greater than those between SQL databases. This means that software architects have to select a suitable NoSQL database right at the beginning of a project. With that in mind, this article compares Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase:
Note: NoSQL is a database movement that advocates non-relational data storage. Today's computing architectures demand massive horizontal scalability for data storage, and NoSQL aims to meet that demand. Google's BigTable and Amazon's Dynamo are well-known examples of NoSQL data stores.
1. CouchDB
Language used: Erlang
Features: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bidirectional replication,
continuous or ad hoc,
with conflict detection,
and therefore master-master replication (see the note at the end of this section)
MVCC: write operations do not block read operations
Previous versions of documents are available
Crash-only (reliable) design
Needs compaction from time to time
Views: embedded map/reduce
Formatting views: lists and shows
Supports server-side document validation
Authentication supported
Real-time updates via _changes
Attachment handling,
and therefore CouchApps (standalone JS applications)
jQuery library included
Best application scenarios: Accumulating, occasionally changing data on which predefined queries are run and statistics are computed. Also suited to applications that need data versioning.
For example: CRM and CMS systems. Master-master replication is especially useful for multi-site deployments.
Note: master-master replication is a database synchronization method in which data is shared among a group of computers and any member of the group can update it.
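To make the idea of bidirectional replication with conflict detection more concrete, here is a minimal sketch in Python. It is only a toy model: the Replica class, the replicate function, and the integer revision numbers are all hypothetical and are not CouchDB's API or its actual revision scheme. It only shows why two sites that both accept writes need some way to notice that they changed the same revision of a document.

```python
class Replica:
    """Toy document store. Hypothetical names; this is not CouchDB's API."""

    def __init__(self):
        self.docs = {}  # doc_id -> (revision number, value)

    def put(self, doc_id, value, rev=0):
        """Write a document; the caller must name the revision it is updating."""
        current_rev = self.docs.get(doc_id, (0, None))[0]
        if rev != current_rev:
            raise ValueError("conflict: stale revision")
        self.docs[doc_id] = (current_rev + 1, value)
        return current_rev + 1


def replicate(source, target):
    """Copy newer documents from source to target; return conflicting ids."""
    conflicts = []
    for doc_id, (rev, value) in source.docs.items():
        target_rev, target_value = target.docs.get(doc_id, (0, None))
        if rev > target_rev:
            target.docs[doc_id] = (rev, value)  # source is strictly newer
        elif rev == target_rev and value != target_value:
            conflicts.append(doc_id)  # both sides changed the same revision
    return conflicts


site_a, site_b = Replica(), Replica()
site_a.put("contact-1", "Alice, Paris")
replicate(site_a, site_b)  # both sites now hold revision 1

# Each site updates its own copy before the next replication run.
site_a.put("contact-1", "Alice, Berlin", rev=1)
site_b.put("contact-1", "Alice, Rome", rev=1)
print(replicate(site_a, site_b))  # the conflicting document ids
```

Running replicate in both directions reports the same conflict on either side, which is the property a multi-site deployment relies on. How the conflict is then resolved is left open here.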
2. Redis
Language used: C/C++
Features: Blazing fast
License: BSD
Protocol: Telnet-like
In-memory database backed by disk storage,
but since version 2.0 it can swap data to disk (note: versions later than 2.4 do not support this feature!)
Master-slave replication (see the note at the end of this section)
Simple values or hash tables indexed by key, but complex operations such as ZREVRANGEBYSCORE are also supported
INCR & co (good for rate limiting or statistics)
Supports sets (including union, diff, and inter)
Supports lists (also usable as a queue, with blocking pop)
Supports hashes (objects with multiple fields)
Supports sorted sets (high-score tables, good for range queries)
Redis supports transactions
Values can be set to expire (as in a cache)
Pub/Sub lets users implement messaging
Best application scenarios: Rapidly changing data with a foreseeable database size (it should mostly fit in memory).
For example: Stock prices, analytics, real-time data collection, and real-time communication.
Note: in master-slave replication, a single server (the master) handles all write requests and the other servers (the slaves) replicate its data. It is typically used in server clusters that need to provide high availability.
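The sorted-set feature mentioned above (high-score tables and range queries) can be illustrated without a running server. The sketch below is plain Python that imitates what a ZREVRANGEBYSCORE call returns: the members whose score lies in a range, highest score first. The function is a stand-in written for illustration, not the Redis client API. Only the argument order (max before min) is taken from the real command.

```python
def zrevrangebyscore(sorted_set, max_score, min_score):
    """Members with min_score <= score <= max_score, highest score first."""
    hits = [(score, member) for member, score in sorted_set.items()
            if min_score <= score <= max_score]
    hits.sort(reverse=True)
    return [member for score, member in hits]


# member -> score, standing in for the contents of one sorted-set key
high_scores = {"ann": 3200, "bob": 1500, "cy": 2750, "dee": 900}

# Players scoring between 1000 and 3000, best first.
print(zrevrangebyscore(high_scores, 3000, 1000))
```

A real sorted set keeps its members ordered as they are inserted, so the range query does not have to sort on every call as this sketch does.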
3. MongoDB
Language used: C++
Features: Retains some friendly properties of SQL (queries and indexes)
License: AGPL (drivers: Apache)
Protocol: Custom, binary (BSON)
Master/slave replication (automatic failover with replica sets)
Built-in sharding
Queries are JavaScript expressions
Arbitrary JavaScript functions can be run server-side
Better update-in-place support than CouchDB
Uses memory-mapped files for data storage
Performance is prioritized over features
Journaling is best turned on (the --journal option)
On 32-bit systems, the database size is limited to approximately 2.5 GB
An empty database takes up about 192 MB
GridFS stores big data plus metadata (it is not actually a file system)
Best application scenarios: Applications that need dynamic queries, that prefer defining indexes to writing map/reduce functions, or that need good performance on a big database. Also a fit if you wanted CouchDB but your data changes too often and fills up the disk.
For example: Most things you would do with MySQL or PostgreSQL, in cases where predefined columns hold you back.
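To illustrate what "dynamic queries" over documents without predefined columns means, here is a small sketch in plain Python. The matches and find helpers are hypothetical and only imitate the shape of a MongoDB-style query document (exact match plus the $gt and $lt comparison operators). They are not the MongoDB driver and they ignore indexes entirely.

```python
def matches(doc, query):
    """True if doc satisfies every condition in the query dictionary."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            # Comparison operators, written the way MongoDB spells them.
            if "$gt" in condition and not value > condition["$gt"]:
                return False
            if "$lt" in condition and not value < condition["$lt"]:
                return False
        elif value != condition:
            return False
    return True


def find(docs, query):
    """Return the documents that match the query, in their original order."""
    return [doc for doc in docs if matches(doc, query)]


# Documents in one collection do not need to share the same fields.
people = [
    {"name": "Ann", "age": 34, "city": "Oslo"},
    {"name": "Bob", "age": 27},
    {"name": "Cy", "age": 41, "city": "Oslo", "tags": ["admin"]},
]

print(find(people, {"city": "Oslo", "age": {"$gt": 35}}))
```

The query is ordinary data built at run time, which is the contrast with predefined map/reduce views that the comparison draws.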
4. Riak
Languages used: Erlang and C, and some JavaScript
Features: Fault Tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Tunable trade-offs for distribution and replication (N, R, W)
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
Map/reduce using JavaScript or Erlang
Links and link walking: can be used as a graph database
Secondary indexes: search in metadata (coming in version 1.0)
Large object support (Luwak)
Available in "open source" and "enterprise" editions
Full-text search, indexing, and querying with the Riak Search server (beta)
Masterless multi-site replication and SNMP monitoring are commercially licensed
Best application scenarios: You want something Cassandra-like (Dynamo-like) but do not want to deal with the bloat and complexity. Also a fit if you need very good single-site scalability, availability, and fault tolerance, and are prepared to pay for multi-site replication.
For example: Point-of-sale data collection and factory control systems, where even brief downtime hurts. It can also be used as an easily updatable web server.
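The (N, R, W) settings listed above are easier to reason about with a small check. In the usual Dynamo-style reading, N is the number of replicas of a value, R is how many replicas must answer a read, and W is how many must acknowledge a write. The sketch below is not Riak code, and the function name is made up. It enumerates every possible read set and write set and tests whether they always share at least one node, which is what lets a read see the latest write.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """True if every set of R replicas shares a node with every set of W."""
    nodes = range(n)
    return all(set(readers) & set(writers)
               for readers in combinations(nodes, r)
               for writers in combinations(nodes, w))


# Compare the brute-force answer with the rule of thumb R + W > N.
for n, r, w in [(3, 2, 2), (3, 1, 1), (5, 1, 5), (5, 3, 2)]:
    print(n, r, w, quorums_overlap(n, r, w), r + w > n)
```

The two columns agree for every combination, so R + W > N is the condition to aim for when reads must see the latest write. Lower values of R or W trade that guarantee for latency and availability.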
5. Membase
Languages used: Erlang and C
Features: Memcache compatible, but with persistence and clustering
License: Apache 2.0
Protocol: memcached plus extensions
Very fast (200k+/second) access to data by key
Persistence to disk
All nodes are identical (master-master replication)
Also provides memcached-style in-memory caching buckets
Write de-duplication reduces I/O
Very nice web GUI for cluster management
Software upgrades without taking the database offline
Connection proxy for connection pooling and multiplexing
Best application scenarios: Applications that require low-latency data access, high concurrency, and high availability.
For example: Low-latency use cases such as ad targeting, and highly concurrent web applications such as online games (for example, Zynga).
6. Neo4j
Language used: Java
Features: Graph database for connected data
License: GPL, with some features under AGPL/commercial license
Protocol: HTTP/REST (or embedded in Java)
Standalone, or embeddable in Java applications
Both nodes and edges of the graph can carry metadata
Nice built-in web admin
Path finding with multiple algorithms
Indexing of keys and relationships
Optimized for reads
Supports transactions (in the Java API)
The Gremlin graph traversal language can be used
Scriptable in Groovy
Online backup, advanced monitoring, and high availability are available under the AGPL/commercial license
Best application scenarios: Graph-style, richly interconnected data. This is the most significant difference between Neo4j and the other NoSQL databases.
For example: Social relations, public transport links, road maps, and network topologies.
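As an illustration of the kind of question a graph database answers, the sketch below finds the route with the fewest hops in a tiny public transport network using breadth-first search. It is plain Python over a dictionary, not Neo4j's API or its path-finding implementation, and the stop names and the shortest_path function are invented for the example.

```python
from collections import deque


def shortest_path(links, start, goal):
    """Breadth-first search: the path with the fewest hops, or None."""
    previous = {start: None}  # node -> node we reached it from
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in links.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


# Hypothetical transport links: stop -> directly connected stops.
links = {
    "Airport": ["Central"],
    "Central": ["Airport", "Museum", "Harbor"],
    "Museum": ["Central", "University"],
    "Harbor": ["Central"],
    "University": ["Museum"],
    "Island": [],
}

print(shortest_path(links, "Airport", "University"))
```

In a relational schema the same question needs one self-join per hop, which is why connected data is singled out as the case where a graph database differs most from the others.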
7. Cassandra
Language used: Java
Features: Best of BigTable and Dynamo
License: Apache
Protocol: Custom, binary (Thrift)
Tunable trade-offs for distribution and replication (N, R, W)
Querying by column and by range of keys
BigTable-like features: columns and column families
Write operations are faster than read operations
Map/reduce is possible with Apache Hadoop
I admit that I am somewhat biased against Cassandra, partly because of its bloat and complexity and partly because of the usual Java issues (configuration, exceptions, and so on)
Best application scenarios: When you write more than you read (logging), or when every component of the system must be written in Java ("no one gets fired for choosing Apache software").
For example: Banking and the financial industry (not necessarily for financial transactions, since these industries have far broader database needs). Because writes are faster than reads, real-time data analysis is a natural niche.
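The "query by column over a range of keys" model can be sketched in a few lines of plain Python. The ColumnFamily class below is hypothetical and in-memory only. It assumes row keys are kept in sorted order, which a real cluster provides only under an order-preserving key layout, and it leaves out distribution and replication entirely.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy column family: sorted row keys, each row a dict of columns."""

    def __init__(self):
        self.keys = []  # row keys, kept sorted
        self.rows = {}  # row key -> {column name: value}

    def insert(self, key, column, value):
        if key not in self.rows:
            self.keys.insert(bisect_left(self.keys, key), key)
            self.rows[key] = {}
        self.rows[key][column] = value  # rows may have different columns

    def range_query(self, start, end, column):
        """One column for all row keys in [start, end], in key order."""
        lo = bisect_left(self.keys, start)
        hi = bisect_right(self.keys, end)
        return [(key, self.rows[key][column])
                for key in self.keys[lo:hi] if column in self.rows[key]]


# A write-heavy log keyed by date; rows arrive out of order.
events = ColumnFamily()
events.insert("2011-03-02", "level", "warn")
events.insert("2011-03-01", "level", "info")
events.insert("2011-03-04", "level", "error")
events.insert("2011-03-03", "host", "web-2")

print(events.range_query("2011-03-01", "2011-03-03", "level"))
```

The row for 2011-03-03 falls inside the key range but has no "level" column, so it is skipped. Rows in a column family do not all have to carry the same columns.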
8. HBase
(With the help of ghshephard)
Language used: Java
Features: Billions of rows x millions of columns
License: Apache
Protocol: HTTP/REST (also Thrift, see the note at the end of this section)
Modeled after BigTable
Map/reduce with Hadoop
Optimizations for real-time queries
High-performance Thrift gateway
Query predicate push-down via server-side scan and get filters
HTTP interface supports XML, Protobuf, and binary
Cascading, Hive, and Pig source and sink modules
JRuby-based shell
Rolling restart for configuration changes and minor upgrades
No single point of failure
Random access performance comparable to MySQL
Best application scenarios: When you like BigTable and need random, real-time read/write access to big data.
For example: Facebook's messaging database (more general use cases coming soon).
Note: Thrift is an interface definition language used to define and create services for many programming languages. It was developed at Facebook and released as open source.
Of course, all of these systems have many more features than those listed here. I have only listed the ones I consider important from my own point of view. Technology also advances quickly, so this content needs constant updating. I will do my best to keep this list up to date.