Kristóf Kovács, a software architect who has worked for a number of large companies, has published on his blog a comprehensive comparison of the mainstream NoSQL databases (Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase).
Although SQL databases are very useful tools, their monopoly is coming to an end after roughly fifteen years. It was only a matter of time: developers were forced into relational databases and kept running into cases where they simply could not adapt to the requirements.
The differences between NoSQL databases, however, are far greater than the differences between any two SQL databases. This means the software architect has to choose a suitable NoSQL database at the very start of a project. With that in mind, the following compares Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase:
Note: NoSQL is a new, revolutionary database movement whose advocates promote non-relational data storage. Today's computer architectures demand massive scalability in data storage, and NoSQL is committed to changing that status quo. Google's BigTable and Amazon's Dynamo are examples of NoSQL-style databases.
1. CouchDB
Language used: Erlang
Features: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bi-directional data replication,
continuous or ad-hoc,
with conflict detection,
therefore: master-master replication (see note 2)
MVCC: write operations do not block reads
Previous versions of a document remain available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists and shows
Support for server-side document validation
Supports authentication
Real-time updates via the changes feed
Supports attachment handling
Thus, CouchApps (standalone JavaScript applications)
Requires the jQuery library
Best used for: accumulating data that changes infrequently and is queried with predefined queries, or for statistics. Also suits applications that need versioning of their data.
For example: CRM and CMS systems. Master-master replication makes multi-site deployments particularly easy.
Note: master-master replication is a database synchronization method in which data is shared across a group of computers and any member of the group can accept updates.
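To make the HTTP/REST protocol and MVCC behavior concrete, here is a minimal Python sketch using the requests library. It assumes a CouchDB instance running at http://localhost:5984; the database and document names are invented for the example.

```python
import requests

BASE = "http://localhost:5984"   # assumed local CouchDB instance
DB = f"{BASE}/demo"              # hypothetical database name

requests.put(DB)  # create the database (returns 412 if it already exists)

# Create a document; CouchDB answers with its revision (_rev), which MVCC relies on.
resp = requests.put(f"{DB}/contact-1", json={"name": "Ada", "type": "contact"})
rev = resp.json()["rev"]

# Updates must cite the current _rev; a stale revision is rejected with HTTP 409,
# which is how conflicts are detected in master-master setups.
ok = requests.put(f"{DB}/contact-1", json={"_rev": rev, "name": "Ada Lovelace"})
stale = requests.put(f"{DB}/contact-1", json={"_rev": rev, "name": "Someone else"})
print(ok.status_code, stale.status_code)   # 201, then 409
```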
2. Redis
Language used: C/C++
Feature: blazing fast
License: BSD
Protocol: Telnet-like
In-memory database backed by disk persistence,
but since version 2.0 data can be swapped out to disk (note that this feature is no longer supported after 2.4)
Master-slave replication (see note 3)
Although data is stored as simple values or hash tables indexed by key, complex operations such as ZREVRANGEBYSCORE are also supported
INCR and friends (good for rate limiting or statistics)
Supports sets (with union/diff/intersect operations)
Supports lists (also usable as queues, with blocking pop operations)
Supports hashes (objects with multiple fields)
Supports sorted sets (e.g. high-score tables, good for range queries)
Supports transactions
Data can be set to expire (as in a cache)
Pub/Sub lets users implement messaging
Best used for: rapidly changing data with a foreseeable database size (it should fit mostly in memory).
For example: stock prices, analytics, real-time data collection, real-time communication.
Note: master-slave replication means that a single server at a time accepts all writes and replicates them to the others; it is typically used in server clusters that require high availability.
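A small Python sketch with the redis-py client shows several of the data types above in action. It assumes a Redis server on localhost:6379; the key names are made up for the example.

```python
import redis

r = redis.Redis(host="localhost", port=6379)      # assumed local Redis server

# INCR and friends: rate limiting and statistics counters
r.incr("pageviews:today")

# Sorted set as a high-score table, queried by score range
r.zadd("highscores", {"alice": 3200, "bob": 1800})
top = r.zrevrangebyscore("highscores", "+inf", "-inf", start=0, num=10)

# Keys can expire, so Redis doubles as a cache
r.set("session:42", "token", ex=300)               # gone after 5 minutes

# Lists work as queues: LPUSH to produce, BRPOP to consume (blocking)
r.lpush("jobs", "send-email")
job = r.brpop("jobs", timeout=5)
print(top, job)
```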
3. MongoDB
Language used: C++
Features: retains some friendly properties of SQL (queries, indexes)
License: AGPL (drivers: Apache)
Protocol: custom, binary (BSON)
Master/slave replication (automatic failover with replica sets)
Built-in sharding
Support for JavaScript expression queries
Arbitrary JavaScript functions can be executed on the server side
Better update-in-place support than CouchDB
Uses memory-mapped files for data storage
Performance is prioritized over features
It is best to turn on journaling (with the --journal option)
On 32-bit systems the database size is limited to about 2.5 GB
An empty database takes up about 192 MB
GridFS can store large data and metadata (it is not an actual file system)
Best used for: dynamic queries, when you prefer defined indexes over map/reduce, when you need good performance on a large database, and when you wanted CouchDB but your data changes too frequently and fills up storage.
For example: most things you would do with MySQL or PostgreSQL, but where predefined columns hold you back.
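As a rough illustration of dynamic queries, indexes, and update-in-place, here is a short Python sketch using PyMongo. It assumes a mongod instance on localhost:27017; the database, collection, and field names are purely illustrative.

```python
from pymongo import MongoClient, ASCENDING

client = MongoClient("mongodb://localhost:27017")   # assumed local mongod
db = client["shop"]                                  # hypothetical database name

# No predefined columns: each document can have its own fields.
db.products.insert_one({"name": "lamp", "price": 20, "tags": ["home", "light"]})
db.products.insert_one({"name": "cable", "price": 5, "length_m": 2})

# Dynamic queries are plain expressions ...
cheap = list(db.products.find({"price": {"$lt": 10}}))

# ... and an explicit index keeps them fast as the collection grows.
db.products.create_index([("price", ASCENDING)])

# Update-in-place: change one field without rewriting the whole document.
db.products.update_one({"name": "lamp"}, {"$set": {"price": 18}})
print(cheap)
```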
4. Riak
Languages used: Erlang and C, with some JavaScript
Features: fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Tunable trade-offs for distribution and replication (N, R, W)
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
Map/reduce in JavaScript or Erlang
Links and link walking: usable as a graph database
Indexing: metadata for search (to be supported in version 1.0)
Large object support (Luwak)
Comes in "open source" and "enterprise" editions
Full-text search, indexing, and querying via the Riak Search server (beta)
Masterless multi-site replication and SNMP monitoring are available with the commercial license
Best used for: situations where you want something Cassandra-like (Dynamo-like) but cannot deal with its bloat and complexity, or where you plan to replicate across multiple sites but need scalability, availability, and fault tolerance within a single site.
For example: point-of-sale data collection, factory control systems, and other places with strict downtime requirements; it can also serve as an easily updatable web server.
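The tunable N/R/W trade-offs can be sketched against Riak's HTTP interface. The snippet below is only a rough Python illustration: it assumes a Riak node with its HTTP API on localhost:8098, and the /riak/<bucket>/<key> path style and the r/w query parameters may differ between Riak versions.

```python
import requests

BASE = "http://localhost:8098"                      # assumed local Riak node
KEY_URL = f"{BASE}/riak/sales/2011-10-01"           # hypothetical bucket and key

# Write with a tunable write quorum: wait until 2 replicas acknowledge.
requests.put(
    KEY_URL,
    params={"w": 2},
    data=b'{"total": 1290}',
    headers={"Content-Type": "application/json"},
)

# Read with a tunable read quorum: a single replica is enough here.
resp = requests.get(KEY_URL, params={"r": 1})
print(resp.status_code, resp.text)
```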
5. Membase
Languages used: Erlang and C
Features: memcached-compatible, but with persistence and clustering
License: Apache 2.0
Protocol: memcached plus extensions
Very fast (200k+ operations per second), data indexed by key
Persistent storage to hard disk
All nodes are identical (master-master replication)
Also provides memcached-style in-memory caching buckets
Write de-duplication to reduce I/O
Provides a very good web interface for cluster management
Software can be upgraded without stopping the database service
Connection proxy supporting connection pooling and multiplexing
Best used for: applications requiring low-latency data access, high concurrency, and high availability.
For example: low-latency data access such as ad targeting, and highly concurrent web applications such as online games (Zynga, for example).
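Because Membase speaks the memcached protocol, any memcached client can talk to it; the persistence happens behind the scenes. Here is a minimal Python sketch with the python-memcached library, assuming a Membase/moxi endpoint on the default port 11211 (the key names are invented):

```python
import memcache   # python-memcached; works because Membase is protocol-compatible

mc = memcache.Client(["127.0.0.1:11211"])   # assumed Membase/moxi endpoint

# Reads and writes look exactly like memcached, but data is also persisted to disk.
mc.set("player:42:score", 1500)
mc.incr("player:42:score", 10)
print(mc.get("player:42:score"))            # 1510

# Optional expiry, just like an ordinary cache entry.
mc.set("session:42", "token", time=300)
```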
6. Neo4j
Language used: Java
Features: a graph database built around relationships
License: GPL; some features require an AGPL/commercial license
Protocol: HTTP/REST (or embedded in Java)
Can be used standalone or embedded in a Java application
Both the nodes and the edges of the graph can carry metadata
Very good self-contained web administration interface
Multiple algorithms are supported for path finding
Indexes nodes and relationships by key/value
Optimized for read operations
Supports transactions (via the Java API)
Supports the Gremlin graph traversal language
Supports Groovy scripting
Online backup, advanced monitoring, and high-availability support require the AGPL/commercial license
Best used for: graph-style, richly connected data. This is Neo4j's most significant difference from the other NoSQL databases.
For example: social relationships, public transport links, maps, and network topologies.
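A small Python sketch against the HTTP/REST interface shows nodes and relationships carrying metadata. It assumes a Neo4j server of that era exposing its REST root at http://localhost:7474/db/data; the endpoint layout may differ between versions, and the property names are invented.

```python
import requests

DATA = "http://localhost:7474/db/data"   # assumed REST root of a local Neo4j server

# Create two nodes; their properties are the metadata mentioned above.
alice = requests.post(f"{DATA}/node", json={"name": "Alice"}).json()
bob = requests.post(f"{DATA}/node", json={"name": "Bob"}).json()

# Connect them with a typed relationship; the edge can carry metadata too.
requests.post(
    alice["self"] + "/relationships",
    json={"to": bob["self"], "type": "KNOWS", "data": {"since": 2011}},
)
print(alice["self"], "-[:KNOWS]->", bob["self"])
```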
7. Cassandra
Language used: Java
Features: combines the best of BigTable and Dynamo
License: Apache
Protocol: custom, binary (Thrift)
Tunable trade-offs for distribution and replication (N, R, W)
Supports querying by column and by ranges of keys
BigTable-like features: columns and column families
Write operations are faster than read operations
Map/reduce is possible with Apache Hadoop
I admit I am somewhat biased against Cassandra, partly because of its own bloat and complexity, and partly because of Java issues (configuration, exceptions, etc.).
Best used for: workloads with more writes than reads (logging), and for systems where every component must be written in Java ("no one gets fired for choosing Apache software").
For example: banking and finance (not necessarily for the financial transactions themselves, but these industries' needs go well beyond that). Because writes are faster than reads, real-time data analysis is a natural niche.
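As a rough sketch of the column/column-family model and the write-heavy niche, here is a Python example using the Thrift-based pycassa client. It assumes a Cassandra node on localhost:9160 and a pre-created keyspace "Logs" with a column family "events"; all of these names are illustrative only.

```python
import time
import pycassa   # Thrift-based Python client for Cassandra

# Assumed: keyspace "Logs" and column family "events" already exist.
pool = pycassa.ConnectionPool("Logs", ["localhost:9160"])
events = pycassa.ColumnFamily(pool, "events")

# Writes are cheap: each log line becomes a column under a row key.
row_key = "web-01"
events.insert(row_key, {str(int(time.time())): "GET /index.html 200"})

# Reads fetch a slice of columns from one row (slower than writes, as noted above).
print(events.get(row_key, column_count=10))
```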
8. HBase
(With the help of ghshephard)
Language used: Java
Features: supports billions of rows by millions of columns
License: Apache
Protocol: HTTP/REST (also supports Thrift, see note 4)
Modeled after BigTable
Map/reduce with the Hadoop distributed platform
Optimized for real-time queries
High-performance Thrift gateway
Query predicate push-down via server-side scan and get filters
Supports XML, Protobuf, and binary over HTTP
Cascading, Hive, and Pig source and sink modules
JRuby-based (JIRB) shell
Rolling restarts for configuration changes and minor upgrades
There is no single point of failure
Random access performance comparable to MySQL
Best used for: when you like the BigTable model and need random, real-time access to big data.
For example: Facebook's messaging database (more general use cases are on the way).
Note: Thrift is an interface definition language, developed and open-sourced by Facebook, that is used to define and create services for numerous other languages.
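A brief Python sketch using the happybase client against HBase's Thrift gateway illustrates the row-key/column-family model and server-side scans. It assumes a Thrift server on localhost:9090 and an existing table "messages" with a column family "msg"; all names are illustrative only.

```python
import happybase   # Python client for HBase's Thrift gateway

connection = happybase.Connection("localhost", port=9090)   # assumed Thrift server
table = connection.table("messages")                        # hypothetical table

# Row keys give random, real-time access; columns live inside column families.
table.put(b"user42-0001", {b"msg:from": b"alice", b"msg:body": b"hi"})
print(table.row(b"user42-0001"))

# Scans with start/stop keys push the filtering work to the server side.
for key, data in table.scan(row_start=b"user42-", row_stop=b"user43-"):
    print(key, data)
```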
Of course, all of these systems have far more features than are listed above; I have only included the ones I consider important. Technology also moves quickly, so the content above will need constant updating; I will keep this list up to date as best I can.