KristfKovcs, a software architect and consultant who once worked in a number of major companies, blogs on mainstream NoSQL databases (Cassandra, Mongodb, CouchDB, Redis, Riak, Membase, Neo4j, and HBase) A comprehensive comparison was conducted. Although SQL database is very useful NoSQL
Krist ófskács, a software architect and consultant who once worked in a number of major companies, blogs on mainstream NoSQL databases (Cassandra, Mongodb, CouchDB, Redis, Riak, Membase, Neo4j, and HBase)) A comprehensive comparison was conducted.
Although SQL database is a very useful tool, after 15 years of outstanding performance, the monopoly will be broken. This is only a matter of time: I was forced to use relational databases, but I finally found that I could not meet my needs.
However, the difference between NoSQL databases is far greater than that between SQL databases. This means that software architects should select a suitable NoSQL database at the beginning of the project. In this case, we compared Cassandra, Mongodb, CouchDB, Redis, Riak, Membase, Neo4j, and HBase:
Note: NoSQL is a revolutionary new database movement. NoSQL advocates the use of non-relational data storage. Today's computer architecture requires massive horizontal scalability in terms of data storage, while NoSQL is committed to changing this situation. Currently, Google BigTable and Amazon Dynamo use NoSQL databases.
1. CouchDB
- Language used: Erlang
- Features: DB consistency, easy to use
- License: Apache
- Protocol: HTTP/REST
- Bidirectional data replication,
- Continuous or temporary handling,
- Conflict check during processing,
- Therefore, master-master replication is used (see Note 2)
- MVCC-write operations do not block read operations
- Versions earlier than files can be saved
- Crash-only (reliable) design
- Data compression from time to time
- View: embedded ing/reduction
- Format View: list display
- Supports server-side document verification
- Authentication supported
- Real-time update based on changes
- Support attachment processing
- Therefore, CouchApps (independent js applications)
- JQuery library required
Best Application scenarios:This method is applicable to applications that execute predefined queries and perform data statistics when there are few data changes. Applicable to applications that require data version support.
For example:CRM and CMS systems. Master-master replication is very useful for multi-site deployment.
Note: master-master replication is a database synchronization method that allows data to be shared among a group of computers and any member in the group to update the data in the group.
2. Redis
- Language used: C/C ++
- Features: fast running exceptions
- License: BSD
- Protocol: Telnet-like
- Memory databases supported by hard disk storage,
- However, data can be exchanged to the hard disk after version 2.0 (note: versions later than version 2.4 do not support this feature !)
- Master-slave replication (see Appendix 3)
- Although simple data or hash tables indexed by key value are used, complex operations such as ZREVRANGEBYSCORE are also supported.
- INCR & co (suitable for calculating limit values or statistical data)
- Sets are supported (union, diff, and inter are also supported)
- Support List (queue and blocking pop operations are also supported)
- Support for hash tables (objects with multiple domains)
- Supports sorting sets (high-score tables, applicable to range queries)
- Redis supports transactions
- Supports setting data to expired data (similar to the fast buffer design)
- Pub/Sub allows users to implement message mechanisms
Best Application scenarios:Suitable for applications with fast data changes and database size (suitable for memory capacity.
For example:Stock price, data analysis, real-time data collection, and real-time communication.
Note: Master-slave replication: if only one server processes all replication requests at the same time, this is called Master-slave replication. Generally, the application is in a server cluster that needs to provide high availability.
3. MongoDB
- Language used: C ++
- Features: it retains some user-friendly features (queries and indexes) of SQL ).
- License: AGPL (initiator: Apache)
- Protocol: Custom, binary (BSON)
- Master/slave replication (supports automatic error recovery and sets replication)
- Built-in sharding mechanism
- Support for javascript expression query
- Attackers can execute arbitrary javascript functions on the server.
- Update-in-place support is better than CouchDB
- Use memory-to-file ING for data storage
- The focus on performance exceeds the functional requirements
- We recommend that you enable the log function (parameter-journal)
- On a 32-bit operating system, the database size is limited to approximately 2.5 Gb.
- Empty databases account for about 192 Mb
- Use GridFS to store big data or metadata (not a real file system)
Best Application scenarios:It is applicable to applications that require dynamic query support, which require indexes instead of map/reduce functions, performance requirements for large databases, and applications that require CouchDB but are full of memory due to frequent data changes.
For example:You are planning to use MySQL or PostgreSQL, but they are discouraged by their pre-defined columns.
4. Riak
- Languages used: Erlang and C, and some Javascript
- Features: Fault tolerance
- License: Apache
- Protocol: HTTP/REST or custom binary
- Adjustable distribution and replication (N, R, W)
- Use JavaScript or Erlang for verification and security support before or after the operation.
- Map/reduce using JavaScript or Erlang
- Connection and connection traversal: used as a graph database
- Index: enter metadata for search (available soon in version 1.0)
- Big data object support (Luwak)
- Available in "open source" and "enterprise" versions
- Full text search, index, search by Riak server (beta version)
- Support for Masterless multi-site replication and commercial license SNMP monitoring
Best Application scenarios:Applicable to scenarios where you want to use databases like Cassandra (similar to Dynamo) but cannot handle bloat and complexity. It is applicable to scenarios where you plan to replicate multiple sites, but you need to have requirements on the scalability, availability, and error handling of a single site.
For example, sales data collection, factory control system, strict downtime requirements, can be used as easy to update web servers.
5. Membase
- Languages used: Erlang and C
- Features: Compatible with Memcache, but with both persistence and Cluster Support
- License: Apache 2.0
- Protocol: Distributed Cache and expansion
- Very fast (200 k +/second), data is indexed by key value
- Persistent storage to hard disk
- All nodes are unique (master-master replication)
- Cache units similar to distributed cache are also supported in the memory.
- When writing data, I/O is reduced by removing duplicated data.
- Provides a very good web interface for cluster management
- No need to stop the database service when updating the software
- Supports connection pool and multiplexing connection proxy
Best Application scenarios:Suitable for applications that require low-latency data access, high concurrency support, and high availability
For example, low-latency data access, such as advertisement-oriented applications, and highly concurrent web applications such as online games (such as Zynga)
6. Neo4j
- Language used: Java
- Features: relational graph databases
- License: GPL, some of which use AGPL/commercial license
- Protocol: HTTP/REST (or embedded in Java)
- Java applications can be used or embedded independently.
- Nodes and edges of a graph can contain metadata.
- Good built-in web management function
- Supports path search using multiple algorithms
- Index using key values and relationships
- Optimize read operations
- Support Transactions (using Java APIs)
- Use Gremlin graphic traversal language
- Supports Groovy scripts
- Supports online backup, advanced monitoring, and high reliability. supports the use of AGPL/commercial license.
Best Application scenarios:Applicable to graphics. This is the most significant difference between Neo4j and Other nosql databases.
For example, social relations, public transportation networks, maps, and network extensions
7. Cassandra
- Language used: Java
- Features: best support for large tables and Dynamo
- License: Apache
- Protocol: Custom, binary (conservation-oriented)
- Adjustable distribution and replication (N, R, W)
- Supports column query with key values in a certain range
- Similar to the function of a big table: Column, a feature column set
- Write operations are faster than read operations
- Map/reduce as much as possible based on the Apache distributed platform
- I admit that I am biased against Cassandra, in part because of its bloated and complex nature and Java problems (configuration, exceptions, and so on)
Best Application scenarios:When you use write operations more than read operations (logging), if every system build must be written in Java (no one is fired because Apache software is used)
For example, the banking industry and the financial industry (although not essential for financial transactions, but these industries have higher database requirements) write faster than reading, so a natural feature is real-time data analysis.
8. HBase
(Used with ghshephard)
- Language used: Java
- Features: supports billions of rows X millions of columns
- License: Apache
- Protocol: HTTP/REST (support for Thrift, see note 4)
- Modeling after BigTable
- Distributed architecture Map/reduce
- Optimize real-time queries
- High-performance Thrift Gateway
- Prediction of query operations by scanning and filtering on the server side
- Supports XML, Protobuf, and binary HTTP
- Cascading, hive, and pig source and sink modules
- Jruby-based shell
- Rollback is performed for configuration changes and minor upgrades.
- No SPOF
- Comparable to the random access performance of MySQL
Best Application scenarios:It is applicable to scenarios where BigTable is preferred and big data needs to be accessed randomly and in real time.
For example, Facebook message database (more common use cases are coming soon)
Note: Thrift is an interface definition language that provides definition and creation services for multiple other languages. it is developed and open-source by Facebook.
Of course, all systems not only have these features listed above. Here I only list some important features I think based on my own points of view. At the same time, technological advances are fast, so the above content must be constantly updated. I will do my best to update this list.
Original article: Krist óf ská cs translation: Bole online-programmer
Http://blog.jobbole.com/1344/.