NoSQL Databases Compared: Cassandra vs MongoDB vs CouchDB vs Redis vs Riak vs HBase


Reprinted: http://hi.baidu.com/yandavid/blog/item/04f0d1952850ab52d1135e94.html

Several important theories in the NoSQL world

1. CAP theorem

The CAP theorem is undoubtedly the most important reason for the shift from relational database systems to NoSQL systems.

The CAP (consistency, availability, partition tolerance) theorem states that a distributed system cannot guarantee consistency, availability, and partition tolerance all at once; it can satisfy at most two of them. So there is no point spending time trying to satisfy all three.

The gist of the proof is that, once partition tolerance is required, consistency and availability cannot both be achieved: strong consistency has to sacrifice availability, and high availability has to sacrifice consistency. (Why insist on partition tolerance? As network applications grow, partitioning data across machines is a basic requirement.)

For the proof, see: Brewer's CAP Theorem

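2. Consistent hashing

There is not much to say about this one; anyone who has used MC (memcached) should already be familiar with it. In short, both keys and nodes are hashed onto a ring, and each key is stored on the first node clockwise from it, so adding or removing a node only remaps the keys next to that node instead of reshuffling everything.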
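
A minimal Python sketch of a consistent-hash ring, assuming MD5 as the hash function and 100 virtual nodes per server (both arbitrary choices for illustration; the node names `cache-a` and so on are hypothetical). It is not the code of any particular memcached client; it only shows the lookup rule: hash the key, then walk clockwise to the first node position.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes=(), replicas=100):
        self.replicas = replicas  # virtual nodes per physical node (assumed value)
        self._positions = []      # sorted hash positions on the ring
        self._owner = {}          # hash position -> physical node name
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        # MD5 only spreads values evenly around the ring; it is not used for security.
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def add_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            self._owner[pos] = node
            bisect.insort(self._positions, pos)

    def remove_node(self, node):
        for i in range(self.replicas):
            pos = self._hash(f"{node}#{i}")
            del self._owner[pos]
            self._positions.pop(bisect.bisect_left(self._positions, pos))

    def get_node(self, key):
        if not self._positions:
            raise KeyError("ring is empty")
        # First node position clockwise from the key's hash, wrapping past the end.
        idx = bisect.bisect_right(self._positions, self._hash(key)) % len(self._positions)
        return self._owner[self._positions[idx]]


ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
keys = [f"user:{i}" for i in range(1000)]
before = {k: ring.get_node(k) for k in keys}

ring.add_node("cache-d")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved)} of {len(keys)} keys moved")
```

Adding a fourth node only moves the keys whose clockwise successor is now one of the new node's positions, so every moved key lands on `cache-d` and only a fraction of the keys move, whereas a plain `hash(key) % N` scheme would remap most of them.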

3. MapReduce

MapReduce is divided into two phases: map and reduce. In short, map splits a large computation into pieces that are processed in parallel, and reduce combines the results of those parallel computations into a final output.
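
The two phases can be sketched in Python with the classic word-count example. This is a single-machine illustration with obvious simplifications: a thread pool stands in for the cluster's map workers (it shows the structure, not real CPU parallelism), and the grouping step between map and reduce, which the framework performs in Google's design, is written out by hand as `shuffle`. All function names are hypothetical.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def map_phase(chunk):
    """Map: emit a (word, 1) pair for every word in one chunk of input."""
    return [(word.lower(), 1) for word in chunk.split()]


def shuffle(mapped_outputs):
    """Group intermediate values by key (the framework's job between map and reduce)."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all the counts emitted for one word."""
    return key, sum(values)


def word_count(chunks, workers=4):
    # Each chunk is mapped independently, so the map calls can run in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, chunks))
    groups = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in groups.items())


if __name__ == "__main__":
    documents = ["the cat sat", "the dog sat", "The end"]
    print(word_count(documents))
```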

For more details, see Wikipedia: MapReduce

Google's MapReduce paper (PDF): MapReduce: Simplified Data Processing on Large Clusters

4. Gossip

Gossip is a protocol used in P2P systems (not the popular TV show Gossip Girl). In an N-node cluster, each node periodically exchanges state with other nodes, typically a few randomly chosen peers at a time, so that an update eventually reaches all N-1 other nodes and the data is synchronized. Gossip does not require the cluster to have a master: a change on one node spreads to all the other nodes the way a virus spreads, and the cost of adding or removing a node is almost zero.
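
To see why an update reaches the whole cluster quickly without a master, here is a toy simulation of push-style gossip in Python. It assumes that in every round each node that already has the update sends it to `fanout` randomly chosen peers (`fanout` is a hypothetical parameter of this sketch). Real systems also use pull and push-pull exchanges and carry version information; this sketch only counts rounds.

```python
import random


def gossip_rounds(n_nodes, fanout=1, seed=None):
    """Simulate push gossip; return the number of rounds until all nodes have the update."""
    if n_nodes < 1 or fanout < 1:
        raise ValueError("need at least one node and a fanout of at least one")
    rng = random.Random(seed)
    informed = {0}  # node 0 receives the update first; there is no master
    rounds = 0
    k = min(fanout, n_nodes - 1)
    while len(informed) < n_nodes:
        rounds += 1
        newly_informed = set()
        for node in informed:
            # Pick k distinct peers other than `node`: sample from 0..n-2, then skip over ourselves.
            for p in rng.sample(range(n_nodes - 1), k):
                newly_informed.add(p + 1 if p >= node else p)
        informed |= newly_informed
    return rounds


if __name__ == "__main__":
    for n in (10, 100, 1000, 10000):
        print(n, "nodes:", gossip_rounds(n, seed=1), "rounds")
```

With `fanout=1` the number of informed nodes can at most double per round, so spreading needs at least about log2(N) rounds; in practice push gossip finishes in a number of rounds that grows only logarithmically with N.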

For more details, see Wikipedia: Gossip protocol

 

SQL databases have dominated for 15 years, but that era is coming to an end; it is only a matter of time. Today, NoSQL is booming. However, every product has its own characteristics, strengths, and weaknesses, and no single one suits every scenario. This article compares Cassandra, MongoDB, CouchDB, Redis, Riak, and HBase from several angles.

CouchDB is written in Erlang, is Apache-licensed, and uses the HTTP/REST protocol. Its main strengths are DB consistency and ease of use, and its master-master replication makes multi-site deployment easy. CouchDB is mainly suited to applications that accumulate data which changes only occasionally, such as CRM and CMS systems.

Redis is written in C, is BSD-licensed, and uses a telnet-like protocol. Its main advantage is that it is extremely fast. Redis is mainly suited to applications whose data changes rapidly but whose total size is predictable, because the dataset should fit mostly in memory, so memory usage is high. Typical uses include stock prices, real-time analytics, real-time data collection, and real-time communication.

MongoDB is written in C++, is licensed under the AGPL (drivers: Apache), and uses a custom binary protocol (BSON). MongoDB is a good fit if you need dynamic queries and prefer to define indexes rather than map/reduce functions. It is also a good option if you wanted CouchDB but your data changes too much and fills up the disk. MongoDB can be used for most things you would do with MySQL or PostgreSQL, when predefined columns hold you back.

Cassandra is written in Java, is Apache-licensed, and uses a custom binary protocol (Thrift). Cassandra suits workloads that write more than they read, such as logging. Typical users include banking and the financial industry; because writes are faster than reads, real-time data analysis is a natural niche.

Riak is written in Erlang and C, with some JavaScript. It is Apache-licensed and uses the HTTP/REST protocol. Riak is highly fault tolerant and very similar to Cassandra (both are Dynamo-like), which makes it a good choice when you need high scalability and fault tolerance. However, multi-site replication is a paid feature. Riak suits point-of-sale data collection, factory control systems, and other scenarios where downtime is not acceptable.

HBase is written in Java, is Apache-licensed, and uses HTTP/REST (also Thrift). HBase is designed for tables with billions of rows and millions of columns. If you love BigTable and need random, real-time read/write access to massive data, HBase is a good choice. It is currently used for Facebook's messaging database.

 

CouchDB
Written in: Erlang
Main point: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
- Bi-directional (!) replication, continuous or ad-hoc, with conflict detection, thus master-master replication (!)
- MVCC: write operations do not block reads
- Previous versions of documents are available
- Crash-only (reliable) design
- Needs compacting from time to time
- Views: embedded map/reduce
- Formatting views: lists & shows
- Server-side document validation possible
- Authentication possible
- Real-time updates via _changes (!)
- Attachment handling
- Thus, CouchApps (standalone JS apps)
- jQuery library included

Best used: For accumulating, occasionally changing data, on which predefined queries are to be run. Places where versioning is important.

For example: CRM, CMS systems. Master-master replication is an especially interesting feature, allowing easy multi-site deployments.
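
The MVCC and conflict-detection points above can be illustrated with a toy in-memory store in Python. It is a hypothetical sketch, not CouchDB's implementation or HTTP API: every write must name the revision it is based on, a write based on a stale revision is rejected (CouchDB answers such requests with HTTP 409 Conflict), and earlier revisions stay readable because nothing is overwritten in place. The `N-digest` revision strings only imitate the look of CouchDB's `_rev` values; here the digest is simply an MD5 of the body.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class MVCCDocStore:
    """Toy append-only document store with CouchDB-style revision checks (hypothetical)."""

    def __init__(self):
        self._history = {}  # doc_id -> list of (rev, body); old lists are never mutated

    def put(self, doc_id, body, rev=None):
        """Store a new revision; `rev` must be the current revision (None for a new doc)."""
        versions = self._history.get(doc_id, [])
        current = versions[-1][0] if versions else None
        if rev != current:
            raise ConflictError(f"{doc_id}: based on {rev!r}, current is {current!r}")
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode("utf-8")).hexdigest()
        new_rev = f"{len(versions) + 1}-{digest}"
        # Build a new list so a reader holding the old one never sees it change.
        self._history[doc_id] = versions + [(new_rev, dict(body))]
        return new_rev

    def get(self, doc_id, rev=None):
        """Return (rev, body) for the latest revision, or for an older one if `rev` is given."""
        for stored_rev, body in reversed(self._history[doc_id]):
            if rev is None or stored_rev == rev:
                return stored_rev, dict(body)
        raise KeyError(f"{doc_id} has no revision {rev!r}")


store = MVCCDocStore()
rev1 = store.put("page:home", {"title": "Draft"})
rev2 = store.put("page:home", {"title": "Final"}, rev=rev1)
try:
    store.put("page:home", {"title": "Stale edit"}, rev=rev1)  # based on an old revision
except ConflictError as err:
    print("rejected:", err)
print(store.get("page:home"))        # latest revision
print(store.get("page:home", rev1))  # the previous version is still readable
```

Because old revisions pile up in a store built this way, it needs periodic cleanup, which is the role CouchDB's compaction plays.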

Redis
Written in: C
Main point: Blazing fast
License: BSD
Protocol: Telnet-like
- Disk-backed in-memory database, but since 2.0 it can swap to disk
- Master-slave replication
- Simple keys and values, but complex operations like ZREVRANGEBYSCORE
- INCR & co. (good for rate limiting or statistics)
- Has sets (also union/diff/inter)
- Has lists (also a queue; blocking pop)
- Has hashes (objects of multiple fields)
- Of all these databases, only Redis does transactions (!)
- Values can be set to expire (as in a cache)
- Sorted sets (high score table, good for range queries)
- Pub/sub and WATCH on data changes (!)

Best used: For rapidly changing data with a foreseeable database size (should fit mostly in memory).

For example: Stock prices. Analytics. Real-time data collection. Real-time communication.
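
The INCR plus expiring-key combination listed above is the basis of a common fixed-window rate limiter. To keep the example self-contained, this Python sketch emulates the two Redis commands with a dictionary instead of talking to a server; with a real client such as redis-py you would call `incr` and `expire` on the key instead. The `rate:<client>:<window>` key scheme and the injectable clock are assumptions made for illustration.

```python
import time


class FixedWindowRateLimiter:
    """Allow at most `limit` calls per client per window, mimicking Redis INCR + EXPIRE."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._keys = {}  # key -> (counter, expires_at), standing in for Redis keys with a TTL

    def _evict_expired(self, now):
        # Redis deletes expired keys by itself; the dict version has to do it by hand.
        for key in [k for k, (_, exp) in self._keys.items() if now >= exp]:
            del self._keys[key]

    def allow(self, client_id):
        now = self.clock()
        self._evict_expired(now)
        window_id = int(now // self.window)
        key = f"rate:{client_id}:{window_id}"
        # The TTL is set once, when the key is first created (INCR, then EXPIRE).
        count, expires_at = self._keys.get(key, (0, (window_id + 1) * self.window))
        count += 1  # INCR key
        self._keys[key] = (count, expires_at)
        return count <= self.limit


if __name__ == "__main__":
    limiter = FixedWindowRateLimiter(limit=3, window_seconds=1)
    print([limiter.allow("alice") for _ in range(5)])
```

A fixed window is the simplest scheme; a client can burst up to twice the limit around a window boundary, which is why sorted sets are sometimes used for sliding-window variants.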

MongoDB
Written in: C++
Main point: Retains some friendly properties of SQL (query, index)
License: AGPL (drivers: Apache)
Protocol: Custom, binary (BSON)
- Master/slave replication
- Queries are JavaScript expressions
- Run arbitrary JavaScript functions server-side
- Better update-in-place than CouchDB
- Sharding built-in
- Uses memory-mapped files for data storage
- Performance over features
- After a crash, it needs to repair tables
- Better durability coming in v1.8

Best used: If you need dynamic queries. If you prefer to define indexes, not map/reduce functions. If you need good performance on a big DB. If you wanted CouchDB, but your data changes too much, filling up disks.

For example: For all things that you would do with MySQL or PostgreSQL, but having predefined columns really holds you back.

Cassandra
Written in: Java
Main point: Best of BigTable and Dynamo
License: Apache
Protocol: Custom, binary (Thrift)
- Tunable trade-offs for distribution and replication (N, R, W)
- Querying by column, range of keys
- BigTable-like features: columns, column families
- Writes are much faster than reads (!)
- Map/reduce possible with Apache Hadoop
- I admit being a bit biased against it, because of the bloat and complexity it has, partly because of Java (configuration, seeing exceptions, etc.)

Best used: When you write more than you read (logging). If every component of the system must be in Java. ("No one gets fired for choosing Apache's stuff.")

For example: Banking, financial industry (though not necessarily for financial transactions, but these industries are much bigger than that). Writes are faster than reads, so one natural niche is real-time data analysis.
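
The (N, R, W) knobs listed for Cassandra (and for Riak below) are the replication factor, the number of replicas a read waits for, and the number of replicas a write waits for. The usual rule of thumb is that if R + W > N, every read set overlaps the most recent write set, so at least one replica consulted by a read holds the latest acknowledged write. This Python sketch checks that rule by brute force over every possible choice of replicas; it illustrates the arithmetic and is not part of any Cassandra or Riak API.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Brute force: does every R-replica read set share a node with every W-replica write set?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def quorum_rule(n, r, w):
    """The Dynamo-style condition for reads to see the latest acknowledged write."""
    return r + w > n


if __name__ == "__main__":
    n = 3  # a common replication factor, chosen here for illustration
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            print(f"N={n} R={r} W={w}: overlap={quorums_always_overlap(n, r, w)}")
```

Overlap alone does not pick the winner: the replicas that answer a read can still disagree, and the system has to resolve them, for example by timestamp, to return the newest value.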

Riak
Written in: Erlang & C, some JavaScript
Main point: Fault tolerance
License: Apache
Protocol: HTTP/REST
- Tunable trade-offs for distribution and replication (N, R, W)
- Pre- and post-commit hooks, for validation and security
- Built-in full-text search
- Map/reduce in JavaScript or Erlang
- Comes in "open source" and "enterprise" editions

Best used: If you want something Cassandra-like (Dynamo-like), but no way you're gonna deal with the bloat and complexity. If you need very good single-site scalability, availability and fault tolerance, but you're ready to pay for multi-site replication.

For example: Point-of-sales data collection. Factory control systems. Places where even seconds of downtime hurt.

HBase

(With the help of ghshephard)

Written in: Java
Main point: Billions of rows x millions of columns
License: Apache
Protocol: HTTP/REST (also Thrift)
- Modeled after BigTable
- Map/reduce with Hadoop
- Query predicate push down via server-side scan and get filters
- Optimizations for real-time queries
- A high-performance Thrift gateway
- HTTP supports XML, Protobuf, and binary
- Cascading, Hive, and Pig source and sink modules
- JRuby-based (JIRB) shell
- No single point of failure
- Rolling restart for configuration changes and minor upgrades
- Random access performance is like MySQL

Best used: If you're in love with BigTable, and when you need random, real-time read/write access to your big data.

For example: Facebook messaging database (more general examples coming soon)
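
"Query predicate push down via server-side scan and get filters" means the filter runs next to the data, so only matching rows travel back to the client. This Python toy models a single region holding rows sorted by row key, the way HBase keeps them, with an inclusive start row and an exclusive stop row. It is a hypothetical sketch: the real HBase client expresses the same idea with `Scan` objects and `Filter` classes rather than Python callables, and the row keys and column names here are invented for the example.

```python
import bisect


class ToyRegion:
    """Hypothetical stand-in for one HBase region: rows kept sorted by row key."""

    def __init__(self, rows):
        self._rows = dict(rows)
        self._keys = sorted(self._rows)

    def scan(self, start_row, stop_row, row_filter=None):
        """Return (key, row) pairs with start_row <= key < stop_row that pass row_filter.

        The key range is found by binary search on the sorted keys, and the filter
        is applied here, on the "server", before anything is returned.
        """
        lo = bisect.bisect_left(self._keys, start_row)
        hi = bisect.bisect_left(self._keys, stop_row)
        return [
            (key, self._rows[key])
            for key in self._keys[lo:hi]
            if row_filter is None or row_filter(key, self._rows[key])
        ]


region = ToyRegion({
    "order#0001": {"total": "42"},
    "user#0001": {"country": "NZ"},
    "user#0002": {"country": "US"},
    "user#0003": {"country": "NZ"},
})
# Scan only the "user#" prefix ('$' sorts right after '#') and keep NZ users.
nz_users = region.scan("user#", "user$", lambda key, row: row.get("country") == "NZ")
print(nz_users)
```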
