Comparison of eight mainstream NoSQL database systems (repost)


Source: http://database.51cto.com/art/201109/293029.htm

Although SQL databases are very useful tools, their monopoly, after 15 years as the only show in town, is about to be broken. It was only a matter of time: people were forced to use relational databases, only to find that these could not meet many of their needs.

However, NoSQL databases differ from one another far more than any two SQL databases do. This means software architects should choose a suitable NoSQL database at the very beginning of a project. With that in mind, this article compares Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase:

(Note 1: NoSQL is a new, revolutionary movement in databases whose advocates promote non-relational data storage. Today's computing architectures demand enormous scalability in data storage, and NoSQL sets out to meet that demand. Google's BigTable and Amazon's Dynamo are both NoSQL systems. See also the NoSQL entry.)

1. CouchDB

Language used: Erlang
Features: database consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bidirectional replication,
Continuous or ad hoc,
With conflict detection,
And therefore master-master replication (see Note 2)
MVCC: write operations do not block reads
Previous versions of documents are available
Crash-only (reliable) design
Needs compaction from time to time
Views: embedded map/reduce
Formatting views: lists and shows
Server-side document validation is possible
Authentication is supported
Real-time updates via _changes
Attachment handling
And therefore CouchApps (standalone JavaScript applications)
Ships with the jQuery library
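To make the MVCC and conflict-detection items concrete, here is a minimal in-memory sketch of revision-checked updates: every write must name the revision it is based on, and a write based on an outdated revision is rejected instead of silently overwriting newer data (CouchDB's HTTP API answers such a write with 409 Conflict). The `RevisionedStore` class and its methods are hypothetical; only the `<generation>-<digest>` shape of the revision strings imitates CouchDB, and this is not CouchDB's actual code or API.

```python
import hashlib
import json


class ConflictError(Exception):
    """Raised when an update names a stale revision (CouchDB replies 409 Conflict)."""


class RevisionedStore:
    """Hypothetical in-memory imitation of CouchDB-style revision (MVCC) checks."""

    def __init__(self):
        # doc_id -> list of (rev, body); the last entry is the current revision.
        self._docs = {}

    @staticmethod
    def _make_rev(generation, body):
        # Mimics the "<generation>-<digest>" shape of CouchDB _rev values.
        digest = hashlib.md5(json.dumps(body, sort_keys=True).encode()).hexdigest()
        return f"{generation}-{digest}"

    def get(self, doc_id):
        # Readers simply take the latest committed revision; no lock is involved.
        rev, body = self._docs[doc_id][-1]
        return {"_id": doc_id, "_rev": rev, **body}

    def put(self, doc_id, body, rev=None):
        history = self._docs.get(doc_id)
        if history is None:
            if rev is not None:
                raise ConflictError("document does not exist")
            generation = 1
            history = self._docs[doc_id] = []
        else:
            current_rev = history[-1][0]
            if rev != current_rev:
                raise ConflictError(f"expected {current_rev}, got {rev}")
            generation = int(current_rev.split("-", 1)[0]) + 1
        new_rev = self._make_rev(generation, body)
        # Older revisions stay in the history here; real CouchDB drops old
        # revision bodies when the database is compacted.
        history.append((new_rev, dict(body)))
        return new_rev

    def revisions(self, doc_id):
        return [rev for rev, _ in self._docs[doc_id]]


store = RevisionedStore()
rev1 = store.put("contact:1", {"name": "Ada"})
rev2 = store.put("contact:1", {"name": "Ada Lovelace"}, rev=rev1)
try:
    store.put("contact:1", {"name": "stale edit"}, rev=rev1)  # based on an old revision
except ConflictError as err:
    print("rejected:", err)
print(store.get("contact:1")["name"])  # Ada Lovelace
```

A client that receives the conflict is expected to re-read the document, apply its change to the current revision, and retry.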

Best suited for: applications with accumulating, infrequently changing data on which predefined queries and statistics are run, and applications that need data versioning.

For example: CRM and CMS systems. Master-master replication is especially useful for multi-site deployments.

(Note 2: master-master replication is a database synchronization method in which data is shared among a group of computers and can be updated by any member of the group.)

2. Redis

Language used: C
Features: blazing fast
License: BSD
Protocol: Telnet-like
In-memory database backed by disk storage,
But since version 2.0 it can swap data to disk (note that this feature is deprecated as of version 2.4)
Master-slave replication (see Note 3)
Values are simple or hash tables indexed by key, yet complex operations such as ZREVRANGEBYSCORE are supported
INCR and related commands (useful for rate limiting or statistics)
Sets (with union, difference, and intersection)
Lists (also usable as a queue, with blocking pop operations)
Hashes (objects with multiple fields)
Sorted sets (high-score tables, useful for range queries)
Transactions
Values can be set to expire (as in a cache)
Pub/Sub lets you implement messaging
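The sorted-set and INCR items above correspond to two common patterns: a high-score table read with ZREVRANGEBYSCORE, and a fixed-window rate limiter that INCRements a per-client counter and EXPIREs it. The sketch below imitates both in plain Python so it runs without a server; `SortedSet` and `FixedWindowLimiter` are hypothetical stand-ins, not a Redis client API, and a real deployment would issue the actual commands (ZADD, ZREVRANGEBYSCORE, INCR, EXPIRE) instead.

```python
import time


class SortedSet:
    """Toy stand-in for a Redis sorted set (member -> score)."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score  # re-adding a member updates its score

    def zrevrangebyscore(self, max_score, min_score):
        # Members with min_score <= score <= max_score, highest score first;
        # equal scores come back in reverse lexicographic order.
        hits = sorted(
            ((s, m) for m, s in self._scores.items() if min_score <= s <= max_score),
            reverse=True,
        )
        return [(m, s) for s, m in hits]


class FixedWindowLimiter:
    """Imitates the INCR + EXPIRE rate-limiting pattern, one counter per window."""

    def __init__(self, limit, window_seconds, clock=time.monotonic):
        self.limit = limit
        self.window = window_seconds
        self.clock = clock
        self._counts = {}  # key -> counter value (what INCR updates)
        self._expiry = {}  # key -> time the key disappears (what EXPIRE sets)

    def allow(self, client_id):
        now = self.clock()
        for key in [k for k, t in self._expiry.items() if t <= now]:
            del self._counts[key]  # expired keys vanish, as in Redis
            del self._expiry[key]
        key = f"rate:{client_id}:{int(now // self.window)}"
        self._counts[key] = self._counts.get(key, 0) + 1  # INCR
        self._expiry.setdefault(key, now + self.window)   # EXPIRE on first hit
        return self._counts[key] <= self.limit


board = SortedSet()
for name, score in [("alice", 120), ("bob", 95), ("carol", 120), ("dave", 40)]:
    board.zadd(name, score)
print(board.zrevrangebyscore(200, 90))
# [('carol', 120), ('alice', 120), ('bob', 95)]

fake_now = [0.0]  # injectable clock so the example is deterministic
limiter = FixedWindowLimiter(limit=2, window_seconds=10, clock=lambda: fake_now[0])
print([limiter.allow("10.0.0.1") for _ in range(3)])  # [True, True, False]
```

A fixed window is the simplest limiter; it can let a client burst across a window boundary, which is usually acceptable for statistics-style limits.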

Best suited for: applications whose data changes rapidly and whose database size is predictable (it should fit in memory).

For example: stock prices, analytics, real-time data collection, real-time communication.

(Note 3: master-slave replication: if only one server (the master) handles all update requests at any given time and the other servers replicate from it, this is called master-slave replication. It is typically used in server clusters that need to provide high availability.)

3. MongoDB

Language used: C++
Features: retains some friendly SQL properties (queries, indexes)
License: AGPL (drivers: Apache)
Protocol: custom, binary (BSON)
Master/slave replication (automatic failover with replica sets)
Built-in sharding
Queries are JavaScript expressions
Arbitrary JavaScript functions can run on the server side
Better update-in-place support than CouchDB
Uses memory-mapped files for data storage
Favors performance over features
Journaling is best turned on (--journal option)
On 32-bit operating systems, database size is limited to about 2.5 GB
An empty database takes up about 192 MB
GridFS stores big data plus metadata (it is not a real file system)
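MongoDB queries are themselves documents, for example `{"age": {"$gt": 30}}`, which is what makes ad hoc queries convenient. The toy matcher below evaluates a small subset of that syntax (plain equality plus `$gt`, `$gte`, `$lt`, `$lte`, and `$in`) against Python dictionaries to show the idea. It is a hypothetical sketch: `matches` and `find` are not MongoDB's query engine or a driver API, and real MongoDB has richer rules (arrays, missing fields, type ordering).

```python
import operator

# A small subset of MongoDB's comparison operators.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, choices: value in choices,
}


def matches(doc, query):
    """True if `doc` satisfies every field condition in `query` (implicit AND)."""
    for field, condition in query.items():
        if field not in doc:
            return False  # simplification: real MongoDB treats missing fields more subtly
        value = doc[field]
        if isinstance(condition, dict):
            for op, operand in condition.items():
                if op not in OPERATORS:
                    raise ValueError(f"unsupported operator: {op}")
                if not OPERATORS[op](value, operand):
                    return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Return the documents of `collection` (a list of dicts) that match `query`."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Ana", "age": 34, "city": "Lisbon"},
    {"name": "Bo", "age": 27, "city": "Oslo"},
    {"name": "Cy", "age": 41, "city": "Oslo"},
]
print([p["name"] for p in find(people, {"city": "Oslo", "age": {"$gt": 30}})])  # ['Cy']
```

Because the query is plain data, an application can assemble it at run time from user input, with no schema or predefined view required.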

Best suited for: applications that need dynamic queries; that prefer defining indexes to writing map/reduce functions; that need good performance on a large database; or that would use CouchDB, except that their data changes so often it fills up the disk.

For example: most things you would otherwise do with MySQL or PostgreSQL, when their predefined columns hold you back.

4. Riak

Languages used: Erlang and C, plus some JavaScript
Features: fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Tunable distribution and replication (N, R, W)
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
Map/reduce in JavaScript or Erlang
Links and link walking: can be used as a graph database
Secondary indexes: search on metadata (coming in version 1.0)
Large object support (Luwak)
Available in "open source" and "enterprise" editions
Full-text search, indexing, and querying via the Riak Search server (beta)
Masterless multi-site replication and SNMP monitoring require a commercial license
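The (N, R, W) settings follow the Dynamo model: each key is stored on N replicas, a write succeeds once W replicas acknowledge it, and a read waits for R replies. When R + W > N, every possible read set shares at least one replica with every possible write set, so a read reaches at least one replica that took the latest acknowledged write. The sketch below checks that claim by brute force for small N; `quorums_always_overlap` is an illustrative helper, not part of any Riak client.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """Brute force: does every set of R replicas share a node with every set of W?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# The brute-force answer agrees with the usual rule of thumb R + W > N.
for n in range(1, 6):
    for r in range(1, n + 1):
        for w in range(1, n + 1):
            assert quorums_always_overlap(n, r, w) == (r + w > n)

for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    verdict = "quorums overlap" if quorums_always_overlap(n, r, w) else "a read may miss the latest write"
    print(f"N={n} R={r} W={w}: {verdict}")
```

With N = 3, R = W = 2 is a balanced choice; R = 1 with W = 3 makes reads cheap at the cost of slower, less available writes. Overlap alone is not a full consistency guarantee in practice, since Dynamo-style systems may fall back to sloppy quorums during failures.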

Best suited for: when you want something like Cassandra (Dynamo-like) but cannot deal with its bloat and complexity; also when you plan to do multi-site replication but need scalability, availability, and fault tolerance within a single site.

For example: point-of-sale data collection, factory control systems, and places with strict downtime requirements. It can also serve as an easily updatable web server.

5. Membase

Languages used: Erlang and C
Features: memcache-compatible, but with persistence and clustering
License: Apache 2.0
Protocol: memcached plus extensions
Very fast (200k+ operations per second) data access by key
Persistence to disk
All nodes are identical (master-master replication)
Also provides memcached-style in-memory caching buckets
Write de-duplication to reduce IO
Very good web interface for cluster management
Software upgrades without taking the database offline
Connection proxy for connection pooling and multiplexing
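Write de-duplication means that when a key is updated several times while its change is still waiting to be persisted, only the newest value is written to disk. A minimal sketch, assuming a simple queue that is flushed in one batch; `DedupWriteQueue` and the dictionary standing in for the disk are hypothetical, not Membase internals.

```python
class DedupWriteQueue:
    """Hypothetical disk-write queue that keeps only the newest pending value per key."""

    def __init__(self):
        self._pending = {}    # key -> newest value not yet persisted
        self.disk = {}        # stand-in for the on-disk store
        self.disk_writes = 0  # number of values actually written to "disk"

    def enqueue(self, key, value):
        # A newer write replaces the older pending one instead of queueing twice.
        self._pending[key] = value

    def flush(self):
        for key, value in self._pending.items():
            self.disk[key] = value
            self.disk_writes += 1
        self._pending.clear()


wq = DedupWriteQueue()
for score in range(1, 101):
    wq.enqueue("player:42:score", score)  # 100 updates to one hot key
wq.enqueue("player:7:score", 5)
wq.flush()
print(wq.disk_writes, wq.disk["player:42:score"])  # 2 100
```

The saving grows with how "hot" a key is: a hundred in-memory updates to one counter cost a single disk write here.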

Best suited for: applications that require low-latency data access, high concurrency, and high availability.

For example: low-latency use cases such as ad targeting, or highly concurrent web applications such as online games (e.g., Zynga).

6. neo4j

Language used: Java
Features: graph database built around relationships (connected data)
License: GPL; some features under AGPL/commercial license
Protocol: HTTP/REST (or embedded in Java)
Can be used standalone or embedded in Java applications
Both nodes and edges (relationships) can carry metadata
Very good built-in web administration interface
Path finding with multiple algorithms
Indexing of keys and relationships
Optimized for reads
Transactions (in the Java API)
Graph traversal with the Gremlin language
Scriptable in Groovy
Online backup, advanced monitoring, and high availability require an AGPL/commercial license
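Path search is the kind of query a graph database is built for. The following sketch keeps a tiny social graph in memory, with metadata on both nodes and relationships, and finds a fewest-hops path with breadth-first search, optionally following only relationships whose properties pass a filter. It illustrates one simple path-finding algorithm; it is not Neo4j's API or implementation, and the node names and the `since` property are made up.

```python
from collections import deque

# Nodes and relationships both carry metadata (properties).
nodes = {
    "alice": {"city": "Berlin"},
    "bob": {"city": "Paris"},
    "carol": {"city": "Rome"},
    "dave": {"city": "Oslo"},
    "erin": {"city": "Vienna"},
}
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2005}),
    ("bob", "KNOWS", "carol", {"since": 2010}),
    ("carol", "KNOWS", "erin", {"since": 2012}),
    ("alice", "KNOWS", "dave", {"since": 2001}),
    ("dave", "KNOWS", "erin", {"since": 2008}),
]


def build_adjacency(rels, keep=lambda props: True):
    """Undirected adjacency list, using only relationships whose properties pass `keep`."""
    adjacency = {}
    for start, _rel_type, end, props in rels:
        if keep(props):
            adjacency.setdefault(start, []).append(end)
            adjacency.setdefault(end, []).append(start)
    return adjacency


def shortest_path(adjacency, source, target):
    """Breadth-first search: a fewest-hops path as a list of nodes, or None."""
    previous = {source: None}
    frontier = deque([source])
    while frontier:
        node = frontier.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in adjacency.get(node, []):
            if neighbour not in previous:
                previous[neighbour] = node
                frontier.append(neighbour)
    return None


everyone = build_adjacency(relationships)
path = shortest_path(everyone, "alice", "erin")
print(path, [nodes[n]["city"] for n in path])
# ['alice', 'dave', 'erin'] ['Berlin', 'Oslo', 'Vienna']

recent = build_adjacency(relationships, keep=lambda props: props["since"] >= 2005)
print(shortest_path(recent, "alice", "erin"))
# ['alice', 'bob', 'carol', 'erin']
```

Filtering on relationship properties during traversal, as with `since` here, is the sort of query that needs awkward recursive joins in a relational schema.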

Best suited for: graph-style data. This is what most distinguishes Neo4j from the other NoSQL databases.

For example: social relationships, public transport networks, road maps, and network topologies.

7. Cassandra

Language used: Java
Features: combines the best of BigTable and Dynamo
License: Apache
Protocol: custom, binary (Thrift)
Tunable distribution and replication (N, R, W)
Querying by column and by key range
BigTable-like features: columns and column families
Writes are faster than reads
Map/reduce possible with Apache Hadoop
I admit to being biased against Cassandra, partly because of its bloat and complexity, and partly because of Java (configuration, exceptions, etc.)
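In the BigTable-style data model that Cassandra borrows, a row key maps to many columns kept sorted by column name, so one query can fetch a contiguous slice of columns, for example all readings of one sensor within a time range. The sketch below imitates that layout with a sorted list of column names per row and a `bisect`-based slice; `ColumnFamily`, `get_slice`, and the example keys are hypothetical, and this is neither CQL nor a Cassandra driver.

```python
from bisect import bisect_left, bisect_right, insort


class ColumnFamily:
    """Toy column family: row key -> columns kept sorted by column name."""

    def __init__(self):
        self._rows = {}  # row_key -> {"names": sorted column names, "values": dict}

    def insert(self, row_key, column, value):
        row = self._rows.setdefault(row_key, {"names": [], "values": {}})
        if column not in row["values"]:
            insort(row["names"], column)  # keep column names in sorted order
        # A later insert simply overwrites; Cassandra resolves this by timestamp.
        row["values"][column] = value

    def get_slice(self, row_key, start, finish):
        """Return (column, value) pairs with start <= column <= finish, in order."""
        row = self._rows.get(row_key)
        if row is None:
            return []
        names = row["names"]
        lo = bisect_left(names, start)
        hi = bisect_right(names, finish)
        return [(name, row["values"][name]) for name in names[lo:hi]]


cf = ColumnFamily()
cf.insert("sensor-7", "2011-09-01T10:05", 21.5)
cf.insert("sensor-7", "2011-09-01T10:00", 21.0)
cf.insert("sensor-7", "2011-09-01T10:10", 22.1)
cf.insert("sensor-7", "2011-09-01T11:00", 23.4)
print(cf.get_slice("sensor-7", "2011-09-01T10:00", "2011-09-01T10:59"))
# [('2011-09-01T10:00', 21.0), ('2011-09-01T10:05', 21.5), ('2011-09-01T10:10', 22.1)]
```

Using sortable timestamps as column names is what turns "readings between 10:00 and 10:59" into a single range read instead of a scan over the row.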

Best suited for: workloads with more writes than reads (such as logging), or when every component of the system must be written in Java ("no one gets fired for choosing Apache software").

For example: banking and finance (not necessarily for financial transactions, but these industries need databases for much more than that). Because writes are faster than reads, real-time data analysis is a natural niche.

8. HBase

(Written with help from ghshephard)

Language used: Java
Features: supports billions of rows x millions of columns
License: Apache
Protocol: HTTP/REST (also supports Thrift; see Note 4)
Modeled after BigTable
Map/reduce on a distributed architecture (Hadoop)
Optimized for real-time queries
High-performance Thrift gateway
Query predicates pushed down to the server via server-side scans and filters
HTTP supports XML, Protobuf, and binary
Cascading, Hive, and Pig source and sink modules
JRuby-based shell (JIRB)
Rolling restarts for configuration changes and minor upgrades
No single point of failure
Random-access performance comparable to MySQL
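HBase keeps rows sorted by row key, so a scan reads one contiguous key range (start row inclusive, stop row exclusive), and the server-side scanning and filtering listed above means rows that fail a filter are dropped before they are sent to the client. The sketch below imitates such a scan over a sorted in-memory table; `scan`, the `<user>#<date>` row-key layout, and the filter lambda are hypothetical and not the HBase client API.

```python
from bisect import bisect_left


def scan(table, start_row, stop_row, row_filter=None):
    """Yield (row_key, columns) for start_row <= row_key < stop_row, in key order.

    `table` is a list of (row_key, columns) pairs sorted by row_key. When given,
    `row_filter` is applied before a row is returned, standing in for a
    server-side filter: rejected rows never leave the "server".
    """
    keys = [row_key for row_key, _ in table]
    i = bisect_left(keys, start_row)  # jump straight to the first candidate row
    while i < len(table) and table[i][0] < stop_row:
        row_key, columns = table[i]
        if row_filter is None or row_filter(columns):
            yield row_key, columns
        i += 1


# Hypothetical message table with "<user>#<date>" row keys.
table = sorted(
    [
        ("user1#20110901", {"msg:text": "hi", "msg:unread": True}),
        ("user1#20110902", {"msg:text": "lunch?", "msg:unread": False}),
        ("user2#20110901", {"msg:text": "hello", "msg:unread": True}),
        ("user1#20110905", {"msg:text": "meeting", "msg:unread": True}),
    ],
    key=lambda row: row[0],
)

# '$' sorts right after '#', so this range covers exactly the "user1#" prefix.
unread = list(scan(table, "user1#", "user1$", row_filter=lambda cols: cols["msg:unread"]))
print([row_key for row_key, _ in unread])  # ['user1#20110901', 'user1#20110905']
```

Designing the row key so that related rows are adjacent (here, all of one user's messages) is what lets a single range scan answer the query.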

Best suited for: those who love BigTable :) and applications that need random, real-time read/write access to big data.

For example: Facebook's messaging database (more general use cases to come).

(Note 4: Thrift is an interface definition language used to define and create services for many programming languages. It was developed at Facebook and later open-sourced.)

Of course, these systems have many more features than the ones listed above; I have listed only those I consider important, in my own opinion. Technology also moves fast, so this content will need constant updating. I will do my best to keep the list up to date.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness, ownership, or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
