Comparison of eight mainstream NoSQL Databases

Source: Internet
Author: User
Tags: cassandra, riak, neo4j, couchdb
Abstract: SQL databases are very useful tools, but after 15 years of outstanding performance their monopoly is about to be broken. It is only a matter of time: I was forced to use relational databases, only to find that they could not meet my needs. For details, see my IT-Homer blog: Comparison of eight mainstream NoSQL databases.

Introduction
NoSQL is a revolutionary new database movement that advocates non-relational data storage. Today's computing architectures require massive horizontal scalability in data storage, and NoSQL databases are designed to meet that need. Google's BigTable and Amazon's Dynamo are both NoSQL databases.
However, NoSQL databases differ from one another far more than SQL databases do. This means that software architects must choose a suitable NoSQL database at the very start of a project.
To help with that choice, we compare Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase.

1. CouchDB
Language: Erlang
Features: DB consistency, ease of use
License: Apache
Protocol: HTTP/REST
Bidirectional replication,
continuous or ad hoc,
with conflict detection,
and therefore master-master replication (see the note below)
MVCC: write operations do not block reads
Previous versions of documents remain available
Crash-only (reliable) design
Needs compacting from time to time
Views: embedded map/reduce
Formatting views: lists and shows
Server-side document validation supported
Authentication supported
Real-time updates via _changes
Attachment handling,
and therefore CouchApps (standalone JS applications)
jQuery library included
Note: Master-master replication is a database synchronization method in which data is shared among a group of computers and can be updated by any member of the group.

Best application scenarios: Accumulating data that changes only occasionally and on which predefined queries and statistics are run. Also suited to applications where versioning matters.

For example: CRM and CMS systems. Master-master replication is especially useful for multi-site deployments.
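
The MVCC and conflict-check items in the CouchDB list can be illustrated with a small sketch. This is a toy in-memory store, not CouchDB's API: the names `TinyDocStore` and `ConflictError` are hypothetical, and the revision format only imitates the "number-hash" style. It shows the basic idea that a write must name the revision it was based on, and that a stale revision is rejected instead of silently overwriting newer data.

```python
import uuid


class ConflictError(Exception):
    """Raised when a write is based on a stale revision."""


class TinyDocStore:
    """Toy in-memory store that mimics revision-checked updates (hypothetical)."""

    def __init__(self):
        # doc_id -> list of (rev, body); earlier versions stay readable
        self._docs = {}

    def get(self, doc_id):
        rev, body = self._docs[doc_id][-1]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        history = self._docs.get(doc_id, [])
        current_rev = history[-1][0] if history else None
        if rev != current_rev:
            raise ConflictError(f"expected rev {current_rev!r}, got {rev!r}")
        new_rev = f"{len(history) + 1}-{uuid.uuid4().hex[:8]}"
        self._docs[doc_id] = history + [(new_rev, dict(body))]
        return new_rev


store = TinyDocStore()
rev1 = store.put("customer-1", {"name": "Ada"})
rev2 = store.put("customer-1", {"name": "Ada L."}, rev=rev1)

# A second writer still holding rev1 is rejected rather than overwriting rev2.
try:
    store.put("customer-1", {"name": "stale"}, rev=rev1)
    conflict_detected = False
except ConflictError:
    conflict_detected = True
```

A real deployment would resolve the conflict by re-reading the document and retrying; the sketch stops at detection.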

2. Redis
Language: C/C++
Features: Exceptionally fast
License: BSD
Protocol: Telnet-like
In-memory database backed by disk storage
Data could be swapped to disk starting with version 2.0 (note: versions after 2.4 no longer support this feature)
Master-slave replication (see the note below)
Simple values or hash tables indexed by key, but complex operations such as ZREVRANGEBYSCORE are also supported
INCR and related commands (useful for rate limiting or statistics)
Sets (with union, diff, and inter)
Lists (usable as queues, with blocking pop)
Hashes (objects with multiple fields)
Sorted sets (high-score tables, useful for range queries)
Transactions
Values can be set to expire (as in a cache)
Pub/Sub lets you implement messaging
Note: In master-slave replication, a single master server handles all updates and propagates them to the slaves. It is typically used in server clusters that must provide high availability.

Best application scenarios: Rapidly changing data with a predictable database size (it should fit mostly in memory).

For example: stock prices, analytics, real-time data collection, and real-time communication.
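
To make the sorted-set item concrete, here is a minimal pure-Python stand-in for a high-score table. It does not talk to Redis; the `SortedSet` class is hypothetical, and its method names merely echo the Redis commands ZADD, ZINCRBY, and ZREVRANGEBYSCORE. As with the Redis command, the range query takes the maximum score first and returns members from highest to lowest score.

```python
class SortedSet:
    """Toy stand-in for a Redis sorted set: unique members, each with a score."""

    def __init__(self):
        self._scores = {}

    def zadd(self, member, score):
        self._scores[member] = score

    def zincrby(self, member, amount):
        self._scores[member] = self._scores.get(member, 0) + amount
        return self._scores[member]

    def zrevrangebyscore(self, max_score, min_score):
        """Members with min_score <= score <= max_score, highest score first."""
        hits = [(score, member) for member, score in self._scores.items()
                if min_score <= score <= max_score]
        hits.sort(reverse=True)
        return [member for _score, member in hits]


board = SortedSet()
board.zadd("alice", 3200)
board.zadd("bob", 1500)
board.zadd("carol", 2700)
board.zincrby("bob", 1000)                 # bob now has 2500 points

mid_tier = board.zrevrangebyscore(3000, 2000)
everyone = board.zrevrangebyscore(float("inf"), float("-inf"))
```

This version scans every member on each query, which is fine for an illustration but is not how a production sorted set would be implemented.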

3. MongoDB
Language: C++
Features: Retains some friendly properties of SQL (queries and indexes)
License: AGPL (drivers: Apache)
Protocol: Custom, binary (BSON)
Master/slave replication (automatic failover with replica sets)
Built-in sharding
Queries are JavaScript expressions
Arbitrary JavaScript functions can be run server-side
Better update-in-place support than CouchDB
Uses memory-mapped files for data storage
Favors performance over features
Journaling is best turned on (the --journal option)
On 32-bit systems, the database size is limited to about 2.5 GB
An empty database takes up about 192 MB
GridFS stores big data plus metadata (it is not an actual file system)

Best application scenarios: When you need dynamic queries. When you prefer to define indexes rather than map/reduce functions. When you need good performance on a large database. When you wanted CouchDB, but your data changes so often that it would fill up the disk.

For example: most things you would otherwise do with MySQL or PostgreSQL, where predefined columns would hold you back.
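
The "dynamic query" idea above, querying documents that have no predefined columns, can be sketched in a few lines. This is a simplified matcher over plain Python dictionaries, not MongoDB's query engine; `matches` and `find` are hypothetical helpers, only a handful of comparison operators are covered, and (as a simplification) a document that lacks a queried field never matches.

```python
import operator

# Comparison operators supported by this toy matcher.
OPERATORS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$ne": operator.ne,
}


def matches(doc, query):
    """Return True if doc satisfies every condition in query."""
    for field, condition in query.items():
        if field not in doc:
            return False
        value = doc[field]
        if isinstance(condition, dict):
            if not all(OPERATORS[op](value, arg) for op, arg in condition.items()):
                return False
        elif value != condition:
            return False
    return True


def find(collection, query):
    """Return the documents in collection that match query, in order."""
    return [doc for doc in collection if matches(doc, query)]


# Documents in one collection need not share the same fields.
people = [
    {"name": "Ann", "age": 31, "city": "Oslo"},
    {"name": "Bo", "age": 24},
    {"name": "Cy", "age": 45, "city": "Oslo", "tags": ["admin"]},
]

older_in_oslo = find(people, {"city": "Oslo", "age": {"$gt": 40}})
under_forty = find(people, {"age": {"$gte": 24, "$lt": 40}})
```

The query itself is just data, so it can be built at run time, which is the contrast with predefined views that the comparison draws.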

4. Riak
Language: Erlang and C, with some JavaScript
Features: Fault tolerance
License: Apache
Protocol: HTTP/REST or custom binary
Tunable trade-offs for distribution and replication (N, R, W)
Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
Map/reduce in JavaScript or Erlang
Links and link walking: can be used as a graph database
Secondary indexes: search in metadata (coming in version 1.0)
Large object support (Luwak)
Available in "open source" and "enterprise" editions
Full-text search, indexing, and querying with the Riak Search server (beta)
Masterless multi-site replication and SNMP monitoring are commercially licensed

Best application scenarios: When you want something like Cassandra (Dynamo-like) but do not want to deal with its bloat and complexity. When you need very good single-site scalability, availability, and fault tolerance, and are prepared to pay for multi-site replication.

For example: point-of-sale data collection, factory control systems, and other places where even seconds of downtime hurt. It can also serve as an easily updatable web server.
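
The (N, R, W) setting mentioned for Riak (and later for Cassandra) is usually read as follows: N is the number of replicas of each item, R the number of replicas that must answer a read, and W the number that must acknowledge a write. The sketch below is general Dynamo-style reasoning, not Riak code, and both function names are hypothetical. It shows the two quantities these values trade against each other.

```python
def quorums_overlap(n, r, w):
    """True if every read quorum shares at least one replica with every write quorum."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("r and w must be between 1 and n")
    return r + w > n


def tolerated_failures(n, r, w):
    """Replicas that may be down while reads and writes can both still reach quorum."""
    return n - max(r, w)


# Balanced setting: reads see the latest write, and one replica may fail.
balanced = (quorums_overlap(3, 2, 2), tolerated_failures(3, 2, 2))

# Fast but loose: a read may miss the most recent write.
loose = (quorums_overlap(3, 1, 1), tolerated_failures(3, 1, 1))

# Cheap reads, expensive writes: any single failure blocks writes.
read_optimized = (quorums_overlap(3, 1, 3), tolerated_failures(3, 1, 3))
```

Lower R and W give faster, more available operations; higher values give stronger guarantees that a read reflects the latest write.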

5. Membase
Language: Erlang and C
Features: Memcache-compatible, but with persistence and clustering
License: Apache 2.0
Protocol: memcached plus extensions
Very fast (200k+/second) access to data by key
Persistence to disk
All nodes are identical (master-master replication)
Also provides memcached-style in-memory caching buckets
Write de-duplication to reduce I/O
Very nice web interface for cluster management
Software upgrades without taking the database offline
Connection proxy for connection pooling and multiplexing

Best application scenarios: Any application that requires low-latency data access, high concurrency, and high availability.

For example: low-latency use cases such as ad targeting, and highly concurrent web applications such as online games (for example, Zynga).
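
The write de-duplication item can be pictured as a buffer that keeps only the latest pending value per key, so repeated updates to a hot key cost one disk write instead of many. The source does not describe how Membase implements this, so the following is only one plausible reading; `DedupWriteQueue` is a hypothetical class and a plain dictionary stands in for the disk.

```python
class DedupWriteQueue:
    """Buffers writes in memory and flushes only the latest value per key."""

    def __init__(self, disk):
        self._disk = disk        # dict standing in for the persistent store
        self._pending = {}       # key -> latest unflushed value
        self.disk_writes = 0

    def set(self, key, value):
        # Replaces any earlier unflushed value for the same key.
        self._pending[key] = value

    def get(self, key):
        if key in self._pending:
            return self._pending[key]
        return self._disk.get(key)

    def flush(self):
        for key, value in self._pending.items():
            self._disk[key] = value
            self.disk_writes += 1
        self._pending.clear()


disk = {}
queue = DedupWriteQueue(disk)
for count in range(1000):
    queue.set("page:home:hits", count)   # 1000 updates to one hot key
queue.set("user:42", "ada")
latest_before_flush = queue.get("page:home:hits")
queue.flush()
```

Here 1001 logical writes result in two disk writes, while reads still see the newest value before the flush happens.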

6. Neo4j
Language: Java
Features: Graph database for connected data
License: GPL, with some features under AGPL/commercial license
Protocol: HTTP/REST (or embedded in Java)
Standalone, or embeddable in Java applications
Both nodes and edges of the graph can carry metadata
Good built-in web administration
Advanced path finding with multiple algorithms
Indexing of keys and relationships
Optimized for reads
Transactions (in the Java API)
Gremlin graph traversal language
Scriptable in Groovy
Online backup, advanced monitoring, and high availability are AGPL/commercially licensed

Best application scenarios: Graph-style, richly interconnected data. This is what most clearly sets Neo4j apart from the other NoSQL databases.

For example: social relations, public transport links, road maps, and network topologies.
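
Path search over a social graph, one of the use cases named above, can be sketched with a plain breadth-first search. This is generic Python, not Neo4j's API or its Gremlin language; the `shortest_path` function and the sample data are hypothetical. Each edge carries a small property dictionary to echo the point that relationships can hold metadata, although this particular search ignores it.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search; returns a path with the fewest hops, or None."""
    neighbours = {}
    for a, b, _props in edges:
        neighbours.setdefault(a, []).append(b)
        neighbours.setdefault(b, []).append(a)   # treat relationships as undirected

    previous = {start: None}                      # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:               # walk back to the start
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in neighbours.get(node, []):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None


edges = [
    ("alice", "bob", {"type": "FRIEND", "since": 2009}),
    ("bob", "carol", {"type": "FRIEND", "since": 2010}),
    ("carol", "dave", {"type": "COLLEAGUE"}),
    ("alice", "erin", {"type": "FRIEND"}),
    ("erin", "dave", {"type": "FRIEND"}),
]

path = shortest_path(edges, "alice", "dave")
```

In a relational schema the same question typically needs one self-join per hop, which is why graph-shaped data is singled out as Neo4j's niche.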

7. Cassandra
Language: Java
Features: Best of BigTable and Dynamo
License: Apache
Protocol: Custom, binary (Thrift)
Tunable trade-offs for distribution and replication (N, R, W)
Querying by column and by range of keys
BigTable-like features: columns and column families
Writes are faster than reads
Map/reduce possible with Apache Hadoop
I am somewhat biased against Cassandra, in part because of its bloat and complexity and in part because of Java issues (configuration, exceptions, and so on)

Best application scenarios: When you write more than you read (logging). When every component of the system must be written in Java ("no one gets fired for choosing Apache software").

For example: banking and the financial industry (not necessarily for financial transactions themselves, but these industries have database needs well beyond those). Because writes are faster than reads, one natural niche is real-time data analysis.
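
The "columns and column families" model and the key-range query from the Cassandra list can be pictured with a small in-memory structure. This is an illustrative sketch, not Cassandra's storage engine or client API: `ColumnFamily` is a hypothetical class, and it assumes row keys are kept in sorted order so that a key range is meaningful. The logging data matches the write-heavy scenario described above.

```python
from bisect import bisect_left, bisect_right


class ColumnFamily:
    """Toy column family: row key -> {column name: value}, with sorted row keys."""

    def __init__(self):
        self._rows = {}
        self._keys = []   # row keys kept sorted so key ranges can be sliced

    def insert(self, row_key, columns):
        if row_key not in self._rows:
            self._keys.insert(bisect_left(self._keys, row_key), row_key)
            self._rows[row_key] = {}
        self._rows[row_key].update(columns)   # rows may have different columns

    def get_range(self, start_key, end_key, column_names=None):
        """Rows with start_key <= key <= end_key, optionally limited to some columns."""
        lo = bisect_left(self._keys, start_key)
        hi = bisect_right(self._keys, end_key)
        result = []
        for key in self._keys[lo:hi]:
            row = self._rows[key]
            if column_names is not None:
                row = {c: row[c] for c in column_names if c in row}
            result.append((key, dict(row)))
        return result


log = ColumnFamily()
log.insert("2011-05-01T10:00", {"level": "INFO", "msg": "started"})
log.insert("2011-05-01T10:05", {"level": "ERROR", "msg": "disk full", "host": "db1"})
log.insert("2011-05-02T08:00", {"level": "INFO", "msg": "restarted"})

rows = log.get_range("2011-05-01T00:00", "2011-05-01T23:59",
                     column_names=["level", "host"])
```

Note that the second row has a `host` column the others lack; no schema change was needed to add it.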

8. HBase (with help from ghshephard)
Language: Java
Features: Billions of rows by millions of columns
License: Apache
Protocol: HTTP/REST (also supports Thrift; see the note below)
Modeled after BigTable
Map/reduce with Hadoop
Optimizations for real-time queries
High-performance Thrift gateway
Query predicate push-down via server-side scan and get filters
HTTP interface supports XML, Protobuf, and binary
Cascading, Hive, and Pig source and sink modules
JRuby-based shell
Rolling restarts for configuration changes and minor upgrades
No single point of failure
Random access performance comparable to MySQL

Best application scenarios: When you favor BigTable and need random, real-time access to big data.

For example: the Facebook messaging database (more general use cases are coming soon).
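
Server-side scanning with filters, listed among the HBase features, means the predicate is evaluated where the data lives so that only matching rows are sent to the client. The sketch below imitates that with a generator over an in-memory table; it is not the HBase client API, and `server_scan` and the sample row keys are hypothetical. As in an HBase scan, the start row is inclusive and the stop row is exclusive.

```python
def server_scan(table, start_row, stop_row, row_filter=None):
    """Runs on the 'server': yields rows in [start_row, stop_row) that pass row_filter."""
    for row_key in sorted(table):
        if start_row <= row_key < stop_row:
            row = table[row_key]
            if row_filter is None or row_filter(row_key, row):
                yield row_key, row


# Row keys are prefixed with the user id so one user's messages sort together.
messages = {
    "user1#0001": {"from": "ann", "unread": True},
    "user1#0002": {"from": "bob", "unread": False},
    "user1#0003": {"from": "cy", "unread": True},
    "user2#0001": {"from": "ann", "unread": True},
}

all_for_user1 = list(server_scan(messages, "user1#", "user2"))
unread_for_user1 = list(
    server_scan(messages, "user1#", "user2", lambda key, row: row["unread"])
)
```

With the filter applied on the server side, two rows cross the wire instead of three; on tables with billions of rows that difference is the point of the feature.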

Note: Thrift is an interface definition language used to define and create services for many programming languages. It was developed and open-sourced by Facebook.

Of course, every one of these systems has more features than those listed above; we have listed only the ones we consider important. Technology also advances quickly, so the content above needs to be updated constantly.
