Introduction: Kristóf Kovács is a software architect and consultant who recently published an article comparing several kinds of NoSQL databases. SQL databases are very useful tools, but their monopoly of roughly 15 years is coming to an end. It was only a matter of time: many things were forced into relational databases that never really fit them. The differences between NoSQL databases, however, are far greater than the differences between one SQL database and another. This means that software architects must choose a suitable NoSQL database right at the beginning of a project. This article compares Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase.

(Note 1: NoSQL is a revolutionary database movement whose advocates promote non-relational data storage. Today's computing architectures demand massive scalability from data storage, and NoSQL sets out to meet that demand. Google's BigTable and Amazon's Dynamo are both NoSQL data stores. See also the NoSQL entry.)

1. CouchDB
• Language: Erlang
• Features: DB consistency, ease of use
• License: Apache
• Protocol: HTTP/REST
• Bidirectional replication,
• continuous or ad hoc,
• with conflict detection,
• thus, master-master replication (see Note 2)
• MVCC: write operations do not block reads
• Previous versions of documents remain available
• Crash-only (reliable) design
• Needs compaction from time to time
• Views: embedded map/reduce
• Formatting views: lists and shows
• Server-side document validation possible
• Authentication possible
• Real-time updates via _changes
• Attachment handling
• thus, CouchApps (standalone JS applications)
• jQuery library included
Best used: For accumulating, occasionally changing data on which predefined queries are run, and for applications that need data versioning.
For example: CRM and CMS systems.
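The revision-based conflict checking listed for CouchDB can be illustrated with a small in-memory model. CouchDB rejects an update whose revision does not match the stored one (over HTTP this surfaces as a 409 Conflict), while readers simply see the last committed revision. The sketch below is a simplified assumption-laden model, not CouchDB's API: the names `DocStore` and `Conflict` are hypothetical, and the revision string only imitates the "generation-hash" shape.

```python
import hashlib
import json


class Conflict(Exception):
    """Raised when a write is based on a stale revision."""


class DocStore:
    """Toy document store with revision checks (hypothetical, not CouchDB's API)."""

    def __init__(self):
        self._docs = {}  # doc_id -> (revision, body)

    def get(self, doc_id):
        # Reads take no locks: they return the latest committed revision.
        rev, body = self._docs[doc_id]
        return rev, dict(body)

    def put(self, doc_id, body, rev=None):
        # A writer must present the revision it read; otherwise the write is refused.
        current = self._docs.get(doc_id)
        current_rev = current[0] if current else None
        if rev != current_rev:
            raise Conflict(f"{doc_id}: stored revision is {current_rev}, got {rev}")
        generation = int(current_rev.split("-")[0]) + 1 if current_rev else 1
        payload = json.dumps(body, sort_keys=True).encode("utf-8")
        digest = hashlib.sha256(payload).hexdigest()[:8]
        new_rev = f"{generation}-{digest}"
        self._docs[doc_id] = (new_rev, dict(body))
        return new_rev


store = DocStore()
rev1 = store.put("contact:1", {"name": "Ann"})
rev2 = store.put("contact:1", {"name": "Ann", "city": "Oslo"}, rev=rev1)
try:
    # A second writer still holds rev1, so its update is detected as a conflict.
    store.put("contact:1", {"name": "Bob"}, rev=rev1)
except Conflict:
    print("conflict detected")
```

The point of the design is that no writer ever blocks a reader: a conflicting writer is told to re-read and retry instead of waiting on a lock.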
Master-master replication is especially useful for multi-site deployments.
(Note 2: Master-master replication is a database synchronization method that lets a group of computers share data, with any member of the group able to update it.)

2. Redis
• Language: C/C++
• Features: Very fast
• License: BSD
• Protocol: Telnet-like
• Disk-backed in-memory database
• Data can be swapped to disk since version 2.0 (note: this feature is no longer supported in versions after 2.4)
• Master-slave replication (see Note 3)
• Simple values or hash tables by key, but complex operations such as ZREVRANGEBYSCORE are supported
• INCR & co. (good for rate limiting or statistics)
• Sets (with union/diff/inter)
• Lists (also usable as a queue; blocking pop)
• Hashes (objects with multiple fields)
• Sorted sets (high-score tables, good for range queries)
• Transactions
• Values can be set to expire (as in a cache)
• Pub/Sub lets you implement messaging
Best used: For rapidly changing data with a foreseeable database size (it should fit mostly in memory).
For example: stock prices, analytics, real-time data collection, real-time communication.
(Note 3: Master-slave replication: if at any given time only one server, the master, accepts all updates and the other servers copy them from it, this is master-slave replication. It is typically used in server clusters that need high availability.)

3. MongoDB
• Language: C++
• Features: Retains some friendly properties of SQL (queries, indexes)
• License: AGPL (drivers: Apache)
• Protocol: Custom, binary (BSON)
• Master/slave replication (automatic failover with replica sets)
• Built-in sharding
• Queries are JavaScript expressions
• Arbitrary JavaScript functions can be run on the server side
• Better update-in-place than CouchDB
• Uses memory-mapped files for data storage
• Performance over features
• Journaling is best turned on (with --journal)
• On 32-bit systems, the database size is limited to about 2.5 GB
• An empty database takes up 192 MB
• GridFS stores big data plus metadata (it is not an actual file system)
Best used: If you need dynamic queries; if you prefer to define indexes rather than map/reduce functions; if you need good performance on a big database; if you wanted CouchDB, but your data changes so often that it fills up the disks.
For example: most things you would do with MySQL or PostgreSQL, where having predefined columns holds you back.

4. Riak
• Language: Erlang and C, some JavaScript
• Features: Fault tolerance
• License: Apache
• Protocol: HTTP/REST or custom binary
• Tunable trade-offs for distribution and replication (N, R, W)
• Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
• Map/reduce in JavaScript or Erlang
• Links and link walking: can be used as a graph database
• Indexes: search in metadata (coming in version 1.0)
• Large object support (Luwak)
• Comes in "open source" and "enterprise" editions
• Full-text search, indexing, and querying via the Riak Search server (beta)
• Masterless multi-site replication and SNMP monitoring are commercially licensed
Best used: If you want something Cassandra-like (Dynamo-like), but do not want to deal with the bloat and complexity.
It also suits cases where you need very good single-site scalability, availability, and fault tolerance, and plan to use multi-site replication.
For example: point-of-sale data collection, factory control systems, places where even seconds of downtime hurt. It can also serve as an easily updatable web server.

5. Membase
• Language: Erlang and C
• Features: Memcache compatible, but with persistence and clustering
• License: Apache 2.0
• Protocol: memcached plus extensions
• Very fast (200k+/second) access to data by key
• Persistence to disk
• All nodes are identical (master-master replication)
• Also provides memcached-style in-memory caching buckets
• Write de-duplication to reduce I/O
• Very good cluster-management web interface
• Software upgrades without taking the database offline
• Connection proxy for connection pooling and multiplexing
Best used: For any application that requires low-latency data access, high concurrency, and high availability.
For example: low-latency use cases such as ad targeting, or highly concurrent web applications such as online games (Zynga, for example).

6. Neo4j
• Language: Java
• Features: Graph database built on relationships
• License: GPL, with some features under AGPL/commercial license
• Protocol: HTTP/REST (or embedding in Java)
• Standalone, or embeddable into Java applications
• Both nodes and edges can have metadata
• Good self-contained web administration
• Path finding with multiple algorithms
• Indexing of keys and relationships
• Optimized for reads
• Transactions (in the Java API)
• Graph traversal with the Gremlin language
• Scriptable in Groovy
• Online backup, advanced monitoring, and high availability are AGPL/commercially licensed
Best used: For graph-style data.
This is the most significant difference between Neo4j and the other NoSQL databases.
For example: social relations, public transport links, road maps, and network topologies.

7. Cassandra
• Language: Java
• Features: The best of BigTable and Dynamo
• License: Apache
• Protocol: Custom, binary (Thrift)
• Tunable trade-offs for distribution and replication (N, R, W)
• Querying by column and by range of keys
• BigTable-like features: columns, column families
• Writes are faster than reads
• Map/reduce possible with Apache Hadoop
• I admit to being biased against Cassandra, partly because of its bloat and complexity, and partly because of Java (configuration, exceptions, and so on)
Best used: When you write more than you read (logging), or when every component of the system must be written in Java ("no one gets fired for choosing Apache software").
For example: banking and the financial industry (not necessarily for financial transactions; these industries are much bigger than that). Because writes are faster than reads, one natural niche is real-time data analysis.

8. HBase (with the help of ghshephard)
• Language: Java
• Features: Billions of rows x millions of columns
• License: Apache
• Protocol: HTTP/REST (also Thrift, see Note 4)
• Modeled after BigTable
• Map/reduce with Hadoop
• Query predicate push-down via server-side scan and get filters
• Optimizations for real-time queries
• A high-performance Thrift gateway
• HTTP supports XML, Protobuf, and binary
• Cascading, Hive, and Pig source and sink modules
• JRuby-based (JIRB) shell
• Rolling restarts for configuration changes and minor upgrades
• No single point of failure
• Random access performance comparable to MySQL
Best used: If you are fond of BigTable :) and need random, real-time access to large data sets.
For example: Facebook's messaging database (more general examples coming soon).
(Note 4: Thrift is an interface definition language used to define and create services for many other languages. It was developed by Facebook and is open source.)

Of course, every one of these systems has many more features than those listed above. I have listed only the key points on which I base my own decisions. All of them are also developing very quickly, so the content above is bound to change. I will do my best to keep this list updated.
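The adjustable (N, R, W) settings listed for Riak and Cassandra can be made concrete with a short sketch. It assumes the usual Dynamo-style reading of the three numbers: N replicas hold each value, a read waits for R of them, and a write waits for W of them. Under that assumption a read is guaranteed to touch at least one replica holding the latest write exactly when R + W > N. The function names are hypothetical and this is not either database's API; the brute-force check only confirms the rule of thumb for small clusters.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Brute force: does every set of r replicas share a member with every set of w?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def reads_see_latest_write(n, r, w):
    """Rule of thumb for Dynamo-style stores: R + W > N."""
    return r + w > n


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(n, r, w, quorums_overlap(n, r, w), reads_see_latest_write(n, r, w))
# prints:
# 3 2 2 True True
# 3 1 1 False False
# 3 1 3 True True
```

Lower R and W values trade consistency for latency and availability, which is the trade-off the (N, R, W) entries refer to.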
Comparison of 8 NoSQL database systems