Editor's note: Kristóf Kovács, a software architect and consultant, recently published an article comparing various types of NoSQL databases.
Although SQL databases are extremely useful tools, their monopoly of roughly 15 years is coming to an end. It was only a matter of time: many kinds of data were forced into relational databases even though they never really fitted them.
The differences between NoSQL databases, however, are much bigger than those between any two SQL databases. This places a greater responsibility on software architects to choose a suitable NoSQL database at the very start of a project. Here Cassandra, MongoDB, CouchDB, Redis, Riak, Membase, Neo4j, and HBase are compared:
(Note 1: NoSQL is a movement in database design that advocates non-relational data storage. Today's computing architectures demand enormous scalability from data storage, and NoSQL sets out to meet that demand. Google's BigTable and Amazon's Dynamo are examples of such non-relational stores. See also the NoSQL entry.)
1. CouchDB
- Language used: Erlang
- Features: DB consistency, ease of use
- License: Apache
- Protocol: HTTP/REST
- Bidirectional replication,
- continuous or ad hoc,
- with conflict detection,
- hence master-master replication (see note 2)
- MVCC: write operations do not block reads
- Previous versions of documents are available
- Crash-only (reliable) design
- Needs compacting from time to time
- Views: embedded map/reduce
- Formatting views: lists and shows
- Server-side document validation is possible
- Authentication is possible
- Real-time updates via the _changes feed
- Attachment handling
- Hence CouchApps (standalone JavaScript applications)
- Ships with the jQuery library
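The MVCC and conflict-detection bullets can be illustrated with a toy in-memory model. This is a minimal sketch under stated assumptions, not CouchDB's API: the class name `ToyDocStore` and the revision format (loosely imitating CouchDB's `N-hash` revision strings) are made up for illustration. The idea it shows is real, though: every update must name the revision it was based on, and an update based on a stale revision is rejected rather than silently overwriting newer data (CouchDB answers such a request with HTTP 409 Conflict).

```python
import uuid


class Conflict(Exception):
    """Raised when an update is based on a stale revision."""


class ToyDocStore:
    """Hypothetical toy model of CouchDB-style MVCC; not the CouchDB API."""

    def __init__(self):
        # doc_id -> list of (rev, body); the last entry is the current revision
        self._history = {}

    def put(self, doc_id, body, rev=None):
        """Create a document (rev=None) or update it (rev = current revision)."""
        history = self._history.setdefault(doc_id, [])
        current = history[-1][0] if history else None
        if rev != current:
            # The writer started from an out-of-date copy: reject, do not overwrite
            raise Conflict(f"{doc_id}: expected rev {current!r}, got {rev!r}")
        generation = len(history) + 1
        new_rev = f"{generation}-{uuid.uuid4().hex[:8]}"
        # Append a new version; older versions stay readable (no in-place update)
        history.append((new_rev, dict(body)))
        return new_rev

    def get(self, doc_id, rev=None):
        """Return the current body, or an older revision if one is requested."""
        history = self._history[doc_id]
        if rev is None:
            return history[-1][1]
        for r, body in history:
            if r == rev:
                return body
        raise KeyError(rev)
```

Unlike this toy, CouchDB discards old revision bodies when the database is compacted, so previous versions should not be relied on as a permanent history.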
Best used: for accumulating, occasionally changing data on which predefined queries and statistics are run, and wherever data versioning is important.
For example: CRM and CMS systems. Master-master replication is especially useful for multi-site deployments.
(Note 2: Master-master replication is a database synchronization method in which data is shared among a group of computers and any member of the group can update it.)
2. Redis
- Language used: C
- Features: extremely fast
- License: BSD
- Protocol: Telnet-like
- In-memory database backed by disk storage,
- and since version 2.0 data could also be swapped out to disk (note that this feature is deprecated as of version 2.4!)
- Master-slave replication (see note 3)
- Stores simple values or hash tables by key, but also supports complex operations such as ZREVRANGEBYSCORE
- INCR & co. (useful for rate limiting or statistics)
- Sets (including union/diff/intersection)
- Lists (also usable as a queue, with blocking pop)
- Hashes (objects with multiple fields)
- Sorted sets (high-score tables, good for range queries)
- Redis supports transactions
- Values can be set to expire (as in a cache)
- Pub/Sub lets you implement messaging
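Two of the idioms listed above, INCR with an expiring key for rate limiting and a sorted set queried with ZREVRANGEBYSCORE as a high-score table, can be sketched without a server. The following is a hypothetical in-memory model that only mimics the semantics of the Redis commands INCR, EXPIRE, ZADD, and ZREVRANGEBYSCORE; `ToyRedis`, `allow_request`, and the `rate:<user>` key scheme are assumptions made for illustration, not a client library.

```python
import time


class ToyRedis:
    """Hypothetical in-memory model of a few Redis commands; no server involved."""

    def __init__(self, clock=time.monotonic):
        self.clock = clock
        self.strings = {}   # key -> integer counter
        self.expiry = {}    # key -> absolute deadline
        self.zsets = {}     # key -> {member: score}

    def _expire_if_due(self, key):
        # Drop a key lazily once its time-to-live has passed
        deadline = self.expiry.get(key)
        if deadline is not None and self.clock() >= deadline:
            self.strings.pop(key, None)
            del self.expiry[key]

    def incr(self, key):
        """INCR: a missing key counts as 0. (Atomic in Redis; this model is single-threaded.)"""
        self._expire_if_due(key)
        self.strings[key] = self.strings.get(key, 0) + 1
        return self.strings[key]

    def expire(self, key, seconds):
        """EXPIRE: give an existing key a time-to-live."""
        self._expire_if_due(key)
        if key in self.strings:
            self.expiry[key] = self.clock() + seconds

    def zadd(self, key, score, member):
        """ZADD: add a member to a sorted set or update its score."""
        self.zsets.setdefault(key, {})[member] = score

    def zrevrangebyscore(self, key, max_score, min_score):
        """ZREVRANGEBYSCORE: members with min <= score <= max, highest first."""
        hits = [(s, m) for m, s in self.zsets.get(key, {}).items()
                if min_score <= s <= max_score]
        # Highest score first; equal scores in descending member order
        hits.sort(reverse=True)
        return [m for _, m in hits]


def allow_request(r, user, limit, window_seconds):
    """Fixed-window rate limiter: at most `limit` calls per window per user."""
    key = f"rate:{user}"               # hypothetical key naming scheme
    count = r.incr(key)
    if count == 1:
        r.expire(key, window_seconds)  # the first hit opens a new window
    return count <= limit
```

With a real server the same pattern is usually written as INCR followed by EXPIRE on the first hit, possibly inside a transaction; the fixed window is the simplest choice and allows short bursts at window boundaries.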
Best used: for rapidly changing data whose database size is foreseeable (it should fit mostly in memory).
For example: stock prices, analytics, real-time data collection, real-time communication.
(Note 3: In master-slave replication, one server, the master, accepts all write requests and the other servers replicate its data. It is typically used in server clusters that must provide high availability.)
3. MongoDB
- Language used: C++
- Features: retains some friendly properties of SQL (queries, indexes)
- License: AGPL (drivers: Apache)
- Protocol: custom, binary (BSON)
- Master/slave replication (automatic failover with replica sets)
- Built-in sharding
- Queries are JavaScript expressions
- Arbitrary JavaScript functions can be run on the server side
- Better update-in-place than CouchDB
- Uses memory-mapped files for data storage
- Performance over features
- Journaling (enabled with --journal) is best turned on
- On 32-bit systems, the database size is limited to about 2.5 GB
- An empty database takes up about 192 MB
- GridFS for storing big data plus metadata (not actually a file system)
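MongoDB's dynamic queries are themselves documents: field names map either to a plain value (equality) or to an operator document such as `{"$gte": 18}`. To show how such a query is read, here is a minimal sketch that evaluates a small subset of those operators over plain Python dictionaries. `matches` and `find` are hypothetical helpers, and the subset deliberately ignores arrays, nested dot paths, and type-ordering rules; with a driver such as PyMongo, roughly the same dictionary would simply be passed to a collection's `find()`.

```python
# Hypothetical subset of MongoDB query semantics, evaluated in plain Python.
# Missing fields read as None; range operators never match None.
OPERATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$gt": lambda v, arg: v is not None and v > arg,
    "$gte": lambda v, arg: v is not None and v >= arg,
    "$lt": lambda v, arg: v is not None and v < arg,
    "$lte": lambda v, arg: v is not None and v <= arg,
    "$in": lambda v, arg: v in arg,
}


def matches(doc, query):
    """True if `doc` satisfies every condition in `query` (implicit AND)."""
    for field, cond in query.items():
        value = doc.get(field)
        if isinstance(cond, dict) and cond and all(k.startswith("$") for k in cond):
            # Operator document, e.g. {"$gte": 18, "$lt": 50}
            for op, arg in cond.items():
                if not OPERATORS[op](value, arg):
                    return False
        elif value != cond:
            # A plain value means equality
            return False
    return True


def find(collection, query):
    """Return the documents of `collection` (a list of dicts) matching `query`."""
    return [doc for doc in collection if matches(doc, query)]


people = [
    {"name": "Ann", "age": 34, "city": "Budapest"},
    {"name": "Ben", "age": 17, "city": "Vienna"},
    {"name": "Cy", "age": 52, "city": "Budapest"},
]
adults_in_budapest = find(people, {"city": "Budapest", "age": {"$gte": 18, "$lt": 50}})
```

Because the query is just data, an application can assemble it at run time from user input, which is what "dynamic queries" means in contrast to CouchDB's predefined views.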
Best used: if you need dynamic queries; if you prefer to define indexes rather than map/reduce functions; if you need good performance on a large database; if you wanted CouchDB but your data changes too often and fills up the disk.
For example: most things you would do with MySQL or PostgreSQL, but where having predefined columns really holds you back.
4. Riak
- Languages used: Erlang and C, with some JavaScript
- Features: fault tolerance
- License: Apache
- Protocol: HTTP/REST or custom binary
- Tunable trade-offs for distribution and replication (N, R, W)
- Pre- and post-commit hooks in JavaScript or Erlang, for validation and security
- Map/reduce in JavaScript or Erlang
- Links and link walking: can be used as a graph database
- Secondary indices: search by metadata (coming in version 1.0)
- Large object support (Luwak)
- Available in "open source" and "enterprise" editions
- Full-text search, indexing, and querying with the Riak Search server (beta)
- Masterless multi-site replication and SNMP monitoring require a commercial license
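The (N, R, W) tuning listed above (and again for Cassandra) follows the Dynamo quorum rule: with N replicas, a write waits for W acknowledgements and a read for R replies, and if R + W > N every read set must overlap every write set, so at least one replica consulted by a read holds the latest acknowledged write. The sketch below checks the closed-form rule against a brute-force enumeration of replica subsets. The function names are hypothetical, and the model ignores failures and the sloppy quorums that Dynamo-style systems use while nodes are down.

```python
from itertools import combinations


def quorums_overlap(n, r, w):
    """Brute force: does every R-subset of N replicas intersect every W-subset?"""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


def check_quorum(n, r, w):
    """True if reads are guaranteed to reach a replica holding the latest write."""
    if not (1 <= r <= n and 1 <= w <= n):
        raise ValueError("R and W must be between 1 and N")
    overlap = r + w > n
    # Sanity check: the closed-form rule agrees with exhaustive enumeration
    assert overlap == quorums_overlap(n, r, w)
    return overlap


for n, r, w in [(3, 2, 2), (3, 1, 1), (3, 1, 3)]:
    print(n, r, w, check_quorum(n, r, w))
```

Lowering R or W below the quorum buys latency and availability at the price of possibly stale reads; raising W to N makes writes fail whenever any replica is unreachable. That is the trade-off the tunable parameters expose.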
Best used: if you want something Dynamo-like, similar to Cassandra, but cannot deal with its bloat and complexity; or if you need very good single-site scalability, availability, and fault tolerance but are ready to pay for multi-site replication.
For example: point-of-sale data collection, factory control systems, places where even seconds of downtime hurt. It could also be used as an easily updatable web server.
5. Membase
- Languages used: Erlang and C
- Features: memcache-compatible, but with persistence and clustering
- License: Apache 2.0
- Protocol: memcached plus extensions
- Very fast (200k+ operations per second) access to data by key
- Persistence to disk
- All nodes are identical (master-master replication)
- Also provides memcached-style in-memory caching buckets
- Write de-duplication to reduce I/O
- Very nice web interface for cluster management
- Software upgrades without taking the database offline
- Connection proxy for connection pooling and multiplexing
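Membase's memcache compatibility means existing memcached clients can talk to it. As a sketch of what travels over the wire, the helpers below build `set` and `get` commands in the memcached text protocol and parse the `VALUE ... END` reply. They are pure functions with no network I/O; the helper names are hypothetical, and restricting keys to printable ASCII is a conservative assumption of this sketch rather than the exact protocol rule (the protocol caps keys at 250 bytes with no whitespace or control characters).

```python
def _check_key(key: str) -> bytes:
    """Encode a key, conservatively allowing only printable ASCII, 1-250 bytes."""
    raw = key.encode("ascii")  # non-ASCII keys raise UnicodeEncodeError (a ValueError)
    if not 1 <= len(raw) <= 250 or any(b <= 32 or b == 127 for b in raw):
        raise ValueError(f"invalid memcached key: {key!r}")
    return raw


def build_set(key: str, value: bytes, flags: int = 0, exptime: int = 0) -> bytes:
    """Encode: set <key> <flags> <exptime> <bytes> CRLF <data> CRLF."""
    header = b"set %s %d %d %d\r\n" % (_check_key(key), flags, exptime, len(value))
    return header + value + b"\r\n"


def build_get(*keys: str) -> bytes:
    """Encode: get <key> [<key> ...] CRLF."""
    if not keys:
        raise ValueError("at least one key is required")
    return b"get " + b" ".join(_check_key(k) for k in keys) + b"\r\n"


def parse_get_response(raw: bytes) -> dict:
    """Parse 'VALUE <key> <flags> <bytes>' blocks terminated by 'END' into {key: data}."""
    result = {}
    pos = 0
    while True:
        line_end = raw.index(b"\r\n", pos)
        line = raw[pos:line_end].decode()
        if line == "END":
            return result
        parts = line.split()
        if parts[0] != "VALUE":
            raise ValueError(f"unexpected line: {line!r}")
        key, nbytes = parts[1], int(parts[3])
        start = line_end + 2
        # The length comes from the header, so the data itself may contain CRLF
        result[key] = raw[start:start + nbytes]
        pos = start + nbytes + 2  # skip the data block and its trailing CRLF
```

A successful `set` is answered with the single line `STORED`; the binary extensions Membase adds are outside the scope of this sketch.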
Best used: for any application that requires low-latency data access, high concurrency, and high availability.
For example: low-latency uses such as ad targeting, or highly concurrent web applications such as online games (Zynga, for example).
6. Neo4j
- Language used: Java
- Features: graph database for connected data
- License: GPL, with some features under AGPL/commercial licenses
- Protocol: HTTP/REST (or embedded in Java)
- Can be used standalone or embedded in Java applications
- Nodes and relationships (edges) can carry metadata
- Nice, self-contained web administration interface
- Advanced path finding with multiple algorithms
- Indexing of keys and relationships
- Optimized for reads
- Transactions (in the Java API)
- Graph traversal with the Gremlin language
- Groovy scripting support
- Online backup, advanced monitoring, and high availability require an AGPL/commercial license
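Path finding over nodes and relationships that carry properties is the core workload of a graph database. As a small illustration, here is a generic breadth-first search for the path with the fewest hops, which also returns the properties of the relationships it traversed. This is a plain-Python sketch of one standard algorithm, not Neo4j's API or its own path-finding implementation; the edge-list format and the station names in the example are hypothetical.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Fewest-hop path via breadth-first search.

    edges: list of (from_node, to_node, properties); treated as undirected.
    Returns (nodes_on_path, edge_properties) or None if goal is unreachable.
    """
    neighbours = {}
    for a, b, props in edges:
        neighbours.setdefault(a, []).append((b, props))
        neighbours.setdefault(b, []).append((a, props))

    previous = {start: None}  # node -> (parent node, properties of the edge used)
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            break
        for nxt, props in neighbours.get(node, []):
            if nxt not in previous:  # first visit is along a shortest path
                previous[nxt] = (node, props)
                queue.append(nxt)

    if goal not in previous:
        return None
    # Walk back from the goal, collecting nodes and edge properties
    path, hops, node = [goal], [], goal
    while previous[node] is not None:
        parent, props = previous[node]
        hops.append(props)
        path.append(parent)
        node = parent
    return list(reversed(path)), list(reversed(hops))


# Hypothetical transit network: A-B-C-D on line M1, and a shortcut A-E-D on line M2
edges = [
    ("A", "B", {"line": "M1"}),
    ("B", "C", {"line": "M1"}),
    ("C", "D", {"line": "M1"}),
    ("A", "E", {"line": "M2"}),
    ("E", "D", {"line": "M2"}),
]
route = shortest_path(edges, "A", "D")
```

Here `route` is `(["A", "E", "D"], [{"line": "M2"}, {"line": "M2"}])`. Weighted variants (for example Dijkstra's algorithm on travel times) follow the same pattern with a priority queue instead of a FIFO queue.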
Best used: for graph-style, rich or complex, interconnected data. This is what most sets Neo4j apart from the other NoSQL databases compared here.
For example: social relations, public transport links, road maps, network topologies.
7. Cassandra
- Language used: Java
- Features: combines the best of BigTable and Dynamo
- License: Apache
- Protocol: custom, binary (Thrift)
- Tunable trade-offs for distribution and replication (N, R, W)
- Querying by column and by range of keys
- BigTable-like features: columns, column families
- Writes are much faster than reads
- Map/reduce is possible with Apache Hadoop
- I admit to being somewhat biased against Cassandra, partly because of its bloat and complexity, and partly because of Java (configuration, exceptions, etc.)
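Why writes can be faster than reads becomes clearer from a log-structured write path of the kind Cassandra uses: a write is a sequential append to a commit log plus an in-memory update, while a read may have to consult the in-memory table and several flushed, immutable sorted files. The class below is a deliberately simplified, hypothetical model of that idea; real Cassandra adds bloom filters, compaction, tombstones, and timestamps, all omitted here.

```python
import bisect


class ToyLSMStore:
    """Hypothetical, simplified log-structured store (memtable + sorted runs)."""

    def __init__(self, memtable_limit=3):
        self.commit_log = []   # append-only list standing in for the on-disk log
        self.memtable = {}     # recent writes, held in memory
        self.sstables = []     # flushed immutable runs (sorted keys, values), oldest first
        self.memtable_limit = memtable_limit

    def write(self, key, value):
        # A write is an append plus a dict update: no disk seek, no read-before-write
        self.commit_log.append((key, value))
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        # Persist the memtable as one sorted, immutable run and start a fresh one
        keys = sorted(self.memtable)
        self.sstables.append((keys, [self.memtable[k] for k in keys]))
        self.memtable = {}
        self.commit_log = []   # flushed entries no longer need replaying after a crash

    def read(self, key):
        """Return (value, places_checked); the newest version of a key wins."""
        checked = 1
        if key in self.memtable:
            return self.memtable[key], checked
        for keys, values in reversed(self.sstables):   # newest run first
            checked += 1
            i = bisect.bisect_left(keys, key)
            if i < len(keys) and keys[i] == key:
                return values[i], checked
        return None, checked
```

The `places_checked` counter makes the asymmetry visible: every write touches exactly one in-memory structure, but a read of an old or missing key may have to search every flushed run, which is why real systems compact runs and use bloom filters to skip most of them.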
Best used: when you write more than you read (logging), or if every component of the system must be written in Java ("no one gets fired for choosing Apache software").
For example: banking and the financial industry (though not necessarily for financial transactions; these industries are much bigger than that). Writes are faster than reads, so one natural niche is real-time data analysis.
8. HBase
(With the help of ghshephard)
- Language used: Java
- Features: billions of rows x millions of columns
- License: Apache
- Protocol: HTTP/REST (also Thrift, see note 4)
- Modeled after BigTable
- Map/reduce with Hadoop
- Optimized for real-time queries
- High-performance Thrift gateway
- Query predicate push-down via server-side scan and get filters
- Supports XML, Protobuf, and binary over HTTP
- Cascading, Hive, and Pig source and sink modules
- JRuby-based (JIRB) shell
- Rolling restarts for configuration changes and minor upgrades
- No single point of failure
- Random access performance comparable to MySQL
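The BigTable model HBase follows, and the idea of pushing query predicates down into server-side scans, can be sketched in a few lines: rows are kept sorted by row key, cells are addressed as `family:qualifier`, a scan walks a contiguous key range, and a filter runs where the data lives so that rejected rows are never shipped back. `ToyTable` and the row-key scheme in the example are hypothetical; real HBase also versions each cell by timestamp and exposes its filters as classes in its client API, which this sketch does not reproduce.

```python
import bisect


class ToyTable:
    """Hypothetical miniature of the BigTable/HBase data model."""

    def __init__(self):
        self._keys = []   # row keys kept in sorted order
        self._rows = {}   # row key -> {"family:qualifier": value}

    def put(self, row, column, value):
        if row not in self._rows:
            bisect.insort(self._keys, row)
            self._rows[row] = {}
        self._rows[row][column] = value

    def scan(self, start_row, stop_row, row_filter=None):
        """Yield (row, cells) for start_row <= row < stop_row in key order.

        row_filter stands in for a server-side filter: rows it rejects are
        never returned to the caller, which is what predicate push-down saves.
        """
        i = bisect.bisect_left(self._keys, start_row)
        while i < len(self._keys) and self._keys[i] < stop_row:
            row = self._keys[i]
            cells = self._rows[row]
            if row_filter is None or row_filter(cells):
                yield row, dict(cells)
            i += 1


t = ToyTable()
t.put("user#003", "info:name", "Cy")
t.put("user#001", "info:name", "Ann")
t.put("user#001", "stats:visits", 12)
t.put("user#002", "info:name", "Ben")
t.put("user#002", "stats:visits", 3)
t.put("zone#1", "info:name", "other")
# "user$" sorts just after every "user#..." key, so this scans exactly the user rows
busy = list(t.scan("user#", "user$", lambda cells: cells.get("stats:visits", 0) > 5))
```

Because rows are sorted, choosing row keys so that related rows are adjacent (here the `user#` prefix) turns a query into one contiguous range scan.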
Best used: if you are in love with BigTable :) and when you need random, real-time read/write access to big data.
For example: the Facebook Messages database (more general use cases are coming).
(Note 4: Thrift is an interface definition language for defining and creating services across many programming languages. It was developed at Facebook and later open-sourced.)
Of course, these systems have far more features than those listed above. I have listed only the ones I consider important, based on my own opinion. Technology also moves fast, so this list needs constant updating; I will do my best to keep it current.
Comparison of 8 NoSQL Database Systems (reposted)