This article is translated from 35+ Use Cases For Choosing Your Next NoSQL Database.
It follows three earlier articles:
What The Heck Are You Actually Using NoSQL For?
101 Questions To Ask When Considering A NoSQL Database.
What Should I Do? Choosing SQL, NoSQL or Both for Scalable Web Applications.
This time, we approach the question from the perspective of use cases and which systems are suitable for each of them.
What are your options?
First, let's look at the various data models. This classification comes from Emil Eifrem and NoSQL databases.
Document Database
- Source: Inspired by Lotus Notes.
- Data model: collections of documents, each containing key-value pairs.
- Examples: CouchDB, MongoDB
- Advantages: natural data modeling, programmer-friendly, rapid development, web-friendly, and a good fit for CRUD.
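As a rough illustration of the document model described above, the sketch below treats a document as a nested structure of key-value pairs and runs the four CRUD operations against an in-memory dictionary. The `collection` store and the `create`, `read`, `update`, and `delete` helpers are hypothetical names for this example only; they are not the API of CouchDB or MongoDB.

```python
import json

# Hypothetical in-memory "collection": document id -> document (a nested dict).
collection = {}

def create(doc_id, doc):
    collection[doc_id] = doc

def read(doc_id):
    return collection.get(doc_id)

def update(doc_id, changes):
    # Fields can be added freely; there is no fixed schema to alter.
    collection[doc_id].update(changes)

def delete(doc_id):
    collection.pop(doc_id, None)

# The whole order, including customer and line items, lives in one document,
# so reading it back needs no join.
create("order-1", {
    "customer": {"name": "Alice", "city": "Oslo"},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
})
update("order-1", {"status": "shipped"})

order = read("order-1")
total_qty = sum(item["qty"] for item in order["items"])
as_json = json.dumps(order, sort_keys=True)  # documents map directly onto JSON
```

Because the document maps directly onto JSON, the same structure can be sent to a web client unchanged, which is part of what makes this model web-friendly.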
Graph Database
- Source: Euler and graph theory.
- Data model: nodes and relationships, both of which can hold key-value pairs.
- Examples: AllegroGraph, InfoGrid, and Neo4j
- Advantages: handles complex graph problems well.
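To make the node-and-relationship model a little more concrete, here is a minimal in-memory sketch: nodes and relationships both carry key-value properties, and a breadth-first traversal finds everyone within a given number of hops. The data, the `KNOWS` relationship type, and the `neighbors` and `within_hops` helpers are assumptions made up for this example; a real graph database such as Neo4j has its own query language and API.

```python
from collections import deque

# Nodes with key-value properties.
nodes = {
    "alice": {"age": 31},
    "bob": {"age": 28},
    "carol": {"age": 35},
    "dave": {"age": 40},
}

# Directed relationships: (start, type, end, properties).
relationships = [
    ("alice", "KNOWS", "bob", {"since": 2009}),
    ("bob", "KNOWS", "carol", {"since": 2010}),
    ("carol", "KNOWS", "dave", {"since": 2011}),
]

def neighbors(node, rel_type):
    """Yield the end nodes of outgoing relationships of the given type."""
    for start, kind, end, _props in relationships:
        if kind == rel_type and start == node:
            yield end

def within_hops(start, rel_type, max_hops):
    """Return {node: hop count} for nodes reachable in at most max_hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if seen[current] == max_hops:
            continue  # do not expand past the hop limit
        for nxt in neighbors(current, rel_type):
            if nxt not in seen:
                seen[nxt] = seen[current] + 1
                queue.append(nxt)
    del seen[start]
    return seen

reachable = within_hops("alice", "KNOWS", 2)
```

The traversal follows relationships directly from node to node instead of joining tables, which is why this model suits friend-of-friend style questions.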
Relational Database
- Source: proposed by E. F. Codd in "A Relational Model of Data for Large Shared Data Banks".
- Data model: a set of relations (tables).
- Examples: VoltDB, Clustrix, MySQL
- Advantages: high-performance, scalable OLTP; SQL support; materialized views; transaction support; programmer-friendly.
Object Database
- Source: Graph database research
- Data model: objects
- Examples: Objectivity, Gemstone
- Advantages: complex object models, fast key-value access, key-function access, and graph database functionality.
Key-Value Database
- Source: Amazon's Dynamo paper and distributed hash tables.
- Data model: key-value pairs
- Examples: Membase and Riak
- Advantages: handles large amounts of data and a high rate of read/write requests; programmer-friendly.
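The distributed hash table lineage mentioned above is easier to picture with a small example. Dynamo-style stores commonly spread keys across machines with consistent hashing; the sketch below is one simple way to do that and is not taken from any particular product. The `HashRing` class, the node names, and the choice of 100 virtual points per node are all assumptions for illustration.

```python
import bisect
import hashlib

def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.sha256(key.encode("utf-8")).hexdigest(), 16)

class HashRing:
    """Hypothetical consistent-hash ring with virtual points per node."""

    def __init__(self, nodes, replicas=100):
        # Each node is placed on the ring at several pseudo-random points
        # so that keys are spread more evenly.
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._points = [point for point, _node in self._ring]

    def node_for(self, key):
        # The owner is the first point clockwise from the key's position,
        # wrapping around at the end of the ring.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

The useful property of this layout is that adding a node only takes over keys from its neighbours on the ring; keys that do not land on the new node stay where they were, which keeps rebalancing cheap when hardware is added online.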
BigTable Database
- Source: Google's BigTable paper.
- Data model: column families; in principle, each row can have its own set of columns.
- Examples: HBase, Hypertable, Cassandra
- Advantages: handles large amounts of data, sustains extremely high write loads, and is highly available. It supports multiple data centers and MapReduce.
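The column-family idea, where each row may carry a different set of columns, can be sketched with nested dictionaries. This is only a toy picture of the data model under assumed names (`table`, `put`, `get_row`, and the sample rows); it says nothing about how HBase, Hypertable, or Cassandra store data on disk.

```python
from collections import defaultdict

# row key -> column family -> column qualifier -> value
table = defaultdict(lambda: defaultdict(dict))

def put(row, family, qualifier, value):
    table[row][family][qualifier] = value

def get_row(row, family):
    """Return the columns of one family for a row (empty if absent)."""
    return dict(table.get(row, {}).get(family, {}))

# Two rows in the same family with different columns: no schema change needed.
put("user:1", "profile", "name", "Alice")
put("user:1", "profile", "email", "alice@example.com")
put("user:2", "profile", "name", "Bob")
put("user:2", "profile", "twitter", "@bob")
put("user:2", "stats", "logins", 17)

columns_1 = set(get_row("user:1", "profile"))
columns_2 = set(get_row("user:2", "profile"))
```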
Data Structure Service
- Source: ?
- Data model: dictionary operations, lists, sets, and string values
- Example: Redis
- Advantages: unusual uses that you would not previously have considered a database for.
Grid Database
- Source: research on data grids and tuple spaces.
- Data model: space-based architecture
- Examples: GigaSpaces, Coherence
- Advantages: high-performance, highly scalable transaction processing.
What should your application use?
- The key is to realize that different applications require different data models and products. Select an appropriate data model and product.
- To work out which data model your application needs, see What The Heck Are You Actually Using NoSQL For? That article summarizes a number of unconventional application scenarios with different characteristics.
- Start from your requirements and use cases, and let them lead you to the product that best fits your architecture. Whether it is NoSQL or SQL does not matter.
- Consider data models, product features, and application scenarios together. Different products have different features, so you cannot choose on the data model alone.
- The best product is the one that has the features you need most.
Assume that your application has the following requirements:
- Complex transactions: if you cannot afford to lose data, or you want a simple transaction programming model, look at relational databases and grid databases.
- Example: an inventory system needs full ACID guarantees. I would be very unhappy to be told, after buying a product, that it had already sold out. I don't want a compensating transaction; I want the item I bought.
- Scalability: either NoSQL or SQL can work. Look for products that support horizontal scaling, partitioning, adding and removing machines online, load balancing, automatic sharding and rebalancing, and fault tolerance.
- High availability: look at databases that support eventual consistency, such as Bigtable-style databases.
- To handle a sustained stream of fast reads and writes, look at document databases, key-value databases, or in-memory databases, and consider SSDs.
- To implement a social network, a graph database should be the first choice. Second, a database that supports relationships between records, such as Riak, can also be used. An in-memory relational database with simple SQL joins may be enough for small data sets. Redis's set and list operations can also do the job.
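The inventory example in the list above can be sketched with a transactional relational store. The following uses Python's built-in `sqlite3` module purely as a stand-in for a relational database; the table layout, the `buy` function, and the sample data are assumptions for illustration. The point is that recording the order and decrementing the stock either both happen or both roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory ("
    "sku TEXT PRIMARY KEY, qty INTEGER NOT NULL CHECK (qty >= 0))"
)
conn.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, sku TEXT NOT NULL, buyer TEXT NOT NULL)"
)
conn.execute("INSERT INTO inventory VALUES ('widget', 1)")
conn.commit()

def buy(conn, sku, buyer):
    """Record the order and decrement stock atomically; True on success."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "INSERT INTO orders (sku, buyer) VALUES (?, ?)", (sku, buyer)
            )
            cur = conn.execute(
                "UPDATE inventory SET qty = qty - 1 WHERE sku = ?", (sku,)
            )
            if cur.rowcount != 1:
                raise sqlite3.IntegrityError("unknown sku")
        return True
    except sqlite3.IntegrityError:
        # Either the CHECK constraint fired (stock would go negative) or the
        # sku does not exist; the order row was rolled back with it.
        return False

first = buy(conn, "widget", "alice")   # last item in stock
second = buy(conn, "widget", "bob")    # sold out: whole transaction rolls back
order_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
remaining = conn.execute(
    "SELECT qty FROM inventory WHERE sku = 'widget'"
).fetchone()[0]
```

The second buyer is refused before any order is recorded, so there is nothing to compensate for afterwards.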
Assume that your application has the following requirements:
- If you need a variety of access patterns and data types, look at document databases. They are flexible in this regard.
- For offline analysis of large data volumes, consider Hadoop first, followed by other products that support MapReduce. Of course, supporting MapReduce is not the same as being good at it.
- If you need to span multiple data centers, choose Bigtable-style products or other distributed products that can cope with latency and are partition tolerant.
- For CRUD-type applications, consider document databases, which give access to complex data structures without joins.
- For built-in search, consider Riak.
- If you need data structures such as lists, sets, queues, and publish-subscribe, consider Redis, which is also useful for distributed locking and more.
- Programmer friendliness: if you want data types and interfaces that programmers like, such as JSON, HTTP, REST, and JavaScript, look first at document databases and key-value databases.
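For readers unfamiliar with the MapReduce model mentioned in the list above, here is a minimal single-process sketch of its three steps: map each record to key-value pairs, group the pairs by key, and reduce each group. It only illustrates the programming model; Hadoop distributes the same steps across many machines. The function names and the sample log lines are assumptions for this example.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Emit (token, 1) for every whitespace-separated token in a record."""
    for word in record.lower().split():
        yield word, 1

def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def map_reduce(records):
    mapped = chain.from_iterable(map_phase(r) for r in records)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

logs = ["GET /home", "GET /cart", "POST /cart"]
counts = map_reduce(logs)
```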
Assume that your application has the following requirements:
- For transactions combined with materialized views for real-time processing, consider VoltDB, which is well suited to processing large numbers of transactions quickly.
- For enterprise-level support and service-level agreements (SLAs), look for commercially supported products, such as Membase.
- To record large amounts of continuous data without strict consistency requirements, look at Bigtable-style databases: they run on distributed file systems and can handle write requests at large scale.
- To keep things as simple as possible, consider a PaaS solution, which takes the operational work off your hands.
- If your product is to be sold to enterprise customers, consider relational databases, because enterprise customers are used to them.
- To build relationships between objects dynamically, where the objects' properties can also be added or removed dynamically, consider graph databases: they typically need no schema, and models can be built on demand in code.
- To store large audio and video files, look at storage services such as S3. NoSQL systems are generally not well suited to large BLOBs, although MongoDB does provide a file service.
Assume that your application has the following requirements:
- To bulk-upload large amounts of data quickly, look for a product that supports that scenario. Be aware that most products do not support bulk operations.
- Ease of change: choose a document database or key-value database with a dynamic schema. These support optional fields, so fields can be added or removed without a schema change.
- To enforce integrity constraints, choose a database that supports SQL DDL, or implement the constraints in stored procedures or in application code.
- For very deep joins, use a graph database, which supports fast navigation between entities.
- To keep computation close to the data and reduce the cost of moving data over the network, consider stored procedures. They can be found in relational databases, grid databases, document databases, and even some key-value databases.
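The two places where integrity constraints can live, mentioned in the list above, look roughly like this. The first half declares the rules in SQL DDL, using Python's built-in `sqlite3` module as a stand-in for any SQL database; the second half applies the same rules in application code, as you would have to with a store that has no DDL. The table layout and the `insert_order` and `validate_order` helpers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, email TEXT NOT NULL UNIQUE)"
)
conn.execute(
    """CREATE TABLE orders (
           id INTEGER PRIMARY KEY,
           customer_id INTEGER NOT NULL REFERENCES customers(id),
           amount REAL NOT NULL CHECK (amount > 0)
       )"""
)
conn.execute("INSERT INTO customers (id, email) VALUES (1, 'alice@example.com')")

def insert_order(customer_id, amount):
    """Return True if the database accepted the row, False if a constraint fired."""
    try:
        conn.execute(
            "INSERT INTO orders (customer_id, amount) VALUES (?, ?)",
            (customer_id, amount),
        )
        return True
    except sqlite3.IntegrityError:
        return False

def validate_order(order, known_customer_ids):
    """The same two rules, checked in application code before writing."""
    errors = []
    if order.get("customer_id") not in known_customer_ids:
        errors.append("unknown customer")
    amount = order.get("amount")
    if not isinstance(amount, (int, float)) or amount <= 0:
        errors.append("amount must be positive")
    return errors

# Valid order, unknown customer, non-positive amount.
results = [insert_order(1, 9.99), insert_order(2, 5.00), insert_order(1, -1)]
```

With DDL the database rejects bad rows no matter which program writes them; with application-level checks, every writer has to remember to call the validation.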
Assume that your application has the following requirements:
- To cache or store BLOB data, choose a key-value database. It can cache fragments of web pages, or complex objects that would be expensive to assemble with joins in a relational database, which also reduces latency.
- For a proven track record, pick an established product, and when you run into scalability problems use one of the common workarounds (scaling up, tuning, caching, sharding, denormalization, and so on).
- For fluid data types, irregular data, columns that are not fixed, or complex data structures, consider document databases, key-value databases, and Bigtable-style databases. Their data types are flexible.
- If you need quick relational (join) queries but do not want to implement them yourself, choose a database that supports SQL.
- The ability to run in the cloud and automatically take full advantage of the cloud's features and benefits.
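The caching use of a key-value store described in the list above is often implemented with what is usually called the cache-aside pattern: look in the cache first, and only rebuild the object on a miss. The sketch below assumes a plain dictionary standing in for the key-value store and a made-up `build_profile_page` function standing in for the expensive multi-join query.

```python
cache = {}               # stands in for a key-value store
stats = {"builds": 0}    # counts how often the expensive path runs

def build_profile_page(user_id):
    """Stand-in for an expensive query that joins several tables."""
    stats["builds"] += 1
    return {"user_id": user_id, "html": f"<h1>User {user_id}</h1>"}

def get_profile_page(user_id):
    key = f"profile:{user_id}"
    page = cache.get(key)
    if page is None:
        # Cache miss: do the expensive work once and remember the result.
        page = build_profile_page(user_id)
        cache[key] = page
    return page

def invalidate_profile(user_id):
    """Call this when the underlying data changes."""
    cache.pop(f"profile:{user_id}", None)

page_a = get_profile_page(7)   # miss: builds the page
page_b = get_profile_page(7)   # hit: served from the cache
```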
Assume that your application has the following requirements:
- For secondary indexes, so that data can be looked up by different keys, consider relational databases or Cassandra, which has added secondary index support.
- For an ever-growing data set (truly big data) that is rarely accessed, consider Bigtable-style databases: they store data on a distributed file system and scale out easily.
- To integrate with other services, check whether the database provides a write-behind synchronization feature that captures database changes and feeds them to other systems to keep them consistent.
- Fault tolerance: check how durable writes are in the face of power failures, network partitions, and other failures.
- If you want to push the technology in a direction where nothing ready-made seems to exist, you will have to build it yourself. This is not easy.
- For mobile platforms, consider CouchDB / Mobile Couchbase.
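The secondary indexes mentioned in the list above let you look rows up by something other than the primary key. When a store does not provide them, applications often maintain one by hand; the sketch below shows the idea with a hypothetical `IndexedStore` class that keeps an index on one field in step with every write. It is an illustration of the concept, not how relational databases or Cassandra implement their indexes.

```python
from collections import defaultdict

class IndexedStore:
    """Hypothetical key-value store with one hand-maintained secondary index."""

    def __init__(self, indexed_field):
        self._rows = {}                  # primary key -> row
        self._field = indexed_field
        self._index = defaultdict(set)   # field value -> set of primary keys

    def put(self, key, row):
        old = self._rows.get(key)
        if old is not None and self._field in old:
            # Remove the stale index entry before overwriting the row.
            self._index[old[self._field]].discard(key)
        self._rows[key] = row
        if self._field in row:
            self._index[row[self._field]].add(key)

    def get(self, key):
        return self._rows.get(key)

    def find_by(self, value):
        """Return the primary keys of rows whose indexed field equals value."""
        return sorted(self._index.get(value, ()))

store = IndexedStore("city")
store.put("u1", {"name": "Alice", "city": "oslo"})
store.put("u2", {"name": "Bob", "city": "rome"})
store.put("u3", {"name": "Carol", "city": "oslo"})
in_oslo = store.find_by("oslo")
```

The extra bookkeeping on every write is the price of the faster lookup, which is one reason built-in support is worth checking for.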
Which is better?
- A 25% performance improvement is not reason enough to migrate to NoSQL.
- Benchmark results reflect specific scenarios and may not apply to yours.
- If your company has only just been founded, has no product yet, and is willing to try something new, then choosing between SQL and NoSQL deserves some of your time (in other words, a blank sheet of paper is the easiest to draw on: you can do whatever you want without the burden of an existing system).
- When the data volume is small, the performance gap is not obvious, but what happens when the data volume grows?
- Nothing is perfect. Amazon's forums are full of complaints about the performance and service of various products, and the same is true for GAE. Every product has problems; can you solve the problems of the one you choose?