General classification of NoSQL database data models:
1. Key-value data model
2. Document data model
3. Column-family data model
4. Graph data model
Common NoSQL databases:
Redis, Cassandra, MongoDB, Neo4j, Riak, ...
Trends in database applications:
1. As data volumes grow, large systems are scaling out: from a database on a single machine to databases spread across a computer cluster
2. Polyglot persistence (relational database + NoSQL database)
Part One
Chapter 1: Why NoSQL?
* Impedance mismatch between relational databases and applications. It arises because the relational model does not match the in-memory data structures used by application code. Many object-relational mapping (ORM) frameworks emerged to bridge this gap, but misusing an ORM can cause performance problems.
* Integration databases versus application databases. An integration database is shared by multiple applications: every application can reach the data it wants, but because the design must stay compatible with all of its clients, it tends to become overly complex and hard to maintain, and the impedance mismatch between database and applications still has to be addressed. The application database, by contrast, grows out of web services: applications interact through interfaces and typically exchange data as XML or JSON, which supports structured formats (arrays, nested structures, and so on).
* Web services usually run over HTTP, which transmits text; where performance demands it, a binary protocol can be used instead.
* Relational databases do not fit computer clusters well. A relational database can be divided into several data sets deployed on separate servers, effectively sharding the database, but this has drawbacks: the application must manage all the shards and know where each piece of data is stored, and queries, referential integrity, transactions, and consistency control no longer work across shards.
* Running relational databases on a cluster also raises cost concerns, since commercial relational databases are typically licensed per server.
* Google and Amazon, typical operators of huge computer clusters, developed their own stores: Google built Bigtable and Amazon built Dynamo.
* Polyglot persistence calls for migrating integration databases toward application databases. In general, NoSQL databases work well as application databases but are poorly suited to the integration-database role. Organizations can move data out of an integration database into separate application databases and connect those systems through web services, which reduces coupling between applications along with cost and maintenance complexity.
* The two main reasons for choosing a NoSQL database are:
1. The amount of data to be processed is very large, and efficient access requires placing the data on a cluster.
2. A more convenient data-interaction style is desired, to improve application-development productivity.
* Features of NoSQL:
1. They do not use the relational model
2. They run well on clusters
3. Open source
4. Suited to the needs of 21st-century internet companies
5. Schemaless
Chapter 2: Aggregate Data Models
* An aggregate structure can embed related data directly in one aggregate, making it easy to retrieve, and aggregates make it easier for a database to manage data storage on a cluster (see the sketch after this list).
* An aggregate structure that helps some data interactions may hinder others. When modeling data, consider which access pattern dominates and design the aggregates accordingly.
* A key factor in choosing the aggregate model is that it runs well on a cluster: we want to minimize the number of nodes that must be consulted to gather data. When the aggregate structure is explicit in the database, the database knows that this data will be manipulated together and stores it on the same node.
* Aggregate-oriented databases typically do not support ACID transactions spanning multiple aggregates. The aggregate is the unit of interaction, and the database's ACID operations are bounded by the aggregate.
* Key-value databases, document databases, and column-family databases are all aggregate-oriented.
* If most data interactions happen within a single aggregate, use an aggregate-oriented database; if interactions need the data in many different shapes, an aggregate-ignorant database is the better choice.
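To make the idea concrete, here is a minimal Python sketch of what an order aggregate might look like; the field names and values are invented for illustration:

```python
# A hypothetical "order" aggregate: the shipping address and line items are
# embedded directly in the order rather than normalized into separate tables.
# Everything the application usually reads together lives in one unit, which
# can be stored on a single cluster node.
order_aggregate = {
    "id": "order-1001",
    "customer_id": "cust-42",
    "shipping_address": {"city": "Chicago", "street": "100 Main St"},
    "line_items": [
        {"product": "NoSQL Distilled", "quantity": 1, "price": 35.00},
        {"product": "Refactoring", "quantity": 2, "price": 40.00},
    ],
}

# One lookup by key fetches the whole interaction unit; no joins needed.
total = sum(i["quantity"] * i["price"] for i in order_aggregate["line_items"])
print(total)
```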
Chapter 3: More Details on Data Models
Graph databases
* Common graph databases include FlockDB, Neo4j, and InfiniteGraph.
* Graph databases are ideal for data with complex relationships, such as social networks, product preferences, and eligibility rules.
* In a relational database, data with many close interrelationships tends to require many join statements and performs poorly. Traversing relationships in a graph database is very fast, mainly because the graph database spends extra time when inserting relationship data in order to shorten the time needed to traverse relationships later. This trade-off matters when query performance is more important than insert performance.
* A graph database stores data as nodes and edges, and most of the time spent working with one is spent traversing relationships.
* The obvious difference between graph databases and aggregate-oriented databases is that graph databases emphasize the relationships between data, and this difference in data model leads to others: a graph database typically runs on a single server rather than a cluster, and to keep data consistent its ACID transactions must cover multiple nodes and edges. What they share is that neither uses the relational model.
Schemaless databases
* All forms of NoSQL share one feature: they are schemaless. In a relational database we first define the schema, using a predefined structure to tell the database which tables exist, which columns each table has, and what type of data each column holds; the schema must be defined before the database can be used. A schemaless database, by contrast, is quite casual and can store arbitrary data under a single key.
* A schemaless database avoids many of a relational database's restrictions, but it has problems of its own: however schemaless the database is, the code that writes the data makes a series of assumptions about its structure. We can store data under any well-chosen keys, but the program still has to know those fields; short of blindly walking the data structure and printing key-value pairs in turn, which is basically meaningless, the schema has to live somewhere.
* That somewhere is the application code, and this implicit schema causes problems: to understand the data in the database, you must dig into the application's code. If the code is clear you may be able to infer the data's schema from it, but there is no guarantee, since it depends entirely on code clarity. As a simple example, a field might be saved as a string in some aggregates and as an object in others (theoretically possible, since the database is schemaless); without knowing the code's logic you simply cannot tell what distinguishes the two representations. And because the database knows nothing about the schema, it cannot validate data itself, removing a safeguard. Different applications may also use a field in entirely different ways, so when validating data we must add exactly the same validation logic to every application that uses the database to ensure correctness and safety.
* In essence, a schemaless database leaves schema handling to the application code that accesses it, which becomes a problem when multiple applications need the same database. One mitigation is to encapsulate data interaction in a standalone application and integrate it with the others as a web service; another is to explicitly divide different applications' regions within the aggregate (see the sketch below).
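Below is a minimal Python sketch of the implicit-schema pitfall described above; the store, keys, and field names are all invented:

```python
# Illustrative sketch of the "implicit schema" pitfall in a schemaless store:
# both writes are legal, but readers must know which shape to expect.
customers = {}  # stands in for any schemaless key-value store

# Application A stores the address as a plain string...
customers["cust-1"] = {"name": "Ann", "address": "100 Main St, Chicago"}

# ...while application B stores it as a structured object.
customers["cust-2"] = {"name": "Bob",
                       "address": {"street": "2 Oak Ave", "city": "Boston"}}

def city_of(customer):
    """Reader code must encode the implicit schema: handle both shapes."""
    addr = customer["address"]
    if isinstance(addr, dict):
        return addr.get("city")
    return addr.split(",")[-1].strip()  # fragile string parsing

print(city_of(customers["cust-1"]))  # Chicago
print(city_of(customers["cust-2"]))  # Boston
```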
Materialized views
* A relational database can use views to present data that requires complex computation. A view looks like a relational table but is computed from base tables: each time the view is accessed, the database computes its data, which makes views a convenient form of encapsulation. With the view mechanism, clients need not care whether they are reading base data or derived data, but a view computed on every access costs performance.
* Materialized views are precomputed, cached views; they optimize performance when data is read frequently and freshness requirements are modest. NoSQL databases can use materialized views to meet the same need, and aggregate-oriented databases emphasize them because most applications have some queries that do not match the aggregate structure.
* There are two ways to build materialized views. The first updates the view every time the base data changes; it suits views that are read far more often than they are written and that need relatively fresh data. The second rebuilds views periodically in batches; how stale the data may get is a question the business requirements must answer.
* Aggregate-oriented databases can often reorganize the data of the main aggregate in different ways to compute a variety of materialized views, generally using the map-reduce approach.
Building a data storage model
* When modeling for a column-family database, design around query requirements rather than write requirements. The general rule is to make data easy to query, performing denormalization when writing it.
Chapter 4: Distribution Models
* Aggregate-oriented databases are well suited to scale-out, because the aggregate is the natural unit of data distribution.
* There are two ways to distribute data:
1. Replication
2. Sharding
* Replication copies the same data to multiple nodes; sharding stores different data on different nodes. They are orthogonal techniques and can be used separately or together.
Single Server
* In most cases the simplest form of distribution is no distribution at all: a single server, whose virtue is simplicity.
Sharding
* Sharding scales out by placing different parts of the data on different servers (nodes), and can effectively balance load. To get the most from sharding, data that is accessed together must be stored on the same node, and nodes must be arranged so that access is as fast as possible: serve users from geographically close servers, balance the load as evenly as possible, and, in some cases, keep aggregates that are read in sequence together.
* People often shard in the application itself, for example routing users by the first letter of the surname (A-D, E-G, and so on) to different shards, but this complicates the programming model, since the application must spread queries across the shards; rebalancing the shards then means changing application logic and migrating data. Many NoSQL databases offer auto-sharding, making the database responsible for distributing data across shards and routing queries to the right shard (a routing sketch follows below).
* Sharding alone does little for resilience: once a node crashes, the data on its shard becomes inaccessible, and although the problem is confined to one shard, the system can no longer provide complete service. As shown later, combining sharding with replication builds in redundancy to mitigate this.
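Here is a minimal Python sketch of the application-side shard routing described above; the node names are invented, and a real deployment would more likely rely on consistent hashing or the database's auto-sharding:

```python
import hashlib

# Minimal sketch of application-side shard routing: the shard is derived
# from the key, so every query for that key goes to one known node.
SHARDS = ["node-a", "node-b", "node-c"]  # invented node names

def shard_for(key: str) -> str:
    """Hash the key so records spread evenly across shards."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

print(shard_for("cust-42"))  # always routes to the same node
print(shard_for("cust-43"))  # likely a different node

# The drawback described above: adding a shard changes len(SHARDS), which
# remaps most keys and forces a data migration unless consistent hashing
# or an auto-sharding database is used.
```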
Master-slave replication
* In master-slave replication, one node is designated the master and the rest are slaves. The master holds the authoritative copy of the data, and replication keeps the slaves synchronized with it.
* The master handles both reads and writes, while slaves serve only reads, so read capacity scales out simply by adding slave nodes.
* The master can be appointed manually or elected automatically. Manual appointment means configuring the cluster ourselves; automatic election is simpler: the nodes elect a master among themselves, and if the master crashes a new one is elected automatically, reducing downtime, a benefit manual appointment lacks.
* Master-slave replication helps enhance the failure recovery capability of read operations.
* For very write-heavy workloads, master-slave replication helps little: diverting reads to slaves slightly relieves the master's load, but write performance does not improve significantly.
* Replication brings benefits but also a fatal weakness: inconsistent data. If the master has updated a value but not yet notified the slaves, users reading from a slave get stale data, and different slaves may synchronize at different speeds. In the extreme case, the master crashes before some data has reached the slaves, and that data is lost.
* The primary node is still the bottleneck and weakness of the system.
Peer-to-peer replication
* In peer-to-peer replication all nodes are equal; there is no master-slave distinction. Every node can accept read and write requests, and each node notifies all the others of its own writes.
* Adding nodes can easily improve performance.
* Peer-to-peer replication also has data-consistency problems: because two different nodes can process writes at the same time, two users may write the same record simultaneously, causing a write conflict. Read inconsistency exists as well, but only briefly, since the data is eventually consistent; write inconsistency is a permanent concern.
* Two ways of approaching write inconsistency:
1. Whenever data is written, the replicas coordinate with one another to ensure no conflict occurs, at the price of network traffic for each write.
2. Let the conflicting writes happen and then handle them, for example by merging the operations.
Combining sharding and replication
* Each system has multiple master nodes, but every item of data has exactly one master responsible for it. Depending on the configuration, the same node can be the master for some data and a slave for other data, or nodes can be designated full-time masters and slaves.
Chapter 5: Consistency
* Cluster-oriented NoSQL databases have to confront consistency. Relational databases avoid all kinds of inconsistency through strong consistency; in the NoSQL world we must think about how much consistency a system actually needs.
Update consistency
* Multiple requests updating the same piece of data at the same time create write conflicts. There are two ways to maintain consistency under concurrency: pessimistic and optimistic (a conditional-update sketch follows this list).
1. The pessimistic approach prevents conflicts. A common practice is write locks, so that only one request at a time can hold the lock; beware, though, that locks can cause deadlocks.
2. The optimistic approach lets conflicts happen, then detects them and sorts out the conflicting operations. The usual technique is the conditional update: before any user's update is applied, check whether the data's current value still equals the value that user last read. If it does, the update proceeds; if it differs, the update fails and must be retried against the latest value. Compare how a distributed version-control system such as Git handles conflicts.
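A minimal in-memory sketch of the optimistic conditional update (compare-and-set); the store class and its API are invented for illustration and stand in for a real database client:

```python
import threading

# The write succeeds only if the value is still the one the caller read.
class ConditionalStore:
    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()  # only guards the check-and-swap step

    def get(self, key):
        return self._data.get(key)

    def conditional_update(self, key, expected, new_value) -> bool:
        with self._lock:
            if self._data.get(key) != expected:
                return False  # someone updated it since we read: conflict
            self._data[key] = new_value
            return True

store = ConditionalStore()
store.conditional_update("stock", None, 10)

seen = store.get("stock")                          # read: 10
store.conditional_update("stock", seen, 9)         # succeeds
print(store.conditional_update("stock", seen, 8))  # False: stale read, retry
```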
* Both the pessimistic and the optimistic approach presuppose a consistent order of updates. That clearly holds on a single machine, but in a distributed cluster two nodes may apply the operations in different orders and end up with inconsistent data.
* Concurrency in distributed systems therefore needs sequential consistency: all nodes must apply operations in the same order.
Read consistency
* A database must have update consistency, but that alone is not enough: it does not guarantee that every read request receives a consistent response. A typical scenario: user A's update must write two tables in sequence, and since an aggregate-oriented database has no ACID transaction spanning them, the tables are updated one after the other. If user B reads the data after the first table has been updated but before the second has, B sees an inconsistent pair of values. This property is called logical consistency.
* Not all NoSQL databases lack ACID transactions; only the aggregate-oriented ones do, and graph databases support them.
* An aggregate-oriented database can update an aggregate's data atomically, but only within a single aggregate. Logical consistency can therefore be maintained inside an aggregate but not between aggregates: updating multiple aggregates leaves a time gap, and related data read during this gap violates logical consistency. The gap is called the inconsistency window; a NoSQL system's inconsistency window is typically brief.
* Introducing replication brings a new kind of inconsistency, replication inconsistency: with multiple nodes, data updated on one node may be read as stale by users reaching nodes that have not yet synchronized. Eventually the update propagates to every node; this is what eventual consistency means.
* Replication consistency and logical consistency are separate issues, but a long inconsistency window during replication can aggravate logical inconsistency: two updates of different content made a short interval apart may leave an inconsistency window of only a few milliseconds on the master, yet because of network latency the window is much longer on the slaves than on the master.
* Session consistency means read-your-writes consistency within a single user session. One way to provide it is sticky sessions, binding each session to a fixed node, at the cost of reducing the load balancer's effectiveness. Another is version stamps, covered in detail later.
Easing the "consistency" constraint
* Consistency is important, but sometimes it must be discarded. When designing systems, we often need to sacrifice consistency in exchange for other features.
* Relational databases generally use transactions to enforce consistency, but transactions can affect system performance.
* The CAP theorem, where CAP stands for:
1. Consistency: as discussed above.
2. Availability: here meaning the responsiveness, or latency, of the system.
3. Partition tolerance: the cluster remains usable even if a communication failure splits it into multiple partitions that cannot reach one another (a "split brain").
The CAP theorem says that of these three properties (consistency, availability, partition tolerance), a system can fully satisfy only two at a time.
Relaxing "persistence" constraints
* Some data can be persisted or deferred, such as a user session or some temporary data can be saved in Redis, the generation and update of very frequent but not so important data can delay persistence, such as timed persistent write.
* Whether to relax the "persistence" constraint needs to be determined according to the specific needs.
Quorums
* Suppose a piece of data must be replicated to three nodes. To guarantee strong consistency, not all nodes need to confirm the write: confirmation by two nodes (more than half) suffices. That is, of two conflicting writes, only one can be acknowledged by more than half the nodes (W > N/2). This is called a write quorum.
* A read quorum is the minimum number of nodes that must be contacted for a read to be guaranteed the most up-to-date data.
* When operating on data in a replicated distribution model, there is no need to contact every replica; as long as enough replicas acknowledge the operation, strong consistency is preserved.
* This can be written as an inequality relating the number of nodes contacted for a read (R), the number of nodes that must confirm a write (W), and the replication factor (N): R + W > N (checked in the sketch below).
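The inequality is easy to check mechanically; in this sketch the node counts are illustrative, and stores such as Riak expose N, R, and W as tunable parameters:

```python
# Minimal sketch of the quorum inequality R + W > N described above.
def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """Read and write quorums overlap, so a read sees the latest write."""
    return r + w > n

# Classic configuration: N=3 replicas, majority reads and writes.
print(is_strongly_consistent(n=3, r=2, w=2))  # True

# Fast reads (R=1) require every replica to confirm writes (W=3).
print(is_strongly_consistent(n=3, r=1, w=3))  # True

# R=1, W=2 leaves one replica that may be stale: eventual consistency only.
print(is_strongly_consistent(n=3, r=1, w=2))  # False
```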
Chapter 6: Version Stamps
* A version stamp identifies the version of a piece of data, typically with a counter. If the database currently holds a piece of data at version stamp 3 but a user's update request carries version stamp 2, the data has changed since that user last read it (someone else updated it in the meantime), so this user's update fails.
* Version stamps can be implemented with counters, GUIDs, content hashes, or the timestamp of the last update, each with pros and cons; the schemes can also be combined, as in CouchDB's version stamp, which uses a content hash together with a counter.
* In addition to avoiding "update conflicts", the version stamp also helps maintain "session consistency".
* In a distributed environment, an array of version stamps (one counter per node, often called a vector stamp) can detect conflicting update operations made on different nodes (compared in the sketch below).
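A small sketch of how such version-stamp arrays might be compared; the node names are invented, and real systems embed this logic in the database or client library:

```python
# Each node keeps a counter per node, and two stamps conflict when
# neither dominates the other.
def compare(a: dict, b: dict) -> str:
    nodes = set(a) | set(b)
    a_ahead = any(a.get(n, 0) > b.get(n, 0) for n in nodes)
    b_ahead = any(b.get(n, 0) > a.get(n, 0) for n in nodes)
    if a_ahead and b_ahead:
        return "conflict"      # concurrent updates on different nodes
    if a_ahead:
        return "a is newer"
    if b_ahead:
        return "b is newer"
    return "equal"

print(compare({"node1": 2, "node2": 1}, {"node1": 1, "node2": 1}))  # a is newer
print(compare({"node1": 2, "node2": 1}, {"node1": 1, "node2": 2}))  # conflict
```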
Chapter 7: Map-Reduce
* Map-reduce is a pattern for performing concurrent computation on a cluster.
* The map operation reads data from an aggregate and condenses it into relevant key-value pairs. A map reads only one record at a time, so maps can run concurrently on whichever nodes hold the records.
* The map tasks produce many values sharing the same key, and the reduce task combines them into a single output value. Each reduce function operates only on the map results for a single key, so multiple reduce functions can run concurrently, partitioned by key.
* Reduce functions whose output has the same form as their input data can be combined: partial reductions can be merged, which improves concurrency and reduces the amount of data to transfer.
* If the output of a reduce step is the input of the next map operation, map-reduce computations can be composed into pipelines.
* If the results of a map-reduce computation are used extensively, they can be stored as a materialized view.
* Materialized views can be refreshed with incremental map-reduce, which recomputes only the parts of the view affected by changed data rather than everything from scratch (a small example follows).
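A toy, single-process Python rendition of the pattern (total revenue per product from order aggregates); the data is invented, and a real store would run the map tasks on the nodes holding the records:

```python
from collections import defaultdict

orders = [
    {"items": [{"product": "tea", "qty": 2, "price": 3.0},
               {"product": "mug", "qty": 1, "price": 8.0}]},
    {"items": [{"product": "tea", "qty": 1, "price": 3.0}]},
]

def map_order(order):
    # Emit one key-value pair per line item: (product, revenue).
    for item in order["items"]:
        yield item["product"], item["qty"] * item["price"]

def reduce_revenue(values):
    # Combinable: the output (a sum) has the same shape as each input value.
    return sum(values)

grouped = defaultdict(list)          # the "shuffle": group pairs by key
for order in orders:
    for key, value in map_order(order):
        grouped[key].append(value)

view = {k: reduce_revenue(v) for k, v in grouped.items()}
print(view)  # {'tea': 9.0, 'mug': 8.0}
```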
Chapter 8: Key-Value Databases
Comparison of key-value databases and relational databases
| Oracle | Riak |
| --- | --- |
| DB instance | Riak cluster |
| Table | Bucket |
| Row | Key-value pair |
| rowid | Key |
What is a "key value database"
* From the API point of view, the key-value database is the simplest NoSQL database. The client can query the value according to the key and set the value corresponding to the key.
* The data stored in a key database such as Redis is not necessarily a domain object. Redis can store data structures such as list, set, hash, and can also be used to find the difference set, the set, the intersection, and get the values in a range.
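A short sketch using the redis-py client, assuming a Redis server on localhost; the keys and values are invented:

```python
import redis  # the redis-py client

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

# Plain key-value usage: set and get an opaque value by key.
r.set("session:1234", "user=42;cart=7")
print(r.get("session:1234"))

# Redis values can also be data structures, as noted above.
r.sadd("likes:ann", "tea", "jazz", "hiking")
r.sadd("likes:bob", "tea", "rock")
print(r.sinter("likes:ann", "likes:bob"))  # set intersection: {'tea'}

r.lpush("recent:ann", "page3", "page2", "page1")
print(r.lrange("recent:ann", 0, -1))       # a range from a list
```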
Characteristics
* Consistency: only operations on a single key are consistent, typically GET, SET, or DEL.
* Transactions: different key-value databases make different transaction guarantees; in general there is no guarantee on writes. Riak, for example, uses quorums.
* Query features: every key-value database can query by key, but querying by the contents of values is not possible.
* Data structure: key-value databases generally do not care about the value, which may be binary data, text, JSON, and so on.
* Scalability: many key-value databases can shard, in which case the key's name determines the node responsible for storing it. A database such as Riak lets you control the CAP-related parameters N (the number of replica nodes holding a key-value pair), R (the minimum number of nodes for a successful read), and W (the minimum number of nodes for a successful write).
Suitable use cases
* Storing session information
* User profiles and configuration
* Shopping cart data
Unsuitable cases
* Relationships among data
* Transactions comprising multiple operations
* Querying by value contents
* Operations on sets of keys
Chapter 9: Document Databases
Comparison of document database and relational database
| Oracle | MongoDB |
| --- | --- |
| DB instance | MongoDB instance |
| Schema | Database |
| Table | Collection |
| Row | Document |
| rowid | _id |
| Join | DBRef |
Characteristics
* Consistency: in MongoDB, consistency is configured through replica sets; you can require that a write wait until it has been copied to all, or a given number of, slave nodes before returning. Raising consistency this way lowers write efficiency. Replica sets can also be configured to increase read capacity.
* Transactions: only single-document transactions are supported, i.e. atomic operations on one document.
* Availability: document databases try to improve availability with master-slave replication. Multiple nodes hold the same data, so clients can still read it even if the primary node fails, and applications generally need not detect primary failure themselves. All requests are handled by the primary, and its data is copied to the slaves; if the primary fails, the remaining nodes of the replica set elect a new primary among themselves. Replica sets are typically used for data redundancy, automatic failover, disaster recovery, and so on.
* Query features: a document database can query the data within a document, unlike a key-value database, which must fetch the whole document by key and then inspect its contents. MongoDB can also query on embedded sub-documents.
* Scalability: adding read-only slave nodes and directing reads to them scales the database's read capacity. To scale writes, shard the data: sharding splits the data by a particular field (choosing that field well matters) and moves it onto different nodes. To keep each shard's load balanced, data is migrated dynamically between nodes; adding more nodes to the cluster increases the number of writable nodes and scales write capacity horizontally. Each shard can in turn be a replica set to improve read capacity (see the sketch below).
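A brief sketch using the PyMongo driver, assuming a MongoDB replica set reachable on localhost; database, collection, and field names are invented:

```python
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# Write concern w="majority" makes the insert wait until a majority of the
# replica set has the write: the consistency/latency trade-off noted above.
orders = db.get_collection("orders",
                           write_concern=WriteConcern(w="majority"))
orders.insert_one({
    "customer": "Ann",
    "address": {"city": "Chicago", "street": "100 Main St"},
    "items": [{"product": "tea", "qty": 2}],
})

# Unlike a key-value store, we can query inside the document,
# including on embedded sub-documents.
for doc in orders.find({"address.city": "Chicago"}):
    print(doc["customer"], doc["items"])
```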
Suitable use cases
* Event logging
* Content management systems and blogging platforms
* Website analytics and real-time analytics
* E-commerce applications
Unsuitable cases
* Complex transactions spanning multiple operations
* Queries against continuously changing aggregate structures
Chapter 10: Column-Family Databases
| Relational database | Cassandra |
| --- | --- |
| DB instance | Cluster |
| Database | Keyspace |
| Table | Column family |
| Row | Row |
| Column | Column |
Characteristics
* Consistency: when Cassandra receives a write request, it first records the write in the commit log and then writes it to an in-memory structure called the memtable. A write succeeds once it reaches the commit log and the memtable. Writes accumulate in memory and are periodically flushed to structures called SSTables; once an SSTable is written to disk it is never written to again, and if the data changes a new SSTable is written. Obsolete SSTables are reclaimed by compaction.
* Transactions: Cassandra has no transactions in the traditional sense; its writes are atomic at the row level.
* Availability: Cassandra is highly available because its cluster has no master node. Lowering a request's consistency requirements raises the cluster's effective availability, which is governed by the R + W > N formula: W is the minimum number of nodes for a successful write, R the minimum number of nodes that must reply for a successful read, and N the number of nodes participating in replication.
* Query features: Cassandra has no feature-rich query language. Once data is inserted into a column family, each row's data is sorted by column name. If one column is fetched much more often than the others, consider using its value as the row key to improve performance.
* Basic queries: GET, SET, and DEL.
* Advanced queries and indexing: Cassandra column families can be indexed on columns other than the key. These secondary indexes are implemented as bitmaps and work well when column values repeat frequently.
* The Cassandra Query Language (CQL) provides query functionality but not all of SQL's features (see the sketch below).
* Scalability: because there is no master node, simply adding nodes to a Cassandra cluster increases its capacity.
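A brief sketch of CQL through the DataStax Python driver, assuming a Cassandra node on localhost; the keyspace, table, and data are invented (replication_factor is 1 only because this targets a single test node):

```python
from cassandra.cluster import Cluster  # DataStax Python driver for Cassandra

cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

session.execute("""
    CREATE KEYSPACE IF NOT EXISTS shop
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS shop.orders (
        customer text, order_id text, total double,
        PRIMARY KEY (customer, order_id))
""")

# CQL looks like SQL, but queries are constrained by the row-key design:
# filtering is driven by the partition key (customer), not arbitrary columns.
session.execute(
    "INSERT INTO shop.orders (customer, order_id, total) VALUES (%s, %s, %s)",
    ("ann", "order-1001", 43.0))
for row in session.execute(
        "SELECT order_id, total FROM shop.orders WHERE customer = %s",
        ("ann",)):
    print(row.order_id, row.total)

cluster.shutdown()
```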
Suitable use cases
* Event logging
* Content management systems and blogging platforms
* Counters
* Expiring usage (e.g. columns written with a TTL)
Unsuitable cases
* Systems that require ACID transactions for writes and reads.
* Scenarios that aggregate data with queries (such as SUM or AVG).
* Early development, prototyping, and technology evaluation, where query patterns may still change: once the query patterns change, the column-family design must change too. Note that for a relational database changing the data schema is costly but changing query patterns is cheap; Cassandra is the opposite.
Chapter 11: Graph Databases
Characteristics
* Consistency: because graph databases operate on interconnected nodes, most do not support distributing nodes across different servers. A graph database guarantees consistency through transactions and does not allow dangling relationships: every relationship must have a start node and an end node, and a node's relationships must be deleted before the node itself can be deleted.
* Transactions: Neo4j is an ACID-compliant database; you must start a transaction before modifying a node or adding a relationship to an existing node.
* Availability: starting with version 1.8, Neo4j supports replicated slaves, which can accept writes: a slave first synchronizes the write with the current master, and the master then propagates it to the other slaves. ZooKeeper can also be used to track the latest transaction ID on each slave node and the current master.
* Query features: graph databases can be queried with languages such as Gremlin; Neo4j can also query the graph with its Cypher language.
* Scalability: sharding is difficult for a graph database because it is oriented toward relationships rather than aggregates. Any node may be related to any other, so keeping related nodes on the same machine makes traversal convenient, while traversing across machines performs poorly. There are three ways to scale a graph database (see the sketch after this list):
1. Configure the server with enough memory to hold the entire working set of nodes and relationships. This is practical only if the amount of memory required is reasonable.
2. Add slave nodes that can only read data, with all writes still handled by the master, to increase read capacity.
3. If the data set is too large for replication to every node to be realistic, shard on the application side using domain-specific knowledge, for example by geographic location.
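A brief sketch using the official Neo4j Python driver and Cypher, assuming a Neo4j server on localhost; credentials, labels, and properties are invented:

```python
from neo4j import GraphDatabase  # official Neo4j Python driver

driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "password"))

with driver.session() as session:
    # Writes happen inside a transaction, as noted above.
    session.run(
        "MERGE (a:Person {name: $a}) "
        "MERGE (b:Person {name: $b}) "
        "MERGE (a)-[:FRIEND]->(b)",
        a="Ann", b="Bob")

    # Traversals are expressed as path patterns: friends-of-friends,
    # the kind of query that needs many joins in a relational database.
    result = session.run(
        "MATCH (p:Person {name: $name})-[:FRIEND*1..2]->(f) "
        "RETURN DISTINCT f.name AS friend", name="Ann")
    for record in result:
        print(record["friend"])

driver.close()
```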
Suitable use cases
* Interconnected data
* Routing, dispatch, and location-based services
* Recommendation engines
Unsuitable cases
* Scenarios where all or a subset of entities must be updated wholesale
* Operations involving the entire graph
Chapter 12: Schema Migration
* To migrate a strong-schema database such as a relational one, save each schema change, together with its data-migration script, in a versioned sequence.
* Because application code accesses a schemaless database according to an implicit schema, data migration still has to be handled with care.
* Schemaless databases can borrow the migration techniques of strong-schema databases.
* A schemaless database can also use incremental migration to update data, modifying the data's implicit schema without disrupting the applications that read it (a sketch follows).
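A minimal Python sketch of incremental migration with lazy upgrade-on-read; the record shapes and version marker are invented:

```python
# Old and new record shapes coexist in the schemaless store, and records
# are upgraded lazily as they are read, then written back.
def read_customer(store: dict, key: str) -> dict:
    record = store[key]
    if record.get("_version", 1) < 2:
        # v1 kept a single "name" field; v2 splits it into two fields.
        first, _, last = record.pop("name").partition(" ")
        record.update({"first_name": first, "last_name": last, "_version": 2})
        store[key] = record
    return record

store = {"cust-1": {"name": "Ann Lee"},                        # old shape
         "cust-2": {"first_name": "Bob", "last_name": "Wu",
                    "_version": 2}}                            # new shape

print(read_customer(store, "cust-1"))  # upgraded transparently
print(read_customer(store, "cust-2"))  # already current
```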
Chapter 13: Polyglot Persistence
* Polyglot persistence means using different database technologies to handle a varied range of data-storage needs.
* Polyglot persistence can appear across the many programs of an enterprise or within a single application.
* Encapsulating data access as services reduces the impact of database changes on other parts of the system.
* Adding a new database technology makes both programming and operations more complex, so weigh the advantages of introducing a new database against the complexity it brings.
Chapter 14: Beyond NoSQL
* File systems
* Event sourcing
* Memory image
* Version control
* XML databases
* Object databases
Chapter 15: Choosing Your Database
* Improve programmer productivity by using a database that better matches your application's needs.
* Improve data-access performance with a combination of technologies that can handle large data volumes, reduce latency, and increase throughput.
* Before deciding on a NoSQL technology, be sure to test whether it improves programmer productivity and data-access performance as expected.
* Encapsulate database access behind services, so that the database technology can be swapped out later as requirements or technologies mature. An application can also be split into services so that NoSQL databases can be introduced gradually into an existing program.
* Most applications, especially "non-strategic" ones, should keep using relational database technology, at least until NoSQL technology matures further.
"NoSQL Essence" Reading notes