As NoSQL grows in popularity, it is worth understanding this new type of database.
First, why choose NoSQL?
There are two main reasons. First, the volume of data to be processed, or the required data-access throughput, is so large that the data must be spread across a cluster. Second, NoSQL offers a more convenient way of interacting with data, which improves application-development productivity.
The biggest problem with traditional relational databases is the impedance mismatch: the gap between the relational model, in which data is stored as flat rows in tables, and the richer in-memory data structures that applications work with.
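To make the impedance mismatch concrete, here is a minimal sketch using Python's built-in sqlite3 module. The order structure, the table names, and the helpers save_order and load_order are hypothetical illustrations, not taken from the book: one nested object has to be split into rows of two tables on the way in and reassembled with a join on the way out.

```python
import sqlite3

# Hypothetical in-memory object: one order with a nested list of line items.
order = {
    "id": 99,
    "customer": "Ann",
    "items": [
        {"product": "pen", "qty": 3, "price": 1.5},
        {"product": "notebook", "qty": 1, "price": 4.25},
    ],
}

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT);
    CREATE TABLE line_items (
        order_id INTEGER REFERENCES orders(id),
        product TEXT, qty INTEGER, price REAL
    );
""")

def save_order(conn, order):
    """Flatten the nested object into rows of two separate tables."""
    conn.execute("INSERT INTO orders (id, customer) VALUES (?, ?)",
                 (order["id"], order["customer"]))
    conn.executemany(
        "INSERT INTO line_items (order_id, product, qty, price) VALUES (?, ?, ?, ?)",
        [(order["id"], i["product"], i["qty"], i["price"]) for i in order["items"]],
    )

def load_order(conn, order_id):
    """Rebuild the nested object from a join: mapping code the mismatch forces on us."""
    rows = conn.execute(
        """SELECT o.id, o.customer, li.product, li.qty, li.price
           FROM orders o JOIN line_items li ON li.order_id = o.id
           WHERE o.id = ? ORDER BY li.rowid""",
        (order_id,),
    ).fetchall()
    if not rows:
        return None  # no order (this sketch also treats an order without items as missing)
    return {
        "id": rows[0][0],
        "customer": rows[0][1],
        "items": [{"product": p, "qty": q, "price": pr} for _, _, p, q, pr in rows],
    }

save_order(conn, order)
restored = load_order(conn, 99)
```

An aggregate-oriented store avoids this round trip by storing the nested structure as a single unit.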
Second, what do NoSQL databases have in common?
They do not use the relational model; they run well on clusters; they are mostly open source; they are built for the needs of 21st-century web companies; and they are schemaless.
Third, NoSQL data models
These models fall into four categories: key-value, document, column-family, and graph. The first three share a common trait: they are aggregate-oriented.
In an aggregate-oriented model, the unit of data manipulation is a more complex structure than a set of tuples: an aggregate can hold lists and nest other record structures inside it.
Specifically,
1) The boundary between the key-value and document data models is blurred. The difference is this: in a key-value database the aggregate is opaque to the database, just a blob of mostly meaningless bits, whereas in a document database the database can see the aggregate's structure. The advantage of opacity is that anything can be stored in the aggregate; apart from size limits, the database imposes few constraints. A document database limits what can be stored, defining the allowed structures and data types, but in return it offers more flexible access to the data.
A key-value database is generally accessed by looking up an aggregate by its key, whereas queries to a document database are often based on fields within the document's internal structure.
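The contrast between opaque and transparent aggregates can be sketched with two toy in-memory stores. The classes KeyValueStore and DocumentStore and their methods are hypothetical illustrations, not the API of any real product: the key-value store accepts arbitrary bytes and can only look them up by key, while the document store accepts only JSON-serializable structures and can query and partially retrieve fields inside them.

```python
import json
from typing import Optional

class KeyValueStore:
    """Hypothetical key-value store: values are opaque bytes."""
    def __init__(self):
        self._data = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value  # stores whatever it is given; no structure is checked

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)  # the only access path is the key

class DocumentStore:
    """Hypothetical document store: values are JSON-like dicts the store can see into."""
    def __init__(self):
        self._docs = {}

    def insert(self, key: str, doc: dict) -> None:
        json.dumps(doc)  # raises TypeError unless the document is JSON-serializable
        self._docs[key] = doc

    def find(self, **criteria) -> list:
        # Query on fields inside the documents, not only on their keys.
        return [d for d in self._docs.values()
                if all(d.get(f) == v for f, v in criteria.items())]

    def get_field(self, key: str, field: str):
        return self._docs[key].get(field)  # partial retrieval of a single field

kv = KeyValueStore()
kv.put("order:1", b'{"customer": "Ann", "total": 30}')

docs = DocumentStore()
docs.insert("order:1", {"customer": "Ann", "total": 30})
docs.insert("order:2", {"customer": "Bob", "total": 12})
```

With kv, finding Bob's orders would mean fetching and decoding every value in application code; docs can answer docs.find(customer="Bob") itself.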
2) Column-family stores
Column-family stores use a big-table data model. The best way to understand it is as a two-level aggregate structure. The first-level key is the row identifier, which picks out the aggregate of interest. What distinguishes a column-family store from a key-value store is that this row aggregate is itself a map of more detailed values. These second-level values are called columns.
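The two-level structure can be sketched with nested Python dicts, assuming a layout of row key, then column family, then column name. The row keys, family names ("profile", "orders"), and helper functions are hypothetical and do not follow any specific product's API.

```python
# Hypothetical column-family table held in plain dicts:
# row key -> column family -> column name -> value.
table = {}

def put(row_key, family, column, value):
    """Write one column inside a column family of a row, creating levels as needed."""
    table.setdefault(row_key, {}).setdefault(family, {})[column] = value

def get(row_key, family, column):
    """Read a single column; returns None if any level is missing."""
    return table.get(row_key, {}).get(family, {}).get(column)

def get_family(row_key, family):
    """Read a whole column family of one row as a unit."""
    return dict(table.get(row_key, {}).get(family, {}))

# Row "1234" has a profile family and an orders family; rows may have different columns.
put("1234", "profile", "name", "Martin")
put("1234", "profile", "billingAddress", "Chicago")
put("1234", "orders", "order:99", "total=30")
put("1234", "orders", "order:100", "total=12")
put("5678", "profile", "name", "Pramod")
```

The row key selects the aggregate (first level); within it, each column family is a map of columns (second level).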
Comparing the three aggregate-oriented models:
Common ground: all three use the concept of an aggregate, identified by a key that is used to look up its contents. Aggregates are central when running on a cluster, because the database must ensure that all the data in an aggregate is stored together on the same node.
Differences: the key-value model treats the aggregate as an opaque whole, so you can only look up the entire aggregate by its key, not query or retrieve part of it. The document model makes the aggregate transparent to the database, so you can query and retrieve parts of it; but because documents have no schema, the database cannot do much with the document's structure to optimize how parts of the aggregate are stored and retrieved. The column-family model divides the aggregate into column families, which the database treats as units of data within the row aggregate. This imposes some structure on the aggregate, but the database can exploit that structure to improve access.
3) Graph databases
Graph databases emphasize the relationships between data. The data is stored as a graph: nodes connected by edges. Traversing these relationships is very fast.
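A minimal sketch of relationship traversal, assuming a small hypothetical social graph stored as an adjacency list (node to a list of labeled edges). Each step follows a node's stored edges directly instead of performing a join, which is the intuition behind fast traversal; real graph databases have their own storage formats and query languages.

```python
from collections import deque

# Hypothetical graph: node -> list of (relationship, neighbor) edges.
edges = {
    "Anna": [("friend", "Barbara"), ("likes", "Databases")],
    "Barbara": [("friend", "Carol"), ("friend", "Anna")],
    "Carol": [("friend", "Dawn")],
    "Dawn": [],
    "Databases": [],
}

def reachable(start, relationship, max_depth):
    """Breadth-first traversal following only edges of one relationship type."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the requested depth
        for rel, neighbor in edges.get(node, []):
            if rel == relationship and neighbor not in seen:
                seen.add(neighbor)
                found.append(neighbor)
                frontier.append((neighbor, depth + 1))
    return found

friends_within_two_hops = reachable("Anna", "friend", 2)
```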
Fourth, distribution models
Distribution models let a database scale out across a cluster of servers, with the aggregate as the natural unit of distribution. There are two paths to distributing data, replication and sharding. They are orthogonal techniques: you can use either one or both. The following are a few distribution-related concepts.
1) Sharding
Different users often access different parts of the data set, so we can place different parts of the data on different servers to scale out horizontally. This technique is called sharding.
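A minimal sketch of sharding by aggregate key, assuming a hypothetical three-node cluster simulated with dicts and simple hash-modulo placement. Each whole aggregate lands on exactly one node. hashlib is used instead of Python's built-in hash(), whose string hashing is randomized per process, so that placement is stable. Hash-modulo is only one option: it moves most keys when the node count changes, which is why consistent hashing or range-based sharding is often used instead.

```python
import hashlib

# Hypothetical cluster of three servers, each simulated as an in-memory dict.
NODES = ["node-a", "node-b", "node-c"]
cluster = {n: {} for n in NODES}

def shard_for(key: str) -> str:
    """Choose a node by hashing the aggregate's key (simple hash-modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return NODES[int.from_bytes(digest[:8], "big") % len(NODES)]

def put(key, aggregate):
    cluster[shard_for(key)][key] = aggregate  # the whole aggregate is stored on one node

def get(key):
    return cluster[shard_for(key)].get(key)  # reads go straight to the owning node

for i in range(100):
    put(f"user:{i}", {"id": i})
```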
2) Master-slave replication
Data is replicated across multiple nodes. One node, called the master (or primary), holds the authoritative data and is usually responsible for processing updates. The other nodes are called slaves (or secondaries); replication keeps the slaves synchronized with the master.
When the data set is read-intensive, master-slave replication helps scale read performance and improves read resilience, since the slaves can keep serving reads if the master fails. However, the master is a single point of failure: while it is down, writes cannot be processed until it is restored or a new master is appointed.
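The read/write split can be sketched as follows. Node and MasterSlaveCluster are hypothetical classes; replication here is synchronous and in-process, and a slave that was down does not catch up afterwards. Real systems often replicate asynchronously, which is also how recent writes can be lost if the master fails before propagating them.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

class MasterSlaveCluster:
    """Hypothetical sketch: one master takes writes and pushes them to live slaves."""
    def __init__(self, master, slaves):
        self.master = master
        self.slaves = list(slaves)
        self._next_read = 0

    def write(self, key, value):
        if not self.master.up:
            raise RuntimeError("master down: writes unavailable until a new master is appointed")
        self.master.data[key] = value
        for s in self.slaves:  # replicate the update to every live slave
            if s.up:
                s.data[key] = value

    def read(self, key):
        # Spread reads across live slaves round-robin; fall back to the master.
        live = [s for s in self.slaves if s.up]
        if live:
            node = live[self._next_read % len(live)]
            self._next_read += 1
            return node.data.get(key)
        if self.master.up:
            return self.master.data.get(key)
        raise RuntimeError("no live replica")

    def promote(self, slave):
        """Appoint a slave as the new master after the old master fails."""
        self.slaves.remove(slave)
        self.master = slave

m, s1, s2 = Node("m"), Node("s1"), Node("s2")
ms = MasterSlaveCluster(m, [s1, s2])
ms.write("k", 1)
```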
3) Peer-to-peer replication
There is no master: all replicas have equal standing and can accept writes, and losing any one replica does not prevent access to the database.
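A minimal sketch of peer-to-peer replication, assuming hypothetical Peer and PeerCluster classes. Any live peer accepts a write and the update is applied to every live peer. The sketch deliberately ignores write-write conflicts (two peers updating the same key concurrently) and catch-up for peers that rejoin, both of which real peer-to-peer systems must handle.

```python
class Peer:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

class PeerCluster:
    """Hypothetical sketch: every peer accepts writes and forwards them to all live peers."""
    def __init__(self, peers):
        self.peers = list(peers)

    def write(self, via, key, value):
        if not via.up:
            raise RuntimeError(f"{via.name} is down; send the write to another peer")
        for p in self.peers:  # propagate to every live replica, including `via`
            if p.up:
                p.data[key] = value

    def read(self, via, key):
        if not via.up:
            raise RuntimeError(f"{via.name} is down; read from another peer")
        return via.data.get(key)

a, b, c = Peer("a"), Peer("b"), Peer("c")
peers = PeerCluster([a, b, c])
peers.write(a, "k", 1)
```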
NoSQL Introduction (reading notes on NoSQL Distilled)