For many years, relational databases were practically the only choice for data persistence, and developers only had to pick among traditional products such as SQL Server, Oracle, or MySQL. There were even default pairings: .NET projects typically chose SQL Server, Java leaned toward Oracle, Ruby toward MySQL, and Python toward PostgreSQL or MySQL.
The reason is simple: over a long period, relational databases have proven robust in most applications. They give us concurrency control, transactions, and more. But if traditional relational databases are so reliable, why do we need NoSQL? NoSQL has survived and grown because it does things that traditional relational databases cannot do well.
Problems with relational databases
Impedance mismatch
We use Python, Ruby, Java, and .NET, and these languages share a common trait: they are object-oriented. We also use MySQL, PostgreSQL, Oracle, and SQL Server, which share a different trait: they are relational. This gives rise to the term "impedance mismatch": the in-memory data structures are objects, but the database is relational, so we must convert between the two every time we store or query data. ORM frameworks such as Hibernate and Entity Framework do simplify this process, but they struggle when queries have high performance requirements.
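The conversion described above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the Order class, the two tables, and the helper functions are hypothetical names chosen for the example, not something from a particular application or ORM.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Order:
    """An in-memory object: one order that owns a list of line items."""
    order_id: int
    customer: str
    items: list = field(default_factory=list)  # list of (product, quantity)


def save_order(conn, order):
    """Flatten one object into rows of two related tables."""
    conn.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
    conn.executemany(
        "INSERT INTO order_items VALUES (?, ?, ?)",
        [(order.order_id, product, qty) for product, qty in order.items],
    )


def load_order(conn, order_id):
    """Reassemble the object from the rows of both tables."""
    customer = conn.execute(
        "SELECT customer FROM orders WHERE order_id = ?", (order_id,)
    ).fetchone()[0]
    items = conn.execute(
        "SELECT product, quantity FROM order_items WHERE order_id = ? ORDER BY rowid",
        (order_id,),
    ).fetchall()
    return Order(order_id, customer, items)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_id INTEGER, product TEXT, quantity INTEGER)")

original = Order(1, "alice", [("book", 2), ("pen", 5)])
save_order(conn, original)
restored = load_order(conn, 1)
```

One object becomes three rows in two tables on the way in, and two queries on the way out. An ORM hides this code, but the work still happens on every store and load.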
Growing application scale
As web applications grow, we need to store more data, serve more users, and supply more computing power. To cope, we have to keep scaling. Scaling falls into two categories. One is vertical scaling: buying better machines, more disks, more memory, and so on. The other is horizontal scaling: adding more machines and spreading the load across a cluster. At large scale, vertical scaling does not help much. First, improving a single machine is very expensive and runs into a hard performance ceiling; second, no single machine can ever carry the full load of a site on the scale of Google or Facebook. Given this, we need a new kind of database, because relational databases do not run well on clusters. Yes, you can build relational database clusters, but they usually rely on shared storage, which is not what we want. Hence the NoSQL era, led by Google, Facebook, and Amazon, which were trying to handle far more traffic.
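To make the idea of spreading data over a cluster concrete, here is a minimal sketch of hash-based sharding. The ShardedStore class and shard_for function are hypothetical, each "node" is just a Python dict, and the modulo placement is the simplest possible scheme; real systems generally use more elaborate approaches such as consistent hashing.

```python
import hashlib


def shard_for(key, num_nodes):
    """Pick a node index for a key using a stable hash (not Python's salted hash())."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_nodes


class ShardedStore:
    """Toy cluster: each node is a dict, and every key lives on exactly one node."""

    def __init__(self, num_nodes):
        self.nodes = [dict() for _ in range(num_nodes)]

    def put(self, key, value):
        self.nodes[shard_for(key, len(self.nodes))][key] = value

    def get(self, key):
        return self.nodes[shard_for(key, len(self.nodes))].get(key)


store = ShardedStore(num_nodes=3)
for user_id in range(1000):
    store.put(f"user:{user_id}", {"id": user_id})

# How many keys ended up on each node.
sizes = [len(node) for node in store.nodes]
```

Because each key maps to one node, reads and writes for different keys can be served by different machines, and no shared storage is needed.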
NoSQL era
There are now many NoSQL databases, such as MongoDB, Redis, Riak, HBase, and Cassandra. They typically share the following features:
1. They do not use SQL; MongoDB and Cassandra, for example, have their own query languages.
2. They are usually open source projects.
3. They are built to run on clusters.
4. They are weakly structured: the shape of the data is not strictly constrained.
Types of NoSQL databases
NoSQL databases can be roughly divided into four categories: key-value, document-oriented, column-family, and graph databases. The characteristics of each type are listed below.
I. Key-value database
A key-value database is like the hash table found in most programming languages. You add, query, or delete data by key, and because every access goes through the primary key, you get good performance and scalability.
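The hash-table analogy can be sketched in a few lines. KeyValueStore is a hypothetical in-memory toy, not the API of Redis, Riak, or any other product; it only shows that every operation is addressed by key.

```python
class KeyValueStore:
    """A toy in-memory key-value store: every operation goes through the key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Return True if the key existed, False otherwise.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book"]})
session = store.get("session:42")
removed = store.delete("session:42")
```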
Products: Riak, Redis, Memcached, Amazon's Dynamo, Project Voldemort
Who is using: GitHub (Riak), retailer (Riak), Twitter (Redis and Memcached), StackOverflow (Redis), Instagram (Redis), YouTube (Memcached), Wikipedia (Memcached)
Suitable scenarios
Storing user information, such as sessions, profiles, preferences, and shopping carts. This information is generally tied to an ID (the key), which makes a key-value database a good choice.
Unsuitable scenarios
1. You need to query by value rather than by key. Key-value databases generally offer no way to look data up by its value.
2. You need to store relationships between data. A key-value database cannot link the data held under two or more keys.
3. You need transactions. A key-value database generally cannot roll back when a failure occurs.
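The first limitation can be illustrated with a plain dict standing in for the store. This is a sketch under that assumption; find_keys_by_value and the sample profiles are hypothetical. A lookup by key is direct, while a lookup by value has to visit every entry.

```python
def find_keys_by_value(store, predicate):
    """Return keys whose value satisfies predicate; visits every entry (O(n))."""
    return [key for key, value in store.items() if predicate(value)]


profiles = {
    "user:1": {"name": "alice", "city": "Paris"},
    "user:2": {"name": "bob", "city": "Berlin"},
    "user:3": {"name": "carol", "city": "Paris"},
}

# Direct lookup by key: a single hash-table access.
by_key = profiles["user:2"]

# Lookup by value: the store has no index on "city", so we scan everything.
in_paris = find_keys_by_value(profiles, lambda p: p["city"] == "Paris")
```

On a cluster the scan is worse still, since the matching values may sit on any node.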
II. Document-oriented database
Document-oriented databases store data in the form of documents. Each document is a self-contained unit of data made up of data items. Each item has a name and a corresponding value, which can be a simple type, such as a string, a number, or a date, or a complex type, such as a list or a nested object. The smallest unit of storage is the document; documents stored in the same collection may have different attributes, and the data can be stored in various formats, such as XML, JSON, or JSONB.
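As a rough illustration of documents with differing attributes, the sketch below models a collection as a list of Python dicts and serializes one document to JSON. The people collection and the find helper are hypothetical and do not mirror the query API of MongoDB, CouchDB, or RavenDB.

```python
import json

# A "collection" is modeled here as a plain list of documents (dicts).
# The two documents deliberately have different attributes.
people = [
    {"_id": 1, "name": "alice", "age": 30},
    {"_id": 2, "name": "bob", "tags": ["admin", "ops"],
     "address": {"city": "Berlin", "zip": "10115"}},
]


def find(collection, **criteria):
    """Return documents whose top-level fields match all given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


bob = find(people, name="bob")[0]

# A document is self-contained, so it serializes to JSON and back unchanged.
as_json = json.dumps(bob, sort_keys=True)
round_tripped = json.loads(as_json)
```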
Products: MongoDB, CouchDB, RavenDB
Who is using: SAP (MongoDB), Codecademy (MongoDB), Foursquare (MongoDB), NBC News (RavenDB)
Suitable scenarios
1. Logging. In an enterprise environment, each application produces different log information. A document-oriented database has no fixed schema, so we can use it to store all of these different records.
2. Analytics. Thanks to the weak schema, we can store different metrics and add new ones without changing the schema.
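The analytics case can be sketched as follows, again with plain dicts standing in for documents. The events list, the field names, and the average helper are assumptions made for the example. A metric introduced later simply appears in newer documents, and older documents need no migration.

```python
events = [
    {"app": "web", "latency_ms": 120},
    {"app": "web", "latency_ms": 80},
    # A metric added later: older events simply lack the field.
    {"app": "web", "latency_ms": 100, "cache_hit": True},
    # A different application logs entirely different fields.
    {"app": "batch", "rows_processed": 5000},
]


def average(collection, field):
    """Average a numeric field over the documents that actually contain it."""
    values = [doc[field] for doc in collection if field in doc]
    return sum(values) / len(values) if values else None


avg_latency = average(events, "latency_ms")
avg_rows = average(events, "rows_processed")
avg_missing = average(events, "memory_mb")
```

The price of this flexibility is that the reading code, not the database, has to cope with missing fields.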
Unsuitable scenarios
1. You need transactions across different documents. Document-oriented databases have traditionally not supported transactions that span multiple documents, so they should not be chosen if this is a requirement.
III. Column-family (wide-column store) database
A column-family database stores data in column families, and each column family holds related data that is often queried together. For example, for a Person class, we usually query name and age together, but not salary. In that case, name and age go in one column family and salary in another.
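The Person example might be laid out as in the sketch below. ColumnFamilyStore is a hypothetical in-memory toy, not the storage format of Cassandra or HBase, and the family names "basic" and "payroll" are made up for illustration.

```python
from collections import defaultdict


class ColumnFamilyStore:
    """Toy layout: column family -> row key -> {column: value}."""

    def __init__(self):
        self._families = defaultdict(dict)

    def put(self, family, row_key, columns):
        self._families[family].setdefault(row_key, {}).update(columns)

    def get(self, family, row_key):
        # Only the requested family is touched; other families are not read.
        return dict(self._families.get(family, {}).get(row_key, {}))


store = ColumnFamilyStore()
store.put("basic", "person:1", {"name": "alice", "age": 30})
store.put("payroll", "person:1", {"salary": 5000})

# Name and age come back together; salary is never loaded.
basic = store.get("basic", "person:1")
```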
Products: Cassandra, HBase
Who is using: eBay (Cassandra), Instagram (Cassandra), NASA (Cassandra), Twitter (Cassandra and HBase), Facebook (HBase), Yahoo! (HBase)
Suitable scenarios
1. Logging. Because data can be stored in separate column families, each application can write its information to its own column family.
2. Blogging platforms. We can store each kind of information in a different column family: for example, tags in one, categories in another, and articles in a third.
Unsuitable scenarios
1. You need ACID transactions. Cassandra, for example, does not support ACID transactions.
2. Prototyping. If we look at how data is structured in Cassandra, we find that the structure follows the queries we expect to run. Early in a design it is hard to predict the query patterns, and once they change we have to redesign the column families.
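The query-driven design mentioned in the second point can be sketched with one structure per expected query. This is a simplified illustration using dicts; the posts_by_author and posts_by_tag "tables" and the add_post function are hypothetical and are not Cassandra code.

```python
from collections import defaultdict

# One "table" per query pattern, each keyed by what that query filters on.
posts_by_author = defaultdict(list)
posts_by_tag = defaultdict(list)


def add_post(post):
    """Write the post once per query table (denormalized on purpose)."""
    posts_by_author[post["author"]].append(post)
    for tag in post["tags"]:
        posts_by_tag[tag].append(post)


add_post({"id": 1, "author": "alice", "tags": ["nosql", "db"]})
add_post({"id": 2, "author": "bob", "tags": ["nosql"]})
add_post({"id": 3, "author": "alice", "tags": ["python"]})

# Each supported query is a single lookup in the table built for it.
alice_ids = [p["id"] for p in posts_by_author["alice"]]
nosql_ids = [p["id"] for p in posts_by_tag["nosql"]]
```

A query that was not planned for, such as posts by date, has no table to read from. Supporting it means adding a structure and rewriting existing data into it, which is why shifting query patterns are costly during prototyping.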
IV. Graph database
A graph database lets us store data as a graph. Entities are vertices, and relationships between entities are edges. For example, given three entities, Steve Jobs, Apple, and NeXT, two "founded by" edges connect Apple and NeXT to Steve Jobs.
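The Steve Jobs example can be written down as a tiny edge list. This is a minimal sketch in plain Python; the tuple layout and the sources helper are assumptions for illustration and not the data model of any particular graph database.

```python
# Each edge is (source vertex, relationship, target vertex).
edges = [
    ("Apple", "founded by", "Steve Jobs"),
    ("NeXT", "founded by", "Steve Jobs"),
]


def sources(edges, relationship, target):
    """Return vertices with an edge of the given relationship pointing at target."""
    return [src for src, rel, dst in edges if rel == relationship and dst == target]


# Which entities were founded by Steve Jobs?
founded = sources(edges, "founded by", "Steve Jobs")
```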
Products: Neo4j, InfiniteGraph, OrientDB
Who is using: Adobe (Neo4j), Cisco (Neo4j), T (Neo4j)
Suitable scenarios
1. Highly interconnected data.
2. Recommendation engines. If we represent the data as a graph, it becomes very useful for producing recommendations.
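A recommendation can be read off the graph by following edges from a user to other users with shared purchases and on to their other products. The sketch below assumes hypothetical "bought" edges and a simple co-purchase count; it is one of many possible scoring schemes, not a standard algorithm from any product.

```python
from collections import Counter

# Hypothetical "bought" edges: user -> set of products.
bought = {
    "alice": {"book", "pen"},
    "bob": {"book", "lamp"},
    "carol": {"book", "lamp", "desk"},
}


def recommend(user, bought):
    """Rank products bought by users who share at least one purchase with user."""
    mine = bought[user]
    scores = Counter()
    for other, theirs in bought.items():
        if other != user and mine & theirs:
            # Count each product the neighbor has that the user lacks.
            scores.update(theirs - mine)
    # Sort by score descending, then by name for a deterministic order.
    return [p for p, _ in sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))]


suggestions = recommend("alice", bought)
```

Here alice is two hops away from lamp through both bob and carol, so lamp ranks ahead of desk.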
Unsuitable scenarios
1. Data that does not fit the model. The range of applications for graph databases is fairly narrow, because few operations involve the whole graph.