For years, relational databases were the only choice for data persistence. Developers only had to decide which of these traditional databases to use, such as SQL Server, Oracle, or MySQL, and there were even default picks: .NET developers generally chose SQL Server, Java developers often preferred Oracle, Ruby developers used MySQL, and Python developers used PostgreSQL or MySQL.
The reason is simple: relational databases have proven their robustness in most applications over a long period. They let us control concurrent operations and transactions. But if traditional relational databases were always this reliable, what would NoSQL be for? NoSQL survives and grows because it does what traditional relational databases cannot do.
Problems with relational databases
Impedance mismatch
We write applications in languages such as Python, Ruby, Java, and the .NET languages, which share one trait: they are object-oriented. The databases we use, such as MySQL, PostgreSQL, Oracle, and SQL Server, also share one trait: they are relational. This is where the term "impedance mismatch" comes in: the application's data model is object-oriented, but the database is relational, so data must be converted every time it is stored or queried. ORM frameworks such as Hibernate and Entity Framework simplify this conversion, but they struggle when queries have strict performance requirements.
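To make that conversion concrete, here is a minimal Python sketch that uses the standard-library sqlite3 module instead of an ORM. The Customer class, the customers and emails tables, and the save/load helpers are hypothetical. It shows that a single object with a list field has to be split across two tables on every write and reassembled from flat rows on every read; an ORM automates this step but still performs it.

```python
import sqlite3
from dataclasses import dataclass, field


@dataclass
class Customer:  # hypothetical domain object
    id: int
    name: str
    emails: list = field(default_factory=list)


def save(conn, c):
    # Scalar fields go into "customers"; the list field cannot fit in one
    # column, so each element becomes a row in a separate "emails" table.
    conn.execute("INSERT INTO customers (id, name) VALUES (?, ?)", (c.id, c.name))
    conn.executemany(
        "INSERT INTO emails (customer_id, address) VALUES (?, ?)",
        [(c.id, e) for e in c.emails],
    )


def load(conn, customer_id):
    # Rebuilding the object needs one query per table plus code that folds
    # the flat rows back into a nested structure.
    row = conn.execute(
        "SELECT id, name FROM customers WHERE id = ?", (customer_id,)
    ).fetchone()
    if row is None:
        return None
    emails = [
        r[0]
        for r in conn.execute(
            "SELECT address FROM emails WHERE customer_id = ? ORDER BY rowid",
            (customer_id,),
        )
    ]
    return Customer(row[0], row[1], emails)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE emails (customer_id INTEGER REFERENCES customers(id), address TEXT)")

original = Customer(1, "Alice", ["alice@example.com", "a@work.example"])
save(conn, original)
print(load(conn, 1) == original)  # True
```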
Growing application scale
As web applications grow, we need to store more data, serve more users, and provide more computing power, so we must keep scaling. There are two kinds of scaling: vertical scaling, which means buying better machines with more disks, more memory, and so on; and horizontal scaling, which means buying more machines and forming a cluster. At large scale, vertical scaling does not help much: improving a single machine is very expensive and eventually hits a hard performance limit. At the scale of Google or Facebook, no single machine could ever carry the entire load. This calls for a new kind of database, because relational databases do not run well on clusters. You can build a relational database cluster, but it relies on shared storage, which is not the kind of cluster we want. This is how the NoSQL era began, driven by companies such as Google, Facebook, and Amazon that had to handle ever larger volumes of data.
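One common way a cluster-oriented store spreads data over many machines is hash partitioning: each key is hashed to choose the node that owns it, so adding machines adds capacity. The sketch below is a hypothetical illustration in Python, with one dict standing in for each machine; the ShardedStore class is invented for this example and is not how any particular product is implemented.

```python
import hashlib


class ShardedStore:
    """Hypothetical key-value store whose keys are spread over several nodes."""

    def __init__(self, node_count):
        # Each "node" is a dict here; in a real cluster it would be a separate machine.
        self.nodes = [{} for _ in range(node_count)]

    def _node_for(self, key):
        # Use a stable hash (not Python's per-process salted hash()) so the
        # same key always maps to the same node.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.nodes)
        return self.nodes[index]

    def put(self, key, value):
        self._node_for(key)[key] = value

    def get(self, key):
        return self._node_for(key).get(key)


store = ShardedStore(node_count=4)
for i in range(1000):
    store.put(f"user:{i}", {"id": i})

print(store.get("user:42"))             # {'id': 42}
print([len(n) for n in store.nodes])    # sizes of the four shards; they sum to 1000
```

Plain modulo hashing, as used here, relocates most keys whenever the node count changes, which is why distributed stores such as Cassandra and Riak use consistent hashing instead.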
The NoSQL era
There are already many NoSQL databases, such as MongoDB, Redis, Riak, HBase, and Cassandra. Most of them share some of the following features:
- They do not use SQL. For example, MongoDB and Cassandra have their own query languages.
- They are usually open-source projects.
- They are designed to run on clusters.
- They are weakly structured: data structures and types are not strictly constrained.
NoSQL database types
NoSQL databases can be divided into four types: key-value, document-oriented, column-family, and graph databases. The features of each type are described below.
I. Key-value databases
Key-value databases work like the hash tables found in programming languages: you add, query, and delete data by key. Because all access goes through the primary key, performance and scalability are good.
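Because the model is essentially a hash table, its whole interface fits in a few lines. The following is a minimal in-memory Python sketch; the KeyValueStore class and the "session:42" key are hypothetical, and real products add persistence, expiry, and replication on top of the same put/get/delete idea.

```python
class KeyValueStore:
    """Minimal in-memory sketch: every operation is addressed by key."""

    def __init__(self):
        self._data = {}  # a plain hash table does all the work

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Report whether the key existed, so callers can tell a miss from a removal.
        if key in self._data:
            del self._data[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", {"user": "alice", "cart": ["book", "lamp"]})
print(store.get("session:42")["cart"])  # ['book', 'lamp']
print(store.delete("session:42"))       # True
print(store.get("session:42"))          # None
```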
Products: Riak, Redis, Memcached, Amazon's Dynamo, Project Voldemort
Who uses them: GitHub (Riak), Best Buy (Riak), Twitter (Redis and Memcached), Stack Overflow (Redis), Instagram (Redis), YouTube (Memcached), Wikipedia (Memcached)
Applicable scenarios
Storing user information such as sessions, configuration files, parameters, and shopping carts. This information is usually tied to an ID (the key), which makes a key-value database a good choice.
Unsuitable scenarios
1. Querying by value instead of by key. A key-value database offers no way to query by value.
2. Storing relationships between data. A key-value database cannot associate data across two or more keys.
3. Transactions. A key-value database generally cannot roll back when a failure occurs.
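The first limitation follows directly from the data model. In the hypothetical sketch below, a plain Python dict stands in for the store: a lookup by key is a single hash access, but finding entries by a property of the value means visiting every key, because the plain key-value model keeps no index on values.

```python
def find_keys_by_value(store, predicate):
    # No index on values: inspect every entry, O(n) in the number of keys.
    return [key for key, value in store.items() if predicate(value)]


# Hypothetical session data keyed by session ID.
sessions = {
    "session:1": {"user": "alice", "country": "DE"},
    "session:2": {"user": "bob", "country": "US"},
    "session:3": {"user": "carol", "country": "DE"},
}

print(sessions["session:2"]["user"])  # bob (direct lookup by key)
print(find_keys_by_value(sessions, lambda v: v["country"] == "DE"))  # ['session:1', 'session:3']
```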
II. Document-oriented databases
Document-oriented databases store data as documents. Each document is a self-contained unit of data made up of data items, each with a name and a value. A value can be a simple type such as a string, number, or date, or a complex type such as an ordered list or a nested object. The document is the smallest unit of storage, and documents in the same collection can have different attributes. Data can be stored as XML, JSON, BSON, and other formats.
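The following Python sketch, assuming a hypothetical people collection held in a list, shows two documents with different attributes stored side by side and a simple field-equality query over them. Plain JSON stands in for the storage format, and the find function is illustrative, not any product's API.

```python
import json

# Hypothetical "people" collection: both documents live together even
# though their attributes differ.
people = [
    {"_id": 1, "name": "Alice", "age": 30, "emails": ["alice@example.com"]},
    {"_id": 2, "name": "Bob", "address": {"city": "Berlin", "zip": "10115"}},
]


def find(collection, **criteria):
    # Keep documents whose top-level fields equal every criterion;
    # a document that lacks one of the fields simply does not match.
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Each document serializes on its own, nested values included.
print(json.dumps(people[1]))
print(find(people, name="Bob")[0]["address"]["city"])  # Berlin
print(find(people, age=30)[0]["name"])                  # Alice
```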
Products: MongoDB, CouchDB, RavenDB
Who uses them: SAP (MongoDB), Codecademy (MongoDB), Foursquare (MongoDB), NBC News (RavenDB)
Applicable scenarios
1. Logs. In an enterprise environment, each application records different log information. Because a document-oriented database has no fixed schema, it can store all of it.
2. Analytics. Because the schema is weak, you can store different metrics and add new ones without changing the schema.
Unsuitable scenarios
Transactions across documents. Document-oriented databases generally do not support transactions that span multiple documents; if you need them, this is not the right choice.
III. Column-family (wide-column) databases
A column-family database stores data in column families, each of which groups data that is usually queried together. For example, for a Person class we typically query name and age together, but not salary. In that case, name and age go into one column family and salary into another.
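The Person example can be modeled as a nested mapping: row key, then column family, then column. This Python sketch is a hypothetical illustration of the logical layout only; the people table and read_family helper are invented here. In HBase, for example, each column family is also stored in its own files, which is what makes reading one family cheap.

```python
# Hypothetical "people" table: row key -> column family -> column -> value.
people = {
    "person:1": {
        "basic": {"name": "Alice", "age": 30},
        "salary": {"amount": 85000},
    },
    "person:2": {
        # Rows need not have the same columns: this one has no salary family.
        "basic": {"name": "Bob", "age": 41},
    },
}


def read_family(table, row_key, family):
    # A query names the column family it needs; columns in other
    # families are never touched.
    return table.get(row_key, {}).get(family, {})


print(read_family(people, "person:1", "basic"))   # {'name': 'Alice', 'age': 30}
print(read_family(people, "person:2", "salary"))  # {}
```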
Products: Cassandra, HBase
Who uses them: eBay (Cassandra), Instagram (Cassandra), NASA (Cassandra), Twitter (Cassandra and HBase), Facebook (HBase), Yahoo! (HBase)
Applicable scenarios
1. Logs. Because data can be stored in different columns, each application can write its information to its own column family.
2. Blogging platforms. Each kind of information is stored in its own column: tags in one, categories in another, and articles in a third.
Unsuitable scenarios
1. When ACID transactions are required. Cassandra does not support transactions.
2. Prototyping. Cassandra's data structures are designed around the expected queries. At the start of model design, the queries cannot be predicted, and whenever they change, the column families have to be redesigned.
IV. Graph databases
A graph database stores data as a graph: entities are vertices, and the relationships between them are edges. For example, given the three entities Steve Jobs, Apple, and NeXT, two "founded by" edges connect Apple and NeXT to Steve Jobs.
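The Steve Jobs example can be expressed as a tiny labeled graph. This Python sketch uses a hypothetical Graph class with adjacency lists in both directions, so the same edges answer both "who founded Apple?" and "what did Steve Jobs found?". Graph databases such as Neo4j provide this kind of traversal natively; the class here is only an illustration.

```python
from collections import defaultdict


class Graph:
    """Hypothetical labeled directed graph stored as adjacency lists."""

    def __init__(self):
        self.outgoing = defaultdict(list)  # vertex -> [(label, target), ...]
        self.incoming = defaultdict(list)  # vertex -> [(label, source), ...]

    def add_edge(self, source, label, target):
        self.outgoing[source].append((label, target))
        self.incoming[target].append((label, source))

    def neighbors(self, vertex, label):
        # Follow edges forward: Apple -FOUNDED_BY-> ?
        return [t for (l, t) in self.outgoing[vertex] if l == label]

    def reverse_neighbors(self, vertex, label):
        # Follow edges backward: ? -FOUNDED_BY-> Steve Jobs
        return [s for (l, s) in self.incoming[vertex] if l == label]


g = Graph()
g.add_edge("Apple", "FOUNDED_BY", "Steve Jobs")
g.add_edge("NeXT", "FOUNDED_BY", "Steve Jobs")

print(g.neighbors("Apple", "FOUNDED_BY"))               # ['Steve Jobs']
print(g.reverse_neighbors("Steve Jobs", "FOUNDED_BY"))  # ['Apple', 'NeXT']
```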
Products: Neo4j, InfiniteGraph, OrientDB
Who uses them: Adobe (Neo4j), Cisco (Neo4j), T-Mobile (Neo4j)
Applicable scenarios
1. Highly connected data.
2. Recommendation engines. Representing data as a graph makes it much easier to generate recommendations.
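As an illustration of the recommendation case, the sketch below treats purchases as a bipartite graph of customers and products and walks outward from one customer: their products, other buyers of those products, and what else those buyers bought. The purchase data and the build_graph/recommend functions are hypothetical; products reached along more paths rank higher.

```python
from collections import Counter, defaultdict

# Hypothetical purchase edges: customer -> products bought.
purchases = {
    "alice": {"book", "lamp"},
    "bob": {"book", "pen"},
    "carol": {"lamp", "pen", "mug"},
    "dave": {"book", "pen", "mug"},
}


def build_graph(purchases):
    # Reverse edges (product -> buyers) so the graph can be walked both ways.
    bought_by = defaultdict(set)
    for customer, items in purchases.items():
        for item in items:
            bought_by[item].add(customer)
    return bought_by


def recommend(customer, purchases, bought_by, limit=3):
    mine = purchases[customer]
    scores = Counter()
    for item in mine:                              # hop 1: this customer's products
        for other in bought_by[item]:              # hop 2: other buyers of them
            if other == customer:
                continue
            for candidate in purchases[other] - mine:  # hop 3: their other products
                scores[candidate] += 1
    # Sort by score, then name, so ties are deterministic.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:limit]]


bought_by = build_graph(purchases)
print(recommend("alice", purchases, bought_by))  # ['pen', 'mug']
```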
Unsuitable scenarios
Data that does not fit a graph model. Graph databases have a narrow scope of application, because few operations involve the entire graph.
Original article: NoSQL databases, why we should use, and which one we should choose (compiled by ZhongHao, reviewed by Zhou Xiaolu)