I. Relational databases under data access pressure: typical solution steps (taking MySQL as an example)
1. Use master-slave replication for read/write splitting or distributed reads.
2. When read requests dominate, add cache servers such as Memcached to improve read performance; however, the application must then keep the cache and the database consistent by hand.
3. When write requests dominate, the simple option is to scale up and use a more powerful server to absorb more writes. To keep the slave servers from falling behind the master's update rate, they may need the same configuration as the master, so this approach is not cost-effective.
4. As data access pressure grows further, join query performance drops sharply. At that point we have to resort to an anti-pattern: merge tables according to business needs, trading data redundancy for system performance.
5. Stop using server-side code such as stored procedures, stored functions, and triggers, and implement the corresponding logic in the application.
6. Drop the secondary indexes on tables and rewrite queries so that they use only the primary key index.
7. Shard the database. This is complex and costly to maintain, and re-splitting when the data grows again is also expensive, so further scalability is limited.
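The re-splitting cost mentioned in step 7 can be made concrete with a small sketch. It assumes the simplest possible sharding rule, a stable hash of the key modulo the number of shards; the function names and the key set are hypothetical and are not taken from MySQL or any sharding tool. With a uniform hash, about 80% of keys are expected to land on a different shard when going from 4 to 5 shards, because a key stays put only when its hash gives the same remainder for both shard counts.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard using a stable hash modulo the shard count."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


def moved_fraction(keys, old_shards: int, new_shards: int) -> float:
    """Fraction of keys whose shard changes when the shard count changes."""
    moved = sum(
        1 for k in keys if shard_for(k, old_shards) != shard_for(k, new_shards)
    )
    return moved / len(keys)


# Hypothetical key set: 10,000 user IDs.
keys = [f"user:{i}" for i in range(10_000)]
fraction = moved_fraction(keys, old_shards=4, new_shards=5)
print(f"{fraction:.0%} of keys must move when going from 4 to 5 shards")
```

Schemes such as consistent hashing or directory-based sharding exist largely to reduce this movement; they are outside the scope of this sketch.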
II. RDBMS and NoSQL
In practice, as long as the architecture is sound, relational databases are fully capable of serving data storage applications at every scale; Facebook and Google, for example, each run a healthy MySQL server cluster that serves data storage scenarios at different levels and in different fields. However, applications of this scale require strong engineering capability to work around various restrictions, which leads to high maintenance costs, and some inherent limitations of relational databases can still become a nightmare in production.
As a result, in recent years many new projects and frameworks grouped under the name NoSQL have sprung up in various organizations and enterprises. These projects rarely provide a SQL-like query language and instead offer a simplified, API-based data access interface. The real difference between RDBMS and NoSQL, however, lies at the underlying storage level, because NoSQL systems generally do not support transactions or secondary indexes.
On the other hand, many features of well-known NoSQL projects overlap with one another, and many are even shared with traditional relational databases. NoSQL is therefore not a revolutionary technology, although from an engineer's point of view it certainly feels revolutionary. In practice even memcached has been placed in the NoSQL camp. It seems that any storage manager that is not an RDBMS naturally counts as NoSQL, so NoSQL has become a "Haina baichuan" (the sea that takes in all rivers) catch-all for non-RDBMS systems. Inevitably, good and bad systems end up mixed together. To make this easier to follow, the mainstream NoSQL technologies are classified below along several dimensions; this gives a general picture of them and also provides selection criteria for real application scenarios.
1. Data model: how the data is stored. There are several schools: relational, key-value, columnar, document, and graph. Relational databases include MySQL, PostgreSQL, and Oracle; key-value databases include memcached, membase, Riak, and Redis; columnar databases include HBase, Cassandra, and Hypertable; document databases include MongoDB and CouchDB; graph databases include Neo4J. When selecting a specific NoSQL product, first evaluate how the application accesses its data and whether the data schema evolves frequently.
2. Storage model: whether data is held in memory or in persistent storage.
3. Consistency model: at what level the storage system guarantees data consistency, strict consistency or eventual consistency. The consistency level can have a large impact on data access latency.
4. Physical model: distributed storage or standalone storage. For distributed storage, scalability is also an important indicator.
5. Read/write performance: applications in different scenarios have very different read/write profiles, and different NoSQL products suit different ones.
6. Secondary indexes: they make it possible to sort and query on non-primary-key fields. Some NoSQL products do not provide them.
7. Fault handling: different application scenarios tolerate different recovery times, and different NoSQL products differ in their fault recovery capabilities.
8. Data compression: when terabytes of data are stored, especially text, compression can save a great deal of storage space.
9. Load balancing: distributed storage spreads users' read/write requests across multiple nodes at the same time, which greatly improves system performance.
10. Locks, waits, and deadlocks: transaction processing in an RDBMS proceeds in two stages. With concurrent access by multiple users, this significantly increases the time users wait for resources and may even cause deadlocks.
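Item 6 notes that some NoSQL products offer no secondary indexes. A minimal sketch of what that means for the application follows, assuming a toy in-memory key-value store: the application itself maintains a reverse mapping from a field value to primary keys and has to keep it in step with every write. The class and method names are hypothetical and do not correspond to the API of any product named above.

```python
from collections import defaultdict


class KVStoreWithIndex:
    """Toy key-value store with an application-maintained secondary index."""

    def __init__(self, indexed_field: str):
        self.indexed_field = indexed_field
        self.data = {}                  # primary key -> record (dict)
        self.index = defaultdict(set)   # field value -> set of primary keys

    def put(self, key, record: dict) -> None:
        # Remove the stale index entry first if the key already exists.
        old = self.data.get(key)
        if old is not None:
            self._unindex(key, old)
        self.data[key] = dict(record)
        if self.indexed_field in record:
            self.index[record[self.indexed_field]].add(key)

    def delete(self, key) -> None:
        old = self.data.pop(key, None)
        if old is not None:
            self._unindex(key, old)

    def find_by_field(self, value):
        """Return the primary keys whose indexed field equals value, sorted."""
        return sorted(self.index.get(value, set()))

    def _unindex(self, key, record: dict) -> None:
        value = record.get(self.indexed_field)
        if value in self.index:
            self.index[value].discard(key)
            if not self.index[value]:
                del self.index[value]


store = KVStoreWithIndex("city")
store.put("u1", {"name": "Ann", "city": "Paris"})
store.put("u2", {"name": "Bob", "city": "Berlin"})
store.put("u3", {"name": "Cid", "city": "Paris"})
store.put("u1", {"name": "Ann", "city": "Berlin"})  # update moves u1 in the index
print(store.find_by_field("Paris"))
print(store.find_by_field("Berlin"))
```

In a real distributed store the data write and the index write are separate operations, so without transactions they can temporarily disagree; the sketch sidesteps this by running in a single process.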
III. Data Consistency Model
In short, data consistency refers to the validity, usability, accuracy, and integrity of data as applications access it; it ensures that the data each user sees remains consistent while that user's own transactions or other users' transactions execute. Consistency problems can arise in many scenarios, but those most often discussed involve application consistency, transaction consistency, and point-in-time consistency.
In a database, every operation may move the database from one state to another, but how that transition is carried out is not prescribed, so several different models exist. Whichever implementation is used, the end result is that the database either moves to the new state or is restored to the original state, which is what guarantees data consistency. By the degree to which the database guarantees consistency, the models are roughly as follows:
Strict consistency (Strict): data changes are atomic and take effect immediately; this is the highest level of consistency.
Sequential consistency (Sequential): every client sees data changes in the same order in which they were applied.
Causal consistency (Causal): all causally related changes are observed in the same order by all clients.
Eventual consistency (Eventual): when no updates occur for a period of time, all updates propagate through the system and eventually all replicas become consistent; that is, once propagation has completed, all copies of the data are in the same state.
Weak consistency (Weak): there is no guarantee that all updates reach all clients, nor that all clients receive updates in the same order.
Eventual consistency can be further subdivided into several sub-categories, some of which can coexist; Werner Vogels, CTO of Amazon, elaborates on this in his article "Eventually Consistent".
In that article he also discusses the CAP theorem, originally put forward by Eric Brewer, which states that a distributed system can guarantee at most two of consistency, availability, and partition tolerance at the same time.
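The difference between strict and eventual consistency described above can be illustrated with a small single-process simulation. It is only a sketch under simplifying assumptions: writes go to the first replica, updates to the other replicas wait in a pending queue, and conflicts are resolved by keeping the highest version number. All class and method names are hypothetical and do not model any particular product.

```python
class Replica:
    """One copy of the data; keeps the highest-versioned value per key."""

    def __init__(self, name: str):
        self.name = name
        self.store = {}  # key -> (version, value)

    def apply(self, key, version: int, value) -> None:
        current = self.store.get(key)
        if current is None or version > current[0]:
            self.store[key] = (version, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


class EventuallyConsistentStore:
    """Writes hit the first replica at once and reach the others later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(n) for n in replica_names]
        self.pending = []  # (replica, key, version, value) not yet delivered
        self.clock = 0     # simple version counter

    def write(self, key, value) -> None:
        self.clock += 1
        primary, *others = self.replicas
        primary.apply(key, self.clock, value)
        for replica in others:
            self.pending.append((replica, key, self.clock, value))

    def propagate(self) -> None:
        """Deliver all pending updates, as happens once writes stop."""
        while self.pending:
            replica, key, version, value = self.pending.pop(0)
            replica.apply(key, version, value)

    def read_all(self, key):
        return [r.read(key) for r in self.replicas]


store = EventuallyConsistentStore(["a", "b", "c"])
store.write("x", 1)
store.write("x", 2)
before = store.read_all("x")  # replicas b and c have not seen the writes yet
store.propagate()
after = store.read_all("x")   # all replicas now agree
print(before, after)
```

A strictly consistent system would never expose the "before" state to readers; an eventually consistent one only promises that the "after" state is reached once updates stop and propagation completes.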
This article is from the "Marco Education" blog. For more information, contact the author!