Scale out: Scaling out, adding processing nodes to improve overall processing power
Scale up: vertically scaled to improve the overall processing power by increasing the processing power of a single node
Replication
MySQL's replication is asynchronous and is suitable for scenarios where data real-time requirements are not particularly critical. The IO thread on the slave side is responsible for reading the logs from Master, and the SQL thread is specifically responsible for applying the logs read from master on the slave side (early MySQL was implemented with a thread and performance issues were obvious). Using replication must enable binary log,mysql to distribute updates to slave with binary log
Replication Level
1. Row level:5.1.5 started to support. Mater records the change log for each row of data, and slave is applied on a per-line basis. Pros: Data consistency is more secure. Cons: May cause log files to be larger
2. Statement Level:master records each query statement executed and some contextual information, slave nodes are re-executed on slave based on that information. Advantage: Binary log is smaller. Cons: Data consistency is difficult to protect in some cases
3. Mixed Level:mysql Choose which replication method to select according to the situation. 5.1.8 start supporting
Common Architecture
1. Master-slaves: This is usually the way
2. Dual Master (master-master): 2 Master nodes synchronize updates with each other. Because of the asynchronous replication of MySQL, in order to prevent inconsistencies caused by data conflicts, generally only one is used for write operations, the other is not used or only for read operations. The purpose is that one master can use another master when the maintenance is down or the fault is interrupted.
3. Cascade Replication (Master-slaves-slaves): In master slaves, if slaves too much replication will increase the load on master, you can have master distribute the update log to only a few slave These slave distribute the update log to the subordinate nodes as a first-level node
Cluster
Implemented primarily through the NDB cluster storage engine, is a share nothing architecture that does not need to share any data between MySQL servers. Redundancy (reliability) and load balancing can be achieved. At first MySQL needs all the data and indexes to be loaded into memory to use cluster, high memory requirements, and currently through improvement only requires that the index can be loaded into memory
MySQL clusrer architecture, pictures from official MySQL document
1. SQL Nodes: Responsible for things outside the storage tier, such as connection management, query processing and optimization, cache management, and so on, the MySQL server that stripped the storage tier
2. Data Nodes: The NDB node of the storage layer, which is the storage engine in the cluster environment. Each cluster node holds a shard of the complete data, depending on the number of nodes and configuration, the general data nodes is organized into a group, each group has an identical physical data (redundancy). The NDB storage engine first uses the storage node according to the redundancy parameter configuration and then segments the data based on the number of nodes
3. Management Server: Responsible for the management of the entire Clsuter cluster nodes, it saves the entire cluster environment configuration, start off each node, to achieve the regular maintenance of each node, backup recovery, etc., it gets the status and error information of each node, and reflected to all nodes in the cluster
Cache
1. Application Layer Cache: The most common cache mode, the application layer manages the cache itself, regardless of the database
There are many options for the application layer cache
a). Memcached: Distributed cache, good performance, scalability, although slower than the local cache scheme, but with the sharing of cache effect
b). Berkeley DB: BDB implements a local cache compared to memcached, memcached is a memory-cached cache, and BDB uses disks at a different cost; memcached only supports hash indexes, and BDB supports hash index and B-tree index MySQL database schema using BDB, mainly for extensibility considerations, and BDB hash index can be more effective than MySQL
2. Mysql-memcached Integration: There are 2 different ways
a). Use memcached directly as a level two cache for MySQL。 This can increase the cache capacity of MySQL, and memcached is not visible to the application side. This scheme is suitable for a number of specific scenarios, such as data is difficult to segment, it is difficult to transform the application, etc.
This solution currently has a Waffle grid open source project that supports only the InnoDB storage engine. The principle is that when MySQL does not find data in the local Buffer pool, it reads from disk, and the Waffle grid joins a process here, first trying to read from the memcached server, if it does not exist in Memcached server, is read by disk IO, the data read is stored in memcached and recorded in the LRU list in InnoDB Buffer pool, and if the data becomes dirty or the cache management is cleared out, InnoDB moves the corresponding LRU list into the flush list , the waffle grid deletes data from mamcached server at this time. As can be seen from the above, the caching mechanism is still managed by MySQL, Memcached is only valid (not dirty data), only for the purpose of caching data, Memcached server crash downtime, etc. does not affect the MySQL data
Architecture (MySQL performance Tuning and architecture design):
Will the Waffle grid degrade query performance because of network access overhead? You can refer to the test results of the Waffle Grid DBT2 and evaluate the actual test
b). Use the MySQL UDF feature to integrate with the memcached。 Memcached data is maintained by MySQL and application-side programs, and the application reads data from memcached, reads from MySQL server and writes data to memcached, and data is updated, When deletion causes memcached data to fail, MySQL is responsible for clearing the corresponding data in the memcached
Architecture (MySQL performance Tuning and architecture design):
MySQL Memcached UDFs download and use the reference using the MySQL Memcached UDFs
data segmentation, sharding
Divided into vertical and horizontal, vertical cutting involves splitting a table into multiple tables in different fields, building different tables in different MySQL servers, and slicing the data of the same table into multiple tables for storage
1. Table structure Design Vertical Segmentation。 Some of the common scenarios include
a). Vertical segmentation of large print segments. Jianjian the large characters in separate tables to improve access performance of the underlying tables, in principle, avoid large segments of the database in performance-critical applications
b). Split vertically According to the intended use. For example, enterprise item attributes can be split vertically by basic attributes, sales attributes, purchase attributes, manufacturing attributes, financial accounting attributes, and more.
c). Split vertically According to the frequency of access. For example, in e-commerce, WEB 2.0 systems, if user attributes are set very much, you can cut the basic, frequently used, and infrequently used attributes vertically and separately
2. Sharding
Sharding refers to a "shared nothing" form of vertical cutting that deploys the cut-off part to a different server. The difference between sharding and partition can refer to DBA notes Feng Dahui's Open source database sharding technology (Share Nothing)
Picture from DBA Notes:http://www.dbanotes.com
Sharding scenarios can be considered:
a). session-based sharding, simply determine the processing node when the Session is created, and subsequent requests are directed directly to that node for processing
b). statement-based sharding, each statement needs to determine the processing node
c). transaction-based sharding, determines the processing node based on the first statement in the transaction
Sharding Potential problems:
a). Database-level Join operations cannot be used between separated sections (Cross-shard joins, consider deploying some common, global tables to each node, distributed using the replication mechanism)
b). Transaction processing is complex between separated sections
c). Management of self-growth keys (mainly in the case of sharding mixed with horizontal slicing)
3. Horizontal slicing
Example:
a). such as online e-commerce website, order form data volume is too large, according to annual, monthly level segmentation
b). Web 2.0 website registered users, too many online active users, in accordance with the user ID range, and so on, the relevant users and the user closely related tables to do horizontal segmentation
c). For example, the forum's top posts, because of the issue of paging, each page needs to display a sticky stick, this situation can be split up the level of the sticky stick, to avoid the top posts from the table of all posts read
One of the difficulties, the complexity of logic and association hinders horizontal segmentation. Such a scenario is difficult in how to determine the scope and strategy of the segmentation, such as the large ERP, such as SAP, the module, the table is very many, the logic between the complex, SAP per client (company, group) the entire business data is completely cut apart, if the granularity needs to be refined difficult is very large. The programme generally has: a). Split by primary key; b). Maintaining a Master Cut Index table, which is very extensible, but needs to find the primary index table
The 2nd is how to make horizontal segmentation have extensibility. Take the Web 2.0 website as an example, if you are slicing according to the Member ID range, if you decide to cut the level to 5 parts now, if you use the value of user_id% 5 to determine which part of the user belongs to, so in the future with the increase in user volume, if later need to cut into 20 parts will be quite troublesome
The mixed mode of horizontal slicing and sharding can theoretically achieve linear scaling, but is limited by application condition, design and segmentation, sharding implementation scheme
Segmentation and Consolidation Scenarios
The main 2 kinds of realization ideas:
1. Each application module configuration manages one or more of the data sources that it needs to directly access individual databases
2. Unified management of all data sources through the intermediary agent layer, the backend database cluster architecture is transparent to the front-end application
using MySQL Proxy:
This is provided by MySQL, which is located between the client program and MySQL server and is capable of monitoring, analyzing, and converting communication between them. Common scenarios include load balancing, fault recovery, query analysis, query filtering and modification, and basic ha mechanisms, and there are still some problems with the implementation of read and write separations, architecture (MySQL performance Tuning and architecture design):
MySQL Proxy provides a basic framework, other functions need to write LUA script implementation (sacrificing a certain performance, bring flexibility)
Hscale is a LUA module for MySQL proxy, and he transparently parses and rewrites queries, slicing and partitioning logic from the application layer to the agent layer
Hscale Project official website: http://www.hscale.org, author blog site: http://pero.blogs.aprilmayjune.org/
using Amoeba:
Amoeba is an open source framework for integrating a proxy program for distributed database data sources based on Java development, Amoeba Developer blog, Project home page
Address the following issues primarily:
a). Complex data source consolidation after data segmentation
b). Provide data segmentation rules and reduce the impact of data segmentation rules on the database
c). Reduce database and client connections
d). Read and write separate routes
Amoeba for MySQL is a program specifically for MySQL database, architecture:
Image from MySQL performance Tuning and architecture design
Amoeba for Aiadin is a more general scenario in which he receives requests from the MySQL protocol, and the backend can use other data sources such as MySQL, Oracle, PostgreSQL, which are transparent to the application. Architecture:
Image from MySQL performance Tuning and architecture design
using Hivedb
Hivedb is also a Java-based open source framework supported by commercial companies that currently only supports horizontal segmentation, while supporting data redundancy and basic ha mechanisms. He uses hibernate shards to achieve data-level segmentation, and to implement data redundancy mechanisms on its own. Hivedb data is distributed across multiple MySQL servers through a variety of user-defined partition keys, which parse query requests during access, automatically parse filter conditions, read data in parallel from multiple MySQL servers, and merge result sets back to the client application. Architecture:
Photo from Hivedb official website: http://www.hivedb.org/
Hibernate shards is contributed by Google, he uses the standard Hibernate programming model, Hibernate will be used with hibernate shards, extensible sharding Strategy, support for virtual shards, Handling when simplifying resharding
Spock Proxy
An open source project generated by the actual project (Spock is the application of rails, Speck proxy should be available for use outside of rails, such as PHP or. NET), based on MySQL proxy, a branch of MySQL proxy, Support for range-based Horizontal paritioning, his improvements to MySQL proxy include:
a). Improve performance without using Lua scripting. For example, it is also important to interact with the Lua script during the merge of the result sets returned by multiple data sources, which has a higher performance overhead
b). Client Login verification. MySQL Proxy supports the client to authenticate directly with each server, and Spock Proxy manages it, separating the client from the server.
c). Dynamic connection pooling. Benefit from improvements in the client login authentication mechanism
d). The current MySQL proxy is not able to read and write separation, Spock Proxy to achieve this, and asynchronous parallel execution
Architecture:
Image from Spock Proxy official website: http://spockproxy.sourceforge.net/
Pyshards
The Python-based sharding scheme is a personal research open source project, and his goal is to achieve automatic re-balancing (re-sharding), which is more challenging. MySQL is currently supported only. http://code.google.com/p/pyshards/
MySQL Design solution