Andrew Morgan is the chief Product manager for Oracle MySQL. Recently, he wrote about how MySQL cluster supports the 200M QPS.
about MySQL Cluster
MySQL Cluster is a real-time extensible and acid-compliant transactional memory database. The database has up to 99.999% availability and low total cost of ownership of open source software. In terms of design, it employs a distributed, multi-master-node architecture that eliminates single points of failure, scales horizontally across commercial hardware, and provides services for read/write intensive workloads that access data through SQL and NoSQL interfaces using the automatic sharding (auto-sharding) feature.
initially, MySQL cluster was designed as an embedded telecom database for in-network applications that required carrier-grade availability and real-time performance. Since then, its capabilities have increased rapidly with the new feature set, and its applications have expanded to Web, mobile, and enterprise applications on-premises or on the cloud, including: massive OLTP, real-time analytics, ecommerce (inventory management, shopping cart, payment processing, order tracking), online gaming, financial transactions (fraud detection), Mobile and micro-payments, session management & caching, streaming, analytics and referrals, content management and delivery, communications and online awareness services, subscriber/user information management, and equity.
MySQL Cluster Architecture
within MySQL cluster, there are a total of three types of nodes that serve the application. Here is a MySQL cluster architecture diagram with 6 node groups and a total of 12 data nodes:
the Data node is the primary node of MySQL cluster. They provide the following functions: In-memory and disk-based data storage and management, automatic "sharding" of Tables (sharding), and per-user defined partitioning, data replication between data nodes, transactional and data retrieval, automatic failback, self-healing (automatic resynchronization after troubleshooting).
tables are automatically partitioned across data nodes, and each data node is a master node that can accept write operations. This makes it easy to allocate write-intensive workloads between nodes, and the process is transparent to the application.
MySQL Cluster uses a resource-free architecture (such as not using shared disks) to store and distribute data, and to generate at least one copy of the data synchronously, and if a data node fails, there is always another data node that stores the same information. Allows requests and transactions to continue without interruption. Any transaction that is briefly interrupted during a data node failure (sub-second level) can be rolled back and re-executed.
mysql Cluster allows the user to choose how to store the data: all in memory or partially on disk (not indexed data only). In-memory storage is especially useful for frequently changing data (active working sets). The data stored in memory is written periodically (local checkpoint) to the local disk and coordinated across all data nodes so that the MySQL cluster can recover from a complete system failure (such as a power outage). Disk-based storage can be used to store data with less stringent performance requirements, and its datasets are larger than available memory. As with most other database servers, MySQL cluster uses page caching to cache frequently used, disk-based data in the memory of data nodes for performance.
application node provides connections from application logic to data nodes. The application can use SQL to access the database and perform the functions of the SQL interface on the data stored in MySQL cluster through one or more MySQL servers. When accessing the MySQL server, you can use any one of the standard mysql connector This gives users a choice of many kinds of access technologies. The NDB API is one of the optional scenarios. This is a high-performance interface based on C + + that provides additional control, better real-time behavior, and higher throughput capabilities. The NDB API also provides a layer that allows the NoSQL interface to bypass the SQL layer to access MySQL Cluster directly, reducing latency and improving development flexibility. Existing interfaces include Java, JPA, Memcached, JavaScript and node. js, Http/rest (with Apache Module). All application nodes can access data from all data nodes, so they do not cause a service outage even if a failure occurs because the application uses only the remaining nodes.
The Management node is responsible for publishing cluster configuration information and node management to all nodes in MySQL cluster. The management node is used when booting, joining nodes to the cluster, and reconfiguring the system. Managing node shutdown and restart does not affect the operation of the data node and the application node. By default, the management node also provides the quorum service when encountering a network failure that causes cluster splitting (split-brain) or a network partition .
Scalability through transparent sharding
the rows of any table can be transparently divided into multiple partitions/fragments. For each fragment, there will be a separate data node that holds all of its data and handles all read and write operations against those data. Each data node also has a partner node, which collectively makes up a node group, and the partner node stores the second copy of that fragment and its own original fragment. MySQL Cluster uses a synchronous two-phase commit protocol to ensure that transaction commit changes are stored to two data nodes at the same time.
MySQL cluster uses the table's primary key as the Sharding key (Shard key) By default, and performs a MD5 hash of the Sharding key to select the fragments/partitions that the data should store. If a transaction or query requires access to data from multiple data nodes, one of the data nodes will assume the role of the transaction Coordinator and delegate the work to other required data nodes, and the results will be merged before they are available to the application. It is important to note that transactions or queries can connect data from multiple shards and multiple tables, which is a huge advantage over traditional NoSQL data stores that implement the Shard mechanism.
when a single node can meet the requirements for high-intensity query/Transaction data operations, the optimal (linear) scaling is achieved (as this reduces the network latency of message delivery between data nodes). To do this, the application should be well aware of the data distribution-which in effect means that the person defining the pattern can specify the column to use as the Sharding key. For example, the tables in the table use a primary key that is composed of user-id and service names, and if you use only User-id as the Shard key, all rows for a particular user in the table will always be stored in the same fragment. What is even more powerful is that if the same User-id column is used in other tables and is set as a shard key, the data for a particular user in all tables is stored in the same fragment, and the query/transaction for that user can be handled by a single data node.
Maximize data access speed with NoSQL APIs
MySQL Cluster provides many ways to access data, and the most common method is SQL, but as you can see, there are many native APIs that allow applications to read/write data directly from the database, avoiding the inefficiencies and development complexity of converting to SQL and passing it to MySQL servers. Currently, MySQL cluster provides APIs for C + +, Java, JPA, Javascript/node.js, HTTP, and memcached protocols.
benchmark: 200 million queries per second
by design, MySQL cluster is designed to handle the following two types of workloads:
- OLTP (online transaction processing): memory-Optimized tables can provide low latency at sub-millisecond levels and very high levels of OLTP workload concurrency, and can still provide good stability, and they can also be used for disk-based storage tables.
- Instant Search : MySQL cluster improves the number of concurrency that can be used when performing table scans, greatly improving the search speed of unindexed columns.
that being said, MySQL cluster is best designed to handle OLTP workloads, especially in cases where a large number of query/transaction requests are sent in a concurrent manner. To do this, they use the flexasynch benchmark to measure the increase in NoSQL access performance after more data nodes join the cluster.
In this benchmark, each data node runs on a dedicated 56-thread Intel e5-2697 v3 (Haswell) machine. Shows the change in throughput during data node increase from 2 to 32 (note: MySQL cluster currently supports up to 48 data nodes). As you can see, the throughput is growing linearly, and at 32 data nodes, 200 million NoSQL queries per second are achieved.
readers can log in MySQL Cluster benchmark test page to see the latest results and a more detailed description of the test.
The benchmark for 200 million queries per second is on MySQL Cluster 7.4 (the latest official version), and for more information on this version, see here .
Transfer from Http://www.infoq.com/cn/articles/oracle-expert-talk-how-mysql-cluster-support-200m-qps?utm_campaign=infoq_ Content&utm_source=infoq&utm_medium=feed&utm_term=global
Oracle experts talk about how MySQL cluster supports 200M QPS