As a low-cost, high-performance, reliable, open source database product, MySQL is widely used by Internet companies. Taobao, for example, runs thousands of MySQL servers. Although NoSQL has developed rapidly over the past two years and new products keep emerging, NoSQL is still harder for developers to adopt in business applications, whereas MySQL has mature middleware and operational tools and has formed a healthy ecosystem. As a result, current applications still rely mainly on MySQL, with NoSQL as a supplement.
Over the past year, we have done a great deal of work on a MySQL hosting platform, designing and implementing the UMP (Unified MySQL Platform) system, which provides low-cost, high-performance MySQL cloud database services. Developers request MySQL instance resources from the platform and access their data through a single entry point that the platform provides. Internally, the UMP system maintains and manages a resource pool and transparently provides a range of services, including master-slave hot standby, data backup, migration, disaster recovery, read/write splitting, and database and table sharding. The platform reduces costs by running multiple MySQL instances on a single physical machine; it isolates, allocates, and limits CPU, memory, and I/O resources, and supports dynamic scaling up and down as the user's business changes, without interrupting data services.
Evolution of the architecture
The first version of the UMP system was built on MySQL Proxy 0.8. We fixed a number of bugs, modified the state machine in the proxy plug-in that manages user connections and database connections, and wrote a Lua script that retrieves user authentication information and backend database addresses from a central database, authenticates users, and then establishes connections to the backend database and forwards packets to it (as shown in Figure 1).
Figure 1 The first version of the UMP system (then called the RDS system), which used MySQL Proxy
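The following minimal Python sketch illustrates the authentication-and-routing step that the first version's Lua script performed. Everything in it is hypothetical: the `CENTRAL_DB` dictionary stands in for the central database, `UserRecord` and `authenticate_and_route` are illustrative names, and plain SHA-256 password hashes replace MySQL's real challenge-response handshake. Forwarding packets to the returned address is omitted.

```python
import hashlib
import hmac
from dataclasses import dataclass


@dataclass(frozen=True)
class UserRecord:
    """One row of the hypothetical central metadata database."""
    password_sha256: str  # hex digest; a simplification, not MySQL's real auth scheme
    backend_host: str
    backend_port: int


# Hypothetical stand-in for the central database queried by the Lua script.
CENTRAL_DB = {
    "app_user": UserRecord(
        password_sha256=hashlib.sha256(b"secret").hexdigest(),
        backend_host="10.0.0.12",
        backend_port=3306,
    ),
}


class AuthError(Exception):
    """Raised when a user is unknown or the password does not match."""


def authenticate_and_route(user: str, password: str) -> tuple:
    """Check credentials against the central DB and return (host, port) of the backend."""
    record = CENTRAL_DB.get(user)
    if record is None:
        raise AuthError(f"unknown user {user!r}")
    digest = hashlib.sha256(password.encode()).hexdigest()
    # Constant-time comparison avoids leaking how much of the digest matched.
    if not hmac.compare_digest(digest, record.password_sha256):
        raise AuthError("bad password")
    return record.backend_host, record.backend_port


print(authenticate_and_route("app_user", "secret"))  # ('10.0.0.12', 3306)
```

Once the backend address is known, the proxy opens its own connection to that MySQL instance and relays packets in both directions for the lifetime of the client session.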
While developing and deploying the first version, we became aware of several problems.
First, the multithreading support in MySQL Proxy 0.8 is simple and crude: multiple worker threads share a single message queue and all listen on the same socketpair channel. When a new event enters the message queue, one byte is written to the socketpair, and all sleeping threads are woken up to compete for a mutex and fetch the task from the queue. This implementation has several problems. The first is the "thundering herd" effect: many threads are woken, but only one of them is needed to handle the task. The second is poor CPU affinity: events triggered on the same state machine bounce back and forth across multiple processors. In addition, MySQL Proxy uses a global Lua lock, so only one worker thread can execute Lua scripts at a time (an improvement planned for version 0.9). As a result, in multithreaded mode MySQL Proxy's performance is far from scaling linearly with the number of CPU cores; on 16 cores it even performs worse than on 4. In single-process mode, several processes must be deployed on each physical machine to make full use of its processing power, which complicates deployment, monitoring, and service upgrades.
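The cost of the shared-queue design can be made concrete with a toy, single-threaded Python simulation (not real threads). It assumes the worst case, in which every worker is asleep when each event arrives, and compares waking all workers (the shared socketpair) with pinning each connection to one worker that has its own queue. The pinned design is a common alternative used here only for contrast; it is not what MySQL Proxy implemented. The function `simulate` and its parameters are hypothetical.

```python
import random
from collections import defaultdict


def simulate(events, n_workers, policy, seed=0):
    """Toy model of event dispatch in a multithreaded proxy.

    events    -- connection ids, one entry per incoming event
    n_workers -- number of worker threads
    policy    -- "shared": every sleeping worker is woken and they race for
                 the mutex; the winner is picked at random because the race
                 outcome is unpredictable.
                 "per_worker": each connection is pinned to worker
                 conn % n_workers and only that worker is woken.

    Assumes every worker is asleep when each event arrives (worst case).
    Returns (total_wakeups, {conn: set of workers that handled it}).
    """
    rng = random.Random(seed)
    wakeups = 0
    handled_by = defaultdict(set)
    for conn in events:
        if policy == "shared":
            wakeups += n_workers               # the whole herd wakes up
            winner = rng.randrange(n_workers)  # but only one gets the task
        elif policy == "per_worker":
            wakeups += 1                       # only the owning worker wakes
            winner = conn % n_workers
        else:
            raise ValueError(f"unknown policy {policy!r}")
        handled_by[conn].add(winner)
    return wakeups, dict(handled_by)


# 4 connections, 100 events each, interleaved, on 16 workers.
events = [conn for _ in range(100) for conn in range(4)]
shared_wakeups, shared_map = simulate(events, 16, "shared")
pinned_wakeups, pinned_map = simulate(events, 16, "per_worker")
print(shared_wakeups, pinned_wakeups)               # 6400 400
print({c: len(w) for c, w in pinned_map.items()})   # {0: 1, 1: 1, 2: 1, 3: 1}
```

In the shared model each connection's events end up scattered over many of the 16 workers, which is the affinity problem described above, while the pinned model keeps every connection on a single worker and wakes exactly one thread per event.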
Second, because the MySQL Proxy framework is hard to extend, it was difficult to implement features such as per-user connection limits, QPS limits, master-slave failover, read/write splitting, and database and table sharding.
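As an illustration of the kind of per-user policy that was hard to add to MySQL Proxy, here is a minimal Python sketch that combines a connection cap, a token-bucket QPS limit, and naive read/write splitting. The classes `TokenBucket` and `UserGate`, their parameters, and the round-robin choice of slave are assumptions for illustration, not UMP's actual design; master-slave failover and sharding are left out.

```python
import time
from typing import Callable, List, Optional


class TokenBucket:
    """Allows `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


class UserGate:
    """Hypothetical per-user policy: connection cap, QPS cap, read/write split."""

    def __init__(self, max_conns: int, qps: float, master: str, slaves: List[str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_conns = max_conns
        self.active = 0
        self.bucket = TokenBucket(rate=qps, capacity=qps, clock=clock)
        self.master = master
        self.slaves = slaves
        self._next = 0  # round-robin index over slaves

    def open_connection(self) -> bool:
        """Admit a new client connection unless the user is at the cap."""
        if self.active >= self.max_conns:
            return False
        self.active += 1
        return True

    def close_connection(self) -> None:
        self.active = max(0, self.active - 1)

    def route(self, sql: str) -> Optional[str]:
        """Return the backend for `sql`, or None if the QPS limit is exceeded."""
        if not self.bucket.allow():
            return None
        # Naive classification: statements starting with SELECT go to a slave.
        # A real proxy must also handle transactions, SELECT ... FOR UPDATE, etc.
        if sql.lstrip().upper().startswith("SELECT") and self.slaves:
            backend = self.slaves[self._next % len(self.slaves)]
            self._next += 1
            return backend
        return self.master
```

The clock is injectable so the rate limit can be exercised deterministically; for example, with a fake clock frozen at 0 and `qps=2`, the third statement is rejected until the clock advances by a second.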
Finally, the MySQL Proxy community has been inactive in recent years, and C places high demands on developers; it is hard to expect every team member to collaborate on code that is both elegant and correct.
We therefore decided to rewrite the proxy server in Erlang, replacing the original MySQL Proxy module. The project currently comprises 50,000 lines of Erlang, 30,000 lines of C++, and 20,000 lines in other languages.
Why We Chose Erlang
Erlang is a structured, dynamically typed, functional programming language. It is often described as concurrency-oriented, meaning that the language itself defines the concept and behavior of Erlang processes. (In this article, "Erlang process" means a process as defined by the Erlang language, to distinguish it from the familiar operating system process.) Like an operating system process or thread, an Erlang process is a unit of concurrent execution, but it is especially lightweight: it is a "green process" managed and scheduled entirely in user space by the Erlang virtual machine (as shown in Figure 2). For example, in an Erlang virtual machine with HiPE and SMP support disabled, a newly created process occupies only 309 words (a word is 8 bytes on a 64-bit server), 233 of which are heap space (including the stack). Creating and terminating a process takes about 1 to 3 microseconds, and a single Erlang virtual machine can support hundreds of thousands of processes or more at the same time.
Figure 2 Erlang's lightweight processes
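To put the per-process figures quoted above in perspective, the short Python calculation below converts them to bytes and scales them to 100,000 processes. It uses only the numbers given in the text; the helper name is illustrative, and the result is a lower bound because processes grow as they run.

```python
WORD_BYTES = 8        # word size on a 64-bit VM
PROCESS_WORDS = 309   # initial process size quoted above (HiPE and SMP disabled)
HEAP_WORDS = 233      # part of that which is heap, including the stack


def initial_footprint_bytes(n_processes: int) -> int:
    """Memory taken by n freshly spawned processes.

    Ignores heap growth, message queues and the VM's own overhead, so it is
    a lower bound rather than a realistic estimate.
    """
    return n_processes * PROCESS_WORDS * WORD_BYTES


print(PROCESS_WORDS * WORD_BYTES)                          # 2472
print(HEAP_WORDS * WORD_BYTES)                             # 1864
print(round(initial_footprint_bytes(100_000) / 2**20, 1))  # 235.7 (MiB)
```

At roughly 2.4 KB each, 100,000 freshly spawned processes need only about 236 MiB, which is why one process per client connection is a practical design in Erlang.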