Huang Dongxu, author of the well-known open source distributed cache service Codis and co-founder and CTO of PingCAP, is a senior infrastructure engineer who specializes in the design and implementation of distributed storage systems. He is an open source enthusiast and a highly respected figure in the community. Even today, with the Internet flourishing and the boundaries of the database field still vague and uncertain, he keeps looking for a definite direction in practice.
In the parallel world of databases, Huang Dongxu follows his own path. In his view, traditional relational databases cannot cope with processing and analyzing massive data, and that need has opened a new window of opportunity. Yet the various alternative architectures, in-memory architectures, NoSQL, and similar solutions all fall short of his ideal: none of them is elegant, and few deliver both distributed transactions and elastic scaling well.
Absolute rationality and sensibility, seemingly contradictory, coexist in Huang Dongxu. At the end of 2012 he read two papers published by Google that, like a prism, reflected what he had been looking for. The papers describe F1 and Spanner, a massive relational database system used inside Google that addresses the relational model, elastic scaling, and global distribution, and that runs at large scale in production. "If this can be achieved, it will be disruptive to the field of data storage," Huang Dongxu says. He was excited by the emergence of what looked like the ideal solution, and PingCAP's TiDB was born on that basis.
Of course, every step forward takes great effort. Before starting the TiDB project, Huang Dongxu completed Codis, an open source distributed Redis cluster solution. That work convinced the team that, although horizontal scaling of caches now had a solution, the relational database underneath (mostly MySQL) still had no elegant way to scale out. Apart from compromises such as splitting tables in the business layer or using middleware, there was no other option. Some businesses could migrate to NoSQL systems such as HBase or Cassandra (C*), but many could not migrate smoothly and would have to rewrite almost all of their logic. With sharded databases and tables plus middleware, scaling and high availability bring significant extra operational cost, along with functional limits such as the inability to use cross-shard joins, subqueries, and cross-row transactions.
But as infrastructure software engineers, Huang Dongxu and his team did not want to pass this complexity on to the business layer. They began to rethink the database as a whole, hoping to solve the MySQL scaling problem at its root rather than build yet another piece of middleware.
"It feels good to create something entirely new and make it productive someday," he said.
During 2012 and 2013, Huang Dongxu studied the series of papers Google published on its new generation of distributed databases, Spanner and F1, and followed the related academic work. By 2015 the team felt it had largely thought through the basic technical problems and the architecture, so they decided to work full-time on a complete implementation of a new database. That database is today's protagonist: TiDB, a next-generation open source NewSQL database.
Of course, creating something is not the same as getting started. A project faces seemingly unlimited investment and an open-ended contest as it adapts to the competition and scrutiny of the Internet; only when developers and enterprises truly benefit has it really begun.
TiDB's overall architecture largely follows the design of Google Spanner and F1 and is split into two layers, TiDB and TiKV. TiDB corresponds to Google F1: it is a stateless SQL layer that is compatible with most MySQL syntax and exposes the MySQL network protocol. It parses the user's SQL statements, generates a distributed query plan, and translates the plan into key-value operations that are sent to TiKV. TiKV is where the data is actually stored and corresponds to Google Spanner: it is a distributed key-value database that supports elastic horizontal scaling, automatic disaster recovery and failover (high availability), and ACID cross-row transactions. Notably, unlike HBase or BigTable, TiKV does not depend on an underlying distributed file system, which gives it better performance and flexibility, an important property for online business.
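To make the idea of a stateless SQL layer on top of a key-value store more concrete, here is a minimal sketch of how a table row might be flattened into key-value pairs and read back with a prefix scan. The key layout and the helper names (`encode_row`, `row_prefix`, `table_prefix`, `scan_prefix`) are assumptions made for illustration only; they are not TiDB's or TiKV's actual encoding or API.

```python
from typing import Dict, List, Tuple


def table_prefix(table_id: int) -> bytes:
    """Key prefix shared by every row of one table (hypothetical layout)."""
    return f"t{table_id:08d}_r".encode()


def row_prefix(table_id: int, row_id: int) -> bytes:
    """Key prefix shared by every column of one row (hypothetical layout)."""
    return f"t{table_id:08d}_r{row_id:016d}_".encode()


def encode_row(table_id: int, row_id: int, row: Dict[str, object]) -> Dict[bytes, bytes]:
    """Flatten one SQL row into one key-value pair per column."""
    prefix = row_prefix(table_id, row_id)
    return {prefix + column.encode(): str(value).encode()
            for column, value in row.items()}


def scan_prefix(store: Dict[bytes, bytes], prefix: bytes) -> List[Tuple[bytes, bytes]]:
    """Return all pairs whose key starts with `prefix`, in key order.

    A plain dict stands in for the distributed key-value layer here.
    """
    return sorted((k, v) for k, v in store.items() if k.startswith(prefix))


# An INSERT becomes a handful of key-value writes...
store: Dict[bytes, bytes] = {}
store.update(encode_row(1, 7, {"name": "alice", "age": 30}))
store.update(encode_row(1, 8, {"name": "bob", "age": 25}))

# ...and a point lookup such as "WHERE id = 7" becomes a prefix scan.
columns_of_row_7 = scan_prefix(store, row_prefix(1, 7))
```

Because zero-padded ids keep keys in row order, a full table scan in this sketch is just `scan_prefix(store, table_prefix(1))`; the SQL layer itself holds no data, which is what allows it to stay stateless.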
▲TiDB overall architecture
This team is rich in ideals but has not been thrown off course by harsh reality. When choosing a development language for TiDB, it passed over Java and chose Go.
The TiDB project as a whole is divided into two layers: TiDB, the SQL layer, is developed in Go, and TiKV, the underlying distributed storage engine, is developed in Rust. The architecture is indeed similar to FoundationDB, which is also based on a two-layer structure. FoundationDB's SQL layer is written in Java and its lower layer in C++, but it was acquired by Apple last year.
Personal preference played little part in the choice of programming languages. The team gives the following reasons, the first two for choosing Go over Java in the SQL layer:
First, given the team's background, developing in Go is more efficient and the performance is good. For highly concurrent programs in particular, tools such as goroutines and channels make it possible to write correct programs with less code;
Second, many packages in the standard library are very friendly to network programming, which matters a great deal for a distributed system;
Third, the storage engine at the bottom has very demanding performance requirements, and Go is a language with a GC and a runtime. There were few choices for the TiKV layer: in the past it would essentially have been C or C++, but Rust has matured over the last two years, and after long deliberation and many experiments the team eventually chose Rust.
Rust is a statically typed language positioned as a replacement for C++. Its most distinctive feature is that strict language rules keep developers from writing programs with memory-safety errors and data races. Many problems are caught at compile time, so the runtime does not pay for a GC or similar machinery, which preserves high performance. Writing safe programs is exactly the big pain point of C++.
Although C++11 brought many improvements, the historical burden is heavy and the quality of third-party libraries is uneven. The more important reason, though, is that the team does not have a particularly deep C++ background, so they finally gave up on C++11 and chose Rust.
Rust offers not only safety and high performance but also a more modern syntax, more efficient development, and a very good package manager (Cargo). It makes it possible to write programs that are both fast and safe while development efficiency does not fall far behind Go, which makes it a very sound choice for now. The project is one of the largest open source projects in the Rust community worldwide and has the support of the official Rust language team; Huang Dongxu says that, for some of the third-party libraries the team needs, the Rust team will either develop them at high priority or help push them forward in the community. In addition, Rust has already reached 1.0 and its syntax has long been stable, so it is a very promising systems programming language.
Having drawn repeatedly on Google's work, and after what felt like running across an endless prairie, Huang Dongxu believes that only focus, and more focus, can cut through confusing distractions. After constant exploration, the team finally found its way to implement the transaction model.
TiDB's transaction model follows Google's Percolator. The paper, published in 2010, describes an ACID cross-row transaction framework that Google built on top of BigTable to keep index updates consistent. The core idea of the algorithm is two-phase commit. The problem with traditional distributed two-phase commit is that the single-point transaction manager cannot scale and becomes the bottleneck of the whole system; Percolator uses a two-level lock mechanism to decentralize the transaction manager, which greatly improves the scalability of the whole system.
▲Google Percolator internal implementation
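The two-phase commit idea behind a Percolator-style transaction can be sketched in a few dozen lines. This is a single-process illustration under strong assumptions: the `Store` and `Transaction` classes, the in-memory dictionaries standing in for the data, lock, and write columns, and the counter standing in for the timestamp service are all hypothetical. Crash recovery, which is where the primary lock really earns its keep, is left out, so this is neither TiKV's implementation nor a complete rendering of the paper.

```python
import itertools


class Store:
    """In-memory stand-in for a key-value layer with Percolator-style columns."""

    def __init__(self):
        self.data = {}    # (key, start_ts) -> value
        self.lock = {}    # key -> (primary_key, start_ts)
        self.write = {}   # (key, commit_ts) -> start_ts of the committed version
        self._clock = itertools.count(1)  # hypothetical timestamp service

    def next_ts(self):
        return next(self._clock)

    def get(self, key, read_ts):
        """Snapshot read: newest version committed at or before read_ts."""
        if key in self.lock and self.lock[key][1] <= read_ts:
            raise RuntimeError(f"{key!r} is locked by an in-flight transaction")
        commits = [(cts, sts) for (k, cts), sts in self.write.items()
                   if k == key and cts <= read_ts]
        if not commits:
            return None
        _, start_ts = max(commits)
        return self.data[(key, start_ts)]


class Transaction:
    def __init__(self, store):
        self.store = store
        self.start_ts = store.next_ts()
        self.buffer = {}  # writes are buffered locally until commit (optimistic)

    def get(self, key):
        return self.store.get(key, self.start_ts)

    def put(self, key, value):
        self.buffer[key] = value

    def _prewrite(self, key, value, primary):
        s = self.store
        if key in s.lock:
            return False  # another transaction is mid-commit on this key
        if any(k == key and cts >= self.start_ts for (k, cts) in s.write):
            return False  # someone committed this key after we started
        s.data[(key, self.start_ts)] = value
        s.lock[key] = (primary, self.start_ts)  # every lock points at the primary
        return True

    def commit(self):
        if not self.buffer:
            return True
        s = self.store
        keys = sorted(self.buffer)
        primary = keys[0]
        # Phase 1: prewrite every key, undoing our own locks on any conflict.
        locked = []
        for key in keys:
            if not self._prewrite(key, self.buffer[key], primary):
                for k in locked:
                    del s.lock[k]
                    del s.data[(k, self.start_ts)]
                return False
            locked.append(key)
        # Phase 2: take a commit timestamp, then publish the primary first.
        commit_ts = s.next_ts()
        for key in keys:  # keys[0] is the primary
            s.write[(key, commit_ts)] = self.start_ts
            del s.lock[key]
        return True
```

In this sketch the coordination state lives next to the data, in the lock and write entries, rather than in a separate transaction manager process. Two transactions that touch the same key conflict at prewrite time, and the later one simply gets `False` back from `commit()` and can retry.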
TiDB implements this model in the underlying storage engine and adds many engineering optimizations. As examples, Huang Dongxu cites batching and pipelining, which greatly increase the throughput of the timestamp service, and the use of Raft plus RocksDB in place of the original BigTable for better performance. TiDB also adopts an optimistic transaction mechanism in pursuit of higher throughput. At the algorithm level, however, it is still an implementation of Percolator.
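The batching optimization mentioned above can be illustrated with a small sketch: instead of one round trip to the timestamp service per transaction, a client collects pending requests and asks for a whole range at once. The `TimestampOracle` and `BatchingClient` classes are hypothetical names invented for this example, the "round trip" is simulated by a counter, and pipelining is not shown.

```python
class TimestampOracle:
    """Hypothetical central service handing out strictly increasing timestamps."""

    def __init__(self):
        self._next = 1
        self.round_trips = 0  # how many calls reached the service

    def allocate(self, count):
        """Reserve `count` consecutive timestamps in a single call."""
        if count <= 0:
            raise ValueError("count must be positive")
        self.round_trips += 1
        first = self._next
        self._next += count
        return range(first, first + count)


class BatchingClient:
    """Collects timestamp requests and serves them with one call per batch."""

    def __init__(self, oracle):
        self.oracle = oracle
        self._waiting = []  # callbacks waiting for a timestamp

    def request(self, callback):
        """Register interest in a timestamp; `callback(ts)` runs on flush."""
        self._waiting.append(callback)

    def flush(self):
        """Fetch one range covering every waiting request and hand it out."""
        waiting, self._waiting = self._waiting, []
        if not waiting:
            return 0
        for callback, ts in zip(waiting, self.oracle.allocate(len(waiting))):
            callback(ts)
        return len(waiting)


oracle = TimestampOracle()
client = BatchingClient(oracle)
received = []
for _ in range(100):
    client.request(received.append)
client.flush()  # 100 timestamps, a single simulated round trip
```

The timestamps stay unique and ordered because the service still allocates them; batching only changes how many requests each call amortizes.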
TiDB vs. NoSQL
Compared with these NoSQL systems, TiDB's most important feature is that its programming interface is SQL, which gives developers a more flexible way to work with the database. It is also highly compatible with MySQL: a business that originally ran on MySQL can switch to TiDB with almost no code changes. While supporting SQL, TiDB does not give up the elastic scalability of a system like HBase. The business layer does not need to worry about database capacity or about sharding databases and tables, and no longer needs a large operations effort. Scaling out simply means adding machines, storage node failures are transparent to the business, and the database can repair itself, ensuring that data is not lost.
The same holds in comparison with MongoDB; more importantly, users do not have to change their existing habits and programs. To define the shape of the future cloud database, TiDB's design goal is for a single cluster to scale to more than 1,000 physical nodes and to support petabyte-level capacity and trillions of rows of structured data. Under those constraints, its design and technology choices differ greatly from MongoDB's, and with large data volumes TiDB's performance is more stable and its scaling smoother.
TiDB's SQL optimizer is a query optimizer designed for distributed storage that Huang Dongxu and his team wrote from scratch. It draws on many ideas from recent academic work on query optimization and from distributed computing frameworks, with the aim of staying MySQL-compatible while performing much better than MySQL on complex queries.
Solving the pain points of traditional databases
Any enterprise that uses a traditional single-machine relational database may, as its data volume keeps growing or as the business demands strict availability, run into single points of failure and single-machine capacity limits. The problem has been especially prominent in the Internet industry in recent years. At present, apart from the database and table sharding and middleware mentioned above, there is no other solution, which makes for a painful situation.
TiDB scales the storage layer horizontally on top of the Raft algorithm, builds distributed transactions on that layer, and adds a complete SQL query layer, so that it supports joins, subqueries, and other complex queries without giving up ACID transactions. It exposes a MySQL interface externally, so users can solve the problem of storing large volumes of structured data with almost no intrusion into their applications. Traditional industries lag the Internet industry by about three years, and that gap keeps shrinking. As TiDB has stabilized recently, more and more Internet companies are using it, and it is expected to become a new mainstream choice for scaling databases.
TiDB's application scenarios
The target is the typical OLTP scenario, which covers a very wide range of enterprises. A typical TiDB user has run into scalability problems on a relational database, requires strongly consistent transactions, and needs strong consistency and high availability across multiple data centers. TiDB's MySQL support is very good, so for users or enterprises that currently use MySQL and want a more elegant way to scale horizontally, it is a great choice.
In fact, most users running TiDB in online production environments are Internet businesses that came from MySQL. TiDB currently does not support stored procedures and views, so a prerequisite is that the existing business does not use them.
From the first day of the project, the goal was for TiDB to be as compatible with MySQL as possible. Huang Dongxu says frankly that MySQL is a single-machine database and its query optimizer is designed for the single-machine case, so building a distributed database on that architecture is very difficult.
At that point the team chose a more thorough path: rewriting the entire SQL parser and query optimization engine. It looked almost impossible, but in practice they found that with better design and control over complexity it was actually the easier route. The benefits of full MySQL compatibility are not limited to user friendliness; more importantly, a large number of tests can be drawn from the MySQL community. Building a database product is not the hard part; proving that it is correct matters more. Huang Dongxu and his team continually collect a large body of test cases from the MySQL community to make sure each module is correct and consistent with MySQL's behavior.
How open the TiDB project is
The TiDB project is 100% open source and aims to be a top-level open source project of international standard. From the GitHub repo itself it is actually hard to tell that this is a Chinese-led project: all commit records, all collaboration, the roadmap, issue tracking, documentation in both Chinese and English, and code reviews are public.
The project has iterated to the Beta 4 version, and according to feedback from online users its main functions are largely complete and stable. The next important work, Huang Dongxu says, is continued performance optimization and stability improvement, along with ongoing testing in larger, more demanding, and harsher cluster environments. Peripheral tools, deployment tutorials, and further design documents are also being steadily added.
The future of TiDB
In the longer term, everything will run in the cloud, and databases will be no exception. With massive data and large-scale clusters, much remains to be explored in relational database design and theory. At that cluster scale, relying entirely on manual operations breaks down, because people do not scale. The database needs to be able to repair and scale itself; only then can it make good use of the cluster's computing resources. This is why the TiDB team positions the product as a cloud-native database. The team is doing a great deal of basic research and preparation for the future, including exploratory work on integrating Kubernetes with distributed databases.
Huang Dongxu wants TiDB to define the next generation of relational databases, so that developers can truly focus on their business. How large the database is, how high concurrency might get, when to scale out, and which sharding key to choose are all questions that should be hidden beneath a very simple SQL interface.
TiDB is off to a very good start. If the team succeeds, everyone working with the next generation of relational databases will be able to feel the productivity this technology brings.
Open source project address: https://github.com/pingcap/tidb
PS: Huang Dongxu will attend the WOT2016 Big Data Technology Summit on November 26 and give a talk titled "NewSQL in Action: Patterns and Tools" in the NoSQL hands-on technology session.
WOT2016 Big Data Technology Summit website: http://wot.51cto.com/
World-class open source project: how TiDB redefines the next-generation relational database