An in-depth look at the open source database middleware Vitess: core features and how it stacks up against other data stores

Source: Internet
Author: User

Overview
Vitess is a database solution for scaling MySQL. It is designed to run as effectively in the cloud as it does on dedicated hardware. It combines many important MySQL features with the scalability of a NoSQL database. Vitess has served all of YouTube's database traffic since 2011.
Vitess on Kubernetes
Kubernetes is Google's open-source cluster management system for Docker containers, and Vitess is a logical storage engine choice for Kubernetes users.
Kubernetes schedules containers onto the nodes of a compute cluster, manages the workloads on those nodes, and groups the containers that make up an application for easy management and discovery. With Kubernetes, you can create and manage a Vitess cluster out of the box.
Comparing Vitess with other storage options
The following sections compare Vitess with two common alternatives: a vanilla MySQL implementation and a NoSQL implementation.
Vitess vs. vanilla MySQL
Vitess improves on a vanilla MySQL implementation in the following ways:

Vanilla MySQL: Each MySQL connection has a memory cost that ranges from 256KB to almost 3MB, depending on the MySQL version you are using. As your user base grows, you need to add memory to support the additional connections, but the extra memory does not make queries any faster. In addition, obtaining these connections carries a significant CPU cost.
Vitess: Vitess's BSON-based protocol creates very lightweight connections of only about 32KB. The Vitess connection pooling feature uses Go's concurrency support to map these lightweight connections onto a small pool of MySQL connections. As a result, Vitess can easily handle thousands of connections at a time.

Vanilla MySQL: Poorly written queries, such as queries that do not set a LIMIT, can degrade database performance for all users.
Vitess: Vitess uses a SQL parser with a configurable set of rules to rewrite queries that might degrade database performance.

Vanilla MySQL: Sharding is the process of partitioning your data to improve scalability and performance. MySQL has no native sharding support, so you must write the sharding code yourself and embed the sharding logic in your application.
Vitess: Vitess uses range-based sharding. It supports both horizontal and vertical resharding, and most data transitions complete with only a few seconds of read-only downtime. Vitess can even accommodate a custom sharding scheme that you already have in place.

Vanilla MySQL: A MySQL cluster that uses replication for availability has one primary database and several replicas. If the primary fails, one of the replicas becomes the new primary. This requires you to manage the database life cycle and communicate the current system state to your application.
Vitess: Vitess helps you manage the life cycle of your databases. It supports and automatically handles a variety of scenarios, including primary failover and data backups.

Vanilla MySQL: A MySQL cluster can use custom database configurations for different workloads, such as a primary database for writes, fast read-only replicas for web clients, slower read-only replicas for batch jobs, and so on. If the database is sharded horizontally, the setup must be repeated for each shard, and the application needs built-in logic to find the right database.
Vitess: Vitess uses a topology backed by a consistent data store, such as etcd or ZooKeeper. This means that the cluster view is always up to date and consistent across clients. Vitess also provides a proxy that efficiently routes queries to the most appropriate MySQL instance.
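
The connection pooling idea from the comparison above can be illustrated with a minimal Go sketch. It is not Vitess code: the `backendConn`, `Pool`, and `RunClients` names are hypothetical, and a buffered channel stands in for the small pool of real MySQL connections that many lightweight client requests share.

```go
package main

import (
	"fmt"
	"sync"
)

// backendConn stands in for one real MySQL connection (hypothetical type).
type backendConn struct{ id int }

// Pool hands out a fixed number of backend connections.
type Pool struct{ conns chan *backendConn }

// NewPool creates a pool holding size backend connections.
func NewPool(size int) *Pool {
	p := &Pool{conns: make(chan *backendConn, size)}
	for i := 0; i < size; i++ {
		p.conns <- &backendConn{id: i}
	}
	return p
}

// Do borrows a connection, runs fn, and returns the connection to the pool.
// Callers block while every connection is in use.
func (p *Pool) Do(fn func(c *backendConn)) {
	c := <-p.conns
	defer func() { p.conns <- c }()
	fn(c)
}

// RunClients simulates n concurrent client requests sharing the pool and
// reports how many distinct backend connections were actually used.
func RunClients(p *Pool, n int) int {
	var mu sync.Mutex
	used := map[int]bool{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			p.Do(func(c *backendConn) {
				mu.Lock()
				used[c.id] = true
				mu.Unlock()
			})
		}()
	}
	wg.Wait()
	return len(used)
}

func main() {
	p := NewPool(4)
	fmt.Println("backend connections used:", RunClients(p, 1000))
}
```

However many client goroutines are started, the number of backend connections in use never exceeds the pool size, which is the property the comparison attributes to Vitess.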

Vitess vs. NoSQL
If you are considering a NoSQL solution primarily because of concerns about the scalability of MySQL, Vitess is probably a more appropriate choice for your application. While NoSQL provides strong support for unstructured data, Vitess offers several advantages over NoSQL data stores:

NoSQL: NoSQL databases do not define relationships between database tables, and they support only a subset of the SQL language.
Vitess: Vitess is not a simple key-value store. It supports complex query semantics such as WHERE clauses, JOINs, aggregation functions, and so on.

NoSQL: NoSQL databases generally do not support transactions.
Vitess: Vitess supports transactions within a single shard. The Vitess team is also exploring the feasibility of using two-phase commit to support cross-shard transactions.

NoSQL: NoSQL solutions have custom APIs, which lead to custom architectures, applications, and tools.
Vitess: Vitess adds only minimal changes to MySQL, a database that most people are already accustomed to using.

NoSQL: NoSQL solutions provide limited support for database indexes compared to MySQL.
Vitess: Vitess lets you use all of MySQL's indexing features to optimize query performance.

Features
Performance
    • Connection pooling - scales the number of front-end connections while optimizing MySQL performance.
    • Query de-duplication - while a query is still executing, its result is reused for any identical requests that arrive in the meantime.
    • Transaction manager - limits the number of concurrent transactions and manages the execution deadline of each transaction to optimize overall throughput.
    • Row cache - maintains a row-based cache (using memcached) to serve queries that need random access by primary key more efficiently, which is useful for OLTP workloads. (The MySQL buffer cache is optimized for range scans over indexes and tables.) This feature can replace a custom caching layer implemented at the application level.
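
To make the query de-duplication item concrete, here is a minimal sketch in Go, assuming a simple in-memory map of in-flight queries. The `Consolidator` type, its methods, and the `RunIdentical` helper are hypothetical names invented for this illustration and are not Vitess APIs.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// call tracks one in-flight query and the requests waiting on it.
type call struct {
	done    chan struct{} // closed once result is set
	result  string
	waiters int // guarded by Consolidator.mu
}

// Consolidator shares the result of an in-flight query with identical requests.
type Consolidator struct {
	mu       sync.Mutex
	inflight map[string]*call
}

func NewConsolidator() *Consolidator {
	return &Consolidator{inflight: make(map[string]*call)}
}

// Do executes query via exec unless an identical query is already running,
// in which case it waits for that query and reuses its result.
func (c *Consolidator) Do(query string, exec func(string) string) string {
	c.mu.Lock()
	if running, ok := c.inflight[query]; ok {
		running.waiters++
		c.mu.Unlock()
		<-running.done
		return running.result
	}
	cl := &call{done: make(chan struct{})}
	c.inflight[query] = cl
	c.mu.Unlock()

	cl.result = exec(query)

	c.mu.Lock()
	delete(c.inflight, query)
	c.mu.Unlock()
	close(cl.done)
	return cl.result
}

// Waiting reports how many requests are waiting on an in-flight query.
func (c *Consolidator) Waiting(query string) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	if cl, ok := c.inflight[query]; ok {
		return cl.waiters
	}
	return 0
}

// RunIdentical issues n (at least 1) identical requests that overlap in time.
// It returns how often the backend executed the query, plus every result.
func RunIdentical(n int) (int32, []string) {
	c := NewConsolidator()
	const q = "SELECT name FROM users WHERE id = 1"
	var execs int32
	started := make(chan struct{})
	release := make(chan struct{})
	exec := func(query string) string {
		if atomic.AddInt32(&execs, 1) == 1 {
			close(started)
		}
		<-release // simulate a slow query
		return "rows for: " + query
	}
	results := make([]string, n)
	var wg sync.WaitGroup
	run := func(i int) {
		defer wg.Done()
		results[i] = c.Do(q, exec)
	}
	wg.Add(1)
	go run(0)
	<-started // the first request is now executing
	for i := 1; i < n; i++ {
		wg.Add(1)
		go run(i)
	}
	for c.Waiting(q) < n-1 { // wait until the other requests have joined
		time.Sleep(time.Millisecond)
	}
	close(release)
	wg.Wait()
	return atomic.LoadInt32(&execs), results
}

func main() {
	execs, results := RunIdentical(5)
	fmt.Println("backend executions:", execs, "results returned:", len(results))
}
```

In this setup the five overlapping requests cause a single backend execution, and every caller receives the same result.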
Protection
    • Query rewriting and sanitization - adds limits and prevents non-deterministic updates.
    • Query blacklisting - custom rules prevent potentially problematic queries from reaching your database.
    • Query killer - terminates queries that take too long to return data.
    • Table ACLs - defines access control lists (ACLs) for tables based on the connected user.
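
The rewriting and blacklisting items above can be sketched roughly as follows. This is a deliberately naive illustration that matches query strings with regular expressions, whereas Vitess relies on a real SQL parser. The `Guard` type, the `NewGuard` defaults (a 1000-row limit and a rule rejecting `DELETE` without a `WHERE` clause), and `ErrRejected` are all assumptions made for the example.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
	"strings"
)

// ErrRejected is returned when a query matches a blacklist rule.
var ErrRejected = errors.New("query rejected by blacklist rule")

// limitClause detects an existing LIMIT clause (naive string matching).
var limitClause = regexp.MustCompile(`(?i)\blimit\s+\d+`)

// Guard holds the rewrite and blacklist rules (hypothetical type).
type Guard struct {
	MaxRows int              // LIMIT added to SELECTs that have none
	Deny    []*regexp.Regexp // queries matching any rule are rejected
}

// NewGuard returns a Guard with illustrative defaults.
func NewGuard() Guard {
	return Guard{
		MaxRows: 1000,
		Deny: []*regexp.Regexp{
			// Reject a DELETE that has no WHERE clause.
			regexp.MustCompile(`(?i)^delete\s+from\s+\w+$`),
		},
	}
}

// Check rejects blacklisted queries and appends a LIMIT to unbounded SELECTs.
func (g Guard) Check(query string) (string, error) {
	q := strings.TrimSuffix(strings.TrimSpace(query), ";")
	for _, rule := range g.Deny {
		if rule.MatchString(q) {
			return "", ErrRejected
		}
	}
	if strings.HasPrefix(strings.ToLower(q), "select") && !limitClause.MatchString(q) {
		q = fmt.Sprintf("%s LIMIT %d", q, g.MaxRows)
	}
	return q, nil
}

func main() {
	g := NewGuard()
	queries := []string{
		"SELECT * FROM users",
		"select id from t limit 5",
		"DELETE FROM users",
	}
	for _, q := range queries {
		out, err := g.Check(q)
		fmt.Printf("%q -> %q, err=%v\n", q, out, err)
	}
}
```

Keeping the rules in data (`MaxRows`, `Deny`) rather than in code mirrors the configurable rule set that the text describes.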
Monitoring
    • Performance analysis - tools let you monitor, diagnose, and analyze the performance of your database.
    • Query streamer - uses a list of incoming queries to serve OLAP workloads.
    • Update stream - a server streams the list of rows changing in the database, which can be used as a mechanism to propagate changes to other data stores.
Topology management tools
    • Primary management tools (handle reparenting)
    • Web-based management interface
    • Designed to work across multiple data centers and regions
Sharding
    • Almost seamless resharding
    • Support for vertical and horizontal sharding
    • Built-in range-based sharding, or application-defined sharding
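
As a rough illustration of range-based sharding, the sketch below maps a numeric keyspace ID to the shard whose range contains it. It assumes the keyspace ID has already been derived from the row's sharding key. The `Shard` type, the `ShardFor` function, and the two layouts are hypothetical, and the shard names are only labels chosen for the example.

```go
package main

import (
	"fmt"
	"sort"
)

// Shard owns every keyspace ID from Start up to the next shard's Start.
type Shard struct {
	Name  string
	Start uint64 // inclusive lower bound
}

// ShardFor returns the shard that owns keyspaceID. The shards slice must be
// sorted by Start, and the first shard must start at 0.
func ShardFor(shards []Shard, keyspaceID uint64) string {
	// Find the first shard that starts above the ID; the one before it owns the ID.
	i := sort.Search(len(shards), func(i int) bool {
		return shards[i].Start > keyspaceID
	})
	return shards[i-1].Name
}

// TwoShards splits the keyspace into two equal ranges.
func TwoShards() []Shard {
	return []Shard{
		{"-80", 0},
		{"80-", 0x8000000000000000},
	}
}

// FourShards is the layout after splitting each of the two shards in half.
func FourShards() []Shard {
	return []Shard{
		{"-40", 0},
		{"40-80", 0x4000000000000000},
		{"80-c0", 0x8000000000000000},
		{"c0-", 0xc000000000000000},
	}
}

func main() {
	id := uint64(0x9000000000000000)
	fmt.Println("before resharding:", ShardFor(TwoShards(), id))
	fmt.Println("after resharding: ", ShardFor(FourShards(), id))
}
```

Because rows are assigned by range, resharding only changes the boundary table: the same keyspace ID resolves to a narrower shard after the split, without any change to how the ID itself is computed.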
Architecture
The Vitess platform consists of several server processes, command-line tools, and web-based tools, backed by a consistent metadata store.
Depending on the current state of your application, you can arrive at a full Vitess implementation through a number of different paths. For example, if you are building a service from scratch, your first step with Vitess is to define your database topology. However, if you need to scale an existing database, you would likely start by deploying a connection proxy.
Vitess tools and servers are designed to help you regardless of the size of your database cluster. For smaller implementations, vttablet features such as connection pooling and row caching help you get more out of your existing hardware. For larger implementations, Vitess's automation tools provide additional benefits.
The components of Vitess are described below:

Topology
The topology service is a metadata store that contains information about running servers, the sharding scheme, and the replication structure of primaries and replicas. The topology is backed by a consistent data store. You can browse the topology with vtctl (command line) and vtctld (web interface).
In Kubernetes, the data store is etcd. The Vitess source code also includes Apache ZooKeeper support.
vtgate
vtgate is a lightweight proxy server that routes traffic to the correct vttablet and returns the consolidated results to the client. Applications send their queries to a vtgate server, so the client logic stays simple: it only needs to be able to find a vtgate instance.
To route queries, vtgate considers the sharding scheme, the required latency, and the availability of the tablets and of the MySQL instances behind them.
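
A heavily simplified sketch of this routing decision is shown below. It assumes the target shard is already known and considers only tablet health and role; latency requirements are ignored. The `Tablet` type, the `Route` function, the sample addresses, and the preference order (reads go to the first healthy replica and fall back to the primary) are assumptions of the sketch, not the actual vtgate logic.

```go
package main

import (
	"errors"
	"fmt"
)

// Tablet describes one vttablet and the role of the MySQL instance behind it.
type Tablet struct {
	Addr    string
	Shard   string
	Primary bool
	Healthy bool
}

// ErrNoTablet is returned when no suitable healthy tablet serves the shard.
var ErrNoTablet = errors.New("no healthy tablet for shard")

// Route picks a tablet for a query on the given shard. Writes go to the
// healthy primary. Reads go to the first healthy replica, or to the primary
// when no replica is available.
func Route(tablets []Tablet, shard string, write bool) (Tablet, error) {
	var primary *Tablet
	for i := range tablets {
		t := &tablets[i]
		if t.Shard != shard || !t.Healthy {
			continue
		}
		if t.Primary {
			primary = t
			continue
		}
		if !write {
			return *t, nil // first healthy replica serves the read
		}
	}
	if primary != nil {
		return *primary, nil
	}
	return Tablet{}, ErrNoTablet
}

// SampleTablets returns a small, made-up cluster view for the demo.
func SampleTablets() []Tablet {
	return []Tablet{
		{Addr: "10.0.0.1:15001", Shard: "-80", Primary: true, Healthy: true},
		{Addr: "10.0.0.2:15001", Shard: "-80", Primary: false, Healthy: false},
		{Addr: "10.0.0.3:15001", Shard: "-80", Primary: false, Healthy: true},
		{Addr: "10.0.0.4:15001", Shard: "80-", Primary: true, Healthy: true},
	}
}

func main() {
	tablets := SampleTablets()
	w, _ := Route(tablets, "-80", true)
	r, _ := Route(tablets, "-80", false)
	fmt.Println("write goes to:", w.Addr)
	fmt.Println("read goes to: ", r.Addr)
}
```

The client never sees this decision: it sends the query to a vtgate instance, which is why the client logic can stay simple.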
vttablet
vttablet is a proxy server that sits in front of a MySQL database. In a Vitess implementation, each MySQL instance has one vttablet.
vttablet tries to maximize MySQL throughput and to protect MySQL from harmful queries. Its features include connection pooling, query rewriting, and query de-duplication. In addition, vttablet executes management tasks initiated by vtctl and provides streaming services used for filtered replication and data exports.
A lightweight Vitess implementation uses vttablet as a smart connection proxy that serves queries for a single MySQL database. By running vttablet in front of your MySQL database and changing your application to use the Vitess client instead of your MySQL driver, your application benefits from vttablet features such as connection pooling, query rewriting, and query de-duplication.
vtctl
vtctl is a command-line tool for managing a Vitess cluster. It lets a person or an application interact easily with a Vitess implementation. With vtctl, you can identify primary and replica databases, create tables, initiate failovers, perform sharding (and resharding) operations, and so on.
As vtctl performs operations, it updates the lock server as needed. Other Vitess servers observe these changes and react accordingly. For example, if you use vtctl to fail over to a new primary database, vtgate sees the change and sends subsequent writes to the new primary.
vtctld
vtctld is an HTTP server that lets you browse the information stored in the lock server. It is useful for troubleshooting and for getting a high-level overview of all servers and their current status.
vtworker
vtworker hosts long-running processes. It supports a plugin architecture and provides libraries that make it easy to choose which tablets to use. Plugins are available for the following types of jobs:
    • resharding differ: jobs that verify data integrity during horizontal shard splits and merges
    • vertical split differ: jobs that verify data integrity during vertical splits and merges
vtworker also makes it easy to add other validation procedures. You can run in-tablet integrity checks to verify foreign-key-like relationships, or cross-shard integrity checks when, for example, an index table in one keyspace references data in another keyspace.
Other support tools
Vitess also includes the following tools:
    • mysqlctl: manages MySQL instances
    • zk: ZooKeeper command-line client and browser
    • zkctl: manages ZooKeeper instances
History
Vitess has been a fundamental part of YouTube's infrastructure since 2011. The following is a brief summary of the events that led to the creation of Vitess:
    1. YouTube's MySQL database reached a point where peak traffic would soon exceed the database's serving capacity. To relieve the problem temporarily, YouTube created a primary database for write traffic and a replica database for read traffic.
    2. With an unprecedented surge in demand for videos, read-only traffic was still high enough to overload the replica database. So YouTube added more replicas, which again provided a temporary solution.
    3. Eventually, write traffic grew beyond what the primary database could handle, which required YouTube to shard its data to handle incoming write traffic. (Sharding would also have become necessary if the overall size of the database had grown too large for a single MySQL instance.)
    4. YouTube's application layer was modified so that, before executing any database operation, the code could determine the correct database shard to receive that particular query.
Vitess let YouTube remove this logic from the source code by introducing a proxy between the application and the database to route and manage database interactions. Since then, YouTube has grown its user base by a factor of more than 50, greatly increasing its capacity to serve pages, process newly uploaded videos, and more. More importantly, Vitess is a platform that continues to scale.
YouTube chose to write Vitess in Go because Go offers a combination of expressiveness and performance. It is almost as expressive as Python and very easy to maintain, yet its performance is in the same range as Java and close to C++ in certain cases. In addition, the language is well suited to concurrent programming and has a very high-quality standard library.
Open source first
The open source version of Vitess is very similar to the version that YouTube uses. While there are some differences that let YouTube take advantage of Google's infrastructure, the core features are the same. When developing new features, the Vitess team first makes them work in the open source tree. Then, in some cases, the team writes a plugin that makes use of Google-specific technologies. This ensures that the open source version of Vitess maintains the same level of quality as the internal version.
Most Vitess development takes place in the open, on GitHub. Accordingly, Vitess is built with extensibility in mind so that you can adapt it to your own architecture.
Original link: http://vitess.io/overview/.
