Database optimization examples and explanations


Posted by Lin Tao, 2016-3-16 1:01. Category: xsql/program/etc. Tags: mysql, mysql optimization.

The database is an integral part of every architecture, and once it has a performance problem, the consequences for the whole system can be disastrous. Worse, because a database is inherently stateful (master and slave roles) and carries data (usually not a small amount), the recovery time after a failure is hard to control. Database optimization is therefore something we need to put a lot of effort into.

Hardware layer optimization

This layer is the simplest. In recent years the term SSD has become familiar to everyone. Its ultra-high IOPS amazed people when it first appeared, and as prices have kept falling it has become very cost-effective. Weibo now uses SSD servers as the standard for its database services.

Here is an OLTP performance test we ran ourselves on SSDs a few years back:

You can see that OLTP QPS reaches around 2.7w (27,000). A one-master, two-slave (1M2S) architecture can support 5w (50,000) QPS, and in some simple scenarios you may not even need a cache layer.

PS: It is best to run hardware tests yourself. Official figures can only serve as a reference, because performance often depends heavily on the scenario; down at the level of individual SQL statements the conclusions can differ widely.

In 2012, Weibo used PCIe flash to carry the feed system through a Spring Festival Gala peak of 3.5w (35,000). This supported the growth of the business in its early stage and bought a lot of time for optimizing and reworking the architecture.

You can also see that many cloud vendors now put SSDs in basically all of their physical machines, and AWS offers SSD disks on its virtual machines for IO performance. It is foreseeable that IO will no longer be the biggest bottleneck for databases.

Experience: If the company is not short of money, invest directly in SSD or PCIe flash devices, and the sooner the better.

System-level optimization

With SSD hardware, some of the original system-layer design choices become a problem. Take the IO scheduler: the system default is CFQ, which is optimized mainly for mechanical hard disks. A mechanical disk has to seek by moving its actuator arm, so CFQ suits it well.

Completely Fair Queuing (CFQ)

The algorithm assigns each process a time window within which it may issue IO requests. Moving the window from process to process gives every process a fair chance to issue IO. CFQ also implements process priorities, so a high-priority process gets a longer window.

An SSD, however, does not seek; it works by electronic erase and write, so CFQ is clearly no longer appropriate. The usual advice on the Internet is to use the NOOP algorithm, but I personally recommend deadline. Let's look at the characteristics of the two.

NOOP has a single wait queue: each new request is simply appended to the end of the queue, FIFO style. It assumes IO is not a performance problem, and it is comparatively cheap on CPU.

The deadline scheduler gives up some performance in exchange for shorter wait times. It uses a round-robin scheduler, is compact, and provides minimal read latency with good throughput, which makes it especially suitable for read-heavy environments.

Judging by these characteristics, NOOP is indeed a better match for SSD media because it is so simple. But a database service runs many complex queries, and a plain FIFO may leave some transactions waiting a long time for resources, so I personally recommend deadline.

PS: The main reason is that stress tests show no obvious performance difference between the two algorithms.
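Before changing the scheduler it helps to see which one a disk is using. The sketch below is a minimal illustration, assuming a Linux kernel of that era, where /sys/block/<device>/queue/scheduler lists the available schedulers and marks the active one in square brackets. The function names and the device name "sda" are only examples, not something the original article provides.

```python
from pathlib import Path


def active_scheduler(sysfs_text):
    """Return the scheduler shown in [brackets], e.g. 'noop deadline [cfq]' -> 'cfq'."""
    for token in sysfs_text.split():
        if token.startswith("[") and token.endswith("]"):
            return token[1:-1]
    raise ValueError(f"no active scheduler marked in {sysfs_text!r}")


def read_scheduler(device):
    """Read the active scheduler of a block device such as 'sda' (Linux only)."""
    path = Path("/sys/block") / device / "queue" / "scheduler"
    return active_scheduler(path.read_text())


# Parsing a typical line without touching the real system:
print(active_scheduler("noop deadline [cfq]"))  # cfq
```

On such a system, writing the name deadline into the same file as root switches the scheduler for that device until the next reboot.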

Here is the effect of making this adjustment on one of our own production services:

Beyond the above, there are some smaller things worth adjusting. The benefit of each one is not so obvious, but many a little makes a mickle: added together they are well worth doing.

    • Use EXT4 or XFS
    • Add the noatime option at mount time
    • Change the RAID card read-write policy to write back
    • Replace the glibc memory allocator with jemalloc
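The noatime item in the list above is easy to check. The following sketch assumes the usual Linux /proc/mounts layout (device, mount point, filesystem type, comma-separated options) and reports EXT4/XFS mounts that lack noatime. The sample text is made up for illustration.

```python
def mounts_without_noatime(proc_mounts, fs_types=("ext4", "xfs")):
    """Return mount points of the given filesystem types that lack noatime."""
    missing = []
    for line in proc_mounts.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        _device, mount_point, fs_type, options = fields[:4]
        if fs_type in fs_types and "noatime" not in options.split(","):
            missing.append(mount_point)
    return missing


# Hypothetical content in /proc/mounts format:
sample = """\
/dev/sda1 / ext4 rw,relatime 0 0
/dev/sdb1 /data xfs rw,noatime,attr2 0 0
proc /proc proc rw,nosuid 0 0
"""
print(mounts_without_noatime(sample))  # ['/']
```

On a real host the same function can be fed the content of /proc/mounts.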

Experience: Focus on IO optimization. Databases, and MySQL in particular, are IO-intensive services; solving IO problems removes a lot of unnecessary trouble.

MySQL's own optimization

Let's start with the parameters that can change performance.

    • innodb_max_dirty_pages_pct

This one is fairly controversial; values are generally in the 75-90 range. It controls when dirty pages in the buffer pool (BP) are flushed to disk. If it is too small, flushing is frequent and IO rises; if it is too large, a normal MySQL shutdown takes a long time. It depends on the actual scenario; I personally recommend 90.

    • innodb_io_capacity

Disk IO throughput: specifically, how many dirty pages can be flushed each time the buffer is written to disk. The default is 200; since we use SSDs, 3000-5000 is recommended.

    • innodb_read_io_threads
    • innodb_write_io_threads

These increase the number of background IO threads. The default is 4; changing it to 8 is recommended.

    • sync_binlog
    • innodb_flush_log_at_trx_commit

These are the famous "double 1" parameters, and they have a very large impact on performance.

sync_binlog controls the binlog flush policy: after every N writes to the binary log, MySQL calls fdatasync() to synchronize the binary log to disk.

innodb_flush_log_at_trx_commit controls when the log buffer is flushed to the log file: set to 0, it is flushed once per second; set to 1, it is flushed at every commit.

From this it follows that double 1 is the safest if you care most about data safety, and double 0 is the best fit if you want maximum performance. The performance gap between the two can be at least 2x.
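To make the gap between double 1 and double 0 concrete, here is a deliberately simplified count of how often each setting forces a flush. It models only what the text above describes (a binlog sync every N writes, a redo log flush per second or per commit), treats each commit as one binlog write, and is not a benchmark. The function names and the workload numbers are made up for illustration.

```python
def binlog_syncs(commits, sync_binlog):
    """Binlog syncs issued by MySQL; 0 leaves flushing to the operating system."""
    if sync_binlog < 0:
        raise ValueError("sync_binlog must be >= 0")
    return 0 if sync_binlog == 0 else commits // sync_binlog


def redo_flushes(commits, seconds, flush_at_commit):
    """Redo log flushes: once per second for 0, once per commit for 1."""
    if flush_at_commit == 0:
        return seconds
    if flush_at_commit == 1:
        return commits
    raise ValueError("only the values 0 and 1 discussed in the text are modeled")


def flush_counts(commits, seconds, sync_binlog, flush_at_commit):
    return (binlog_syncs(commits, sync_binlog),
            redo_flushes(commits, seconds, flush_at_commit))


# Hypothetical workload: 10,000 commits spread over 10 seconds.
print(flush_counts(10_000, 10, 1, 1))  # double 1 -> (10000, 10000)
print(flush_counts(10_000, 10, 0, 0))  # double 0 -> (0, 10)
```

The count shows why double 1 is both the safest and the slowest choice: every commit waits for disk, whereas double 0 can lose about the last second of work in a crash.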

    • innodb_log_file_size

InnoDB redo log size: at most 4G in 5.5 and 256G in 5.6. The larger it is, the better the write performance, because most of the time writes can continue without waiting for a checkpoint to free up space.

    • query_cache_type

It looks very attractive, but in real production environments it has caused us many failures. Every update to a table empties the cached results for it, and SQL statements are matched character by character, so most of the time we got little benefit from the cache and a lot of waiting on the query cache lock instead. We recommend turning it off.

All of the above applies only to MySQL 5.5. We are still exploring 5.6 and 5.7; since we do not yet run them online at scale, we have no experience to share.
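As a summary, the values recommended above can be collected into a [mysqld] fragment. This is only a sketch: it picks the low end of the 3000-5000 range, assumes the safety-first double 1 choice, and leaves out innodb_log_file_size because the text gives no concrete value. The helper name is made up for illustration.

```python
# Values taken from the recommendations in this section (MySQL 5.5, SSD storage).
RECOMMENDED = {
    "innodb_max_dirty_pages_pct": 90,
    "innodb_io_capacity": 3000,           # the text suggests 3000-5000 on SSD
    "innodb_read_io_threads": 8,
    "innodb_write_io_threads": 8,
    "sync_binlog": 1,                     # double 1: safety over speed (assumed choice)
    "innodb_flush_log_at_trx_commit": 1,
    "query_cache_type": 0,                # query cache turned off
}


def render_mysqld_section(settings):
    """Render settings as the text of a [mysqld] section for my.cnf."""
    lines = ["[mysqld]"]
    lines += [f"{name} = {value}" for name, value in settings.items()]
    return "\n".join(lines) + "\n"


print(render_mysqld_section(RECOMMENDED))
```

Any of these should be validated against your own workload before being applied.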

Experience: If you have the manpower, you can follow BAT (Baidu, Alibaba, Tencent) and do secondary development on the database, using patches to get higher performance and stability. If you do not, a deep understanding of what MySQL's own parameters do is also enough to meet business needs; there is no need to chase source-level development blindly.

Business optimization

So-called business optimization is, most of the time, index optimization. We DBAs often say that a single slow SQL statement can wipe out all of the optimizations above: CPU goes straight to 100% and response time (RT) soars to 500ms or even more than 1s.

Slow query optimization has three treasures:

    • pt-query-digest
    • explain
    • show profiling

First, use pt-query-digest to find out which slow queries have the biggest impact.

Then use explain to analyze what exactly is wrong with each of those slow queries.

Focus on three fields: type, rows and Extra.

The values of type, from best to worst, are:

system > const > eq_ref > ref > fulltext > ref_or_null > index_merge > unique_subquery > index_subquery > range > index > ALL
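The ordering above can be turned into a small check over explain output. The sketch below is one possible way to do it, not the author's tooling: it flags the two worst access types, a large rows estimate, and the Extra markers "Using filesort" and "Using temporary". The 10,000-row threshold and the sample row are assumptions chosen for illustration.

```python
# Access types from best to worst, in the order listed above.
ACCESS_TYPES = [
    "system", "const", "eq_ref", "ref", "fulltext", "ref_or_null",
    "index_merge", "unique_subquery", "index_subquery", "range", "index", "ALL",
]
_RANK = {name.lower(): position for position, name in enumerate(ACCESS_TYPES)}


def access_rank(access_type):
    """0 is the best access type; larger is worse. Unknown types raise KeyError."""
    return _RANK[access_type.lower()]


def explain_warnings(row, max_rows=10_000):
    """Return human-readable warnings for one row of explain output."""
    warnings = []
    if access_rank(row["type"]) >= access_rank("index"):
        warnings.append(f"full scan (type={row['type']})")
    if int(row["rows"]) > max_rows:
        warnings.append(f"rows={row['rows']} exceeds {max_rows}")
    for marker in ("Using filesort", "Using temporary"):
        if marker in (row.get("Extra") or ""):
            warnings.append(marker)
    return warnings


# Hypothetical explain row:
sample = {"type": "ALL", "rows": 1200000, "Extra": "Using where; Using filesort"}
print(explain_warnings(sample))
```

A row that uses a good index, for example type ref with a small rows estimate and no sorting, produces an empty list.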

Finally, if the problem is still serious, you can use show profiling to locate the stage in which the time is being spent.

You can see that Sending data takes the most time. At this point you need to find out why that stage is so expensive underneath: whether the result set is too large, IO performance is insufficient, and so on.
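Finding the dominant stage in profiling output is a simple aggregation. The sketch below assumes the rows are (status, duration in seconds) pairs, which is the shape of the Status and Duration columns; the numbers are invented for illustration and are not the author's measurements.

```python
def slowest_stage(profile_rows):
    """Sum the duration per status and return the (status, seconds) pair with the largest total."""
    totals = {}
    for status, duration in profile_rows:
        totals[status] = totals.get(status, 0.0) + float(duration)
    return max(totals.items(), key=lambda item: item[1])


# Hypothetical profile of one query:
rows = [
    ("starting", 0.00005),
    ("Opening tables", 0.00002),
    ("Sending data", 1.25),
    ("end", 0.00001),
]
print(slowest_stage(rows))  # ('Sending data', 1.25)
```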

Below is the result of optimizing one complex statement; the drop in rows clearly shows how much query overhead was removed.

Experience: It is best to build a slow query monitoring system and spend time every day optimizing slow queries, so that a single SQL statement never turns into a disaster.

Architecture optimization

Finally there is the ultimate weapon: architecture optimization. Very often, when we have worked through all of the directions above and still see no good result, it is time to go and have a chat with the developers.

PS: Of course, having a heart-to-heart with the product managers (PMs) is even more effective.

I remember one time we went to talk with the developers, and in the end they decided to change the feature. At that moment you suddenly realize that no optimization technique beats "don't do it"; it is simply invincible.

In my own experience, the following principles apply generally to optimization at the architecture layer:

    • Cache is king

Hot data must be absorbed by a cache such as Redis or MC (memcached); making MySQL take the traffic is unwise.

    • Use queues to smooth out peaks

As is well known, MySQL's asynchronous replication is single-threaded: everything the master executed concurrently has to be worked through slowly by a single replication thread on the slave. Even if the master writes fast, once the slaves lag the whole cluster is effectively unusable. So it is best to put a queue in front of writes to shave the peaks and keep the write rate at a fairly even level.

    • Moderate over-design

Many products start out small, but traffic can shoot up after launch when a campaign takes off. If the database becomes the bottleneck at that point, splitting it requires developers, DBAs and architects working together, and there is probably no time for that. So a certain amount of over-design early in the product's life lays a good cushion for this situation. The most obvious case is table splitting: it is best to start with a moderate vertical split by business and a relatively aggressive horizontal split, so you can cope with rapid business growth.
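The "use queues to smooth out peaks" principle above can be illustrated with a toy simulation. In this sketch writes arrive in bursts, wait in a queue, and are applied to the database at a capped rate per tick. The function name, the tick model and the numbers are assumptions for illustration, not a description of Weibo's system.

```python
from collections import deque


def drain_with_cap(arrivals_per_tick, max_writes_per_tick):
    """Return how many writes reach the database in each tick."""
    backlog = deque()
    applied = []

    def apply_batch():
        batch = min(len(backlog), max_writes_per_tick)
        for _ in range(batch):
            backlog.popleft()  # stand-in for executing one write on MySQL
        applied.append(batch)

    for tick, arrivals in enumerate(arrivals_per_tick):
        backlog.extend((tick, n) for n in range(arrivals))
        apply_batch()
    while backlog:  # keep draining after the arrivals stop
        apply_batch()
    return applied


# A burst of 10 writes in the second tick is spread over later ticks.
print(drain_with_cap([2, 10, 1, 0], 4))  # [2, 4, 4, 3]
```

No write is lost, but the database never sees more than 4 writes per tick, which is what keeps the slaves from falling behind.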

An example:

    1. Use MCQ to reduce the write performance required of MySQL.
    2. Let MC and Redis take the actual user access: the cache layer carries and shields 90% of the traffic.
    3. MySQL is where the data finally lands: it stores the full data set but serves only part of the business queries, less than 10%.
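The read side of the example above can be sketched as a cache-aside lookup: try the cache first and fall back to the database only on a miss. Plain dictionaries stand in for MC/Redis and MySQL here, and the class name, keys and counters are hypothetical.

```python
class ReadThroughStore:
    def __init__(self, database):
        self.database = database  # stand-in for MySQL: holds the full data set
        self.cache = {}           # stand-in for MC or Redis
        self.cache_hits = 0
        self.db_reads = 0

    def get(self, key):
        if key in self.cache:
            self.cache_hits += 1
            return self.cache[key]
        self.db_reads += 1        # only cache misses reach the database
        value = self.database.get(key)
        if value is not None:
            self.cache[key] = value
        return value


store = ReadThroughStore({"user:1": "alice", "user:2": "bob"})
for key in ["user:1"] * 9 + ["user:2"]:
    store.get(key)
print(store.cache_hits, store.db_reads)  # 8 2
```

With a hot key like this, 8 of 10 reads never touch the database; the hotter the data, the closer the ratio gets to the 90% mentioned above.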

Experience: Let the right software do the right job. Don't think about optimization only at the technical level; also think about it by breaking down the requirements.

Summary

Here I repost a classic database optimization funnel that I first saw many years ago and that still looks applicable today. Let's encourage each other.

The only part that no longer applies is "adding resources" at the bottom: SSDs really are a good thing, as anyone who has used them knows.

Q&A

Q: How do you use MySQL together with a cache? Can MySQL be replaced with a Redis-like NoSQL store?

A: We turn off the query cache (QC) by default. For caching we use both MC and Redis; generally the hot data is put in the cache.

Q: Are there any good tips for multi-table join queries during development? For example a student table, a class table, a teacher table, a student course-selection table and a course information table.

A: Frankly, MySQL does not optimize joins very well. If you can avoid them, do; if you cannot, don't join too many tables, and let the small result set drive the join.

Q: How long does an SSD last?

A: Check the vendor's specification. Our earliest ones have been in use for almost 4 years; some show performance problems, but most of them run stably.

Q: How much data can a single table hold while MySQL still performs at its best?

A: As a rule of thumb, no more than 3kw (30 million) rows and no more than 30G.
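The rule of thumb in this answer gives a quick way to estimate how many tables a horizontal split needs. The sketch below takes the stricter of the two limits; reading "30G" as 30 GiB and the example workload are assumptions.

```python
import math

MAX_ROWS_PER_TABLE = 30_000_000      # 3kw rows, from the answer above
MAX_BYTES_PER_TABLE = 30 * 1024**3   # "30G", read here as 30 GiB (assumption)


def tables_needed(total_rows, total_bytes):
    """Smallest number of tables that keeps each one under both limits."""
    by_rows = math.ceil(total_rows / MAX_ROWS_PER_TABLE)
    by_size = math.ceil(total_bytes / MAX_BYTES_PER_TABLE)
    return max(1, by_rows, by_size)


# Hypothetical data set: 100 million rows occupying 50 GiB.
print(tables_needed(100_000_000, 50 * 1024**3))  # 4
```

Here the row count is the binding limit (4 tables by rows, 2 by size); leaving headroom for growth argues for rounding up further.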

Q: When choosing a MySQL version, do you consider unofficial ones such as MariaDB? And how does a MySQL cluster solution compare with splitting databases and tables?

A: Official and unofficial versions each have their advantages. We use the Community edition, mainly because it makes communication easier. MariaDB is also fine as long as it keeps being maintained; many friends around me use it. For a company short on DBAs, though, I recommend the Community edition, so that problems can be answered promptly.

Q: Regarding the architecture in this talk: what middleware do you use, or is it self-developed? How do you solve master-slave replication delay?

A: Our middleware is self-developed, but frankly we do not use it everywhere; it depends on the scenario. Master-slave delay is a painful subject. We are currently using parallel replication in 5.7 to address it.

Q: During development we often run into tables with a great many fields. How should this be handled? Should the table be split, and how do you strike the balance?

A: Many fields do not necessarily mean you must split the table; it mainly depends on whether there is a performance problem. But a table with many fields usually makes building indexes more troublesome, so a vertical split is the better choice.

Q: At what stage is it appropriate to split databases? What needs to be done for a business database that is going through large business changes?

A: Do it once you expect the database to affect business growth in the short term. If you do not want a large impact on the business, it is best to spend some money on servers: mirror a cluster first, rework the mirror, and then cut the service over.

Q: What policies or methods do you use to keep the cache consistent with the data in the database when sharding databases and tables, master-slave replication and so on are in use?

A: Use a messaging system, or software along the lines of heterogeneous replication middleware (such as our own databus): update MySQL first, and then update the cache.
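The order described in this answer (update MySQL first, then update the cache through a message) can be sketched as follows. Dictionaries stand in for MySQL and the cache, and an in-process queue stands in for the messaging system or databus; all names are hypothetical.

```python
from queue import Queue

database = {}         # stand-in for MySQL, the source of truth
cache = {}            # stand-in for MC or Redis
change_log = Queue()  # stand-in for the messaging system


def write(key, value):
    """Step 1: update the database, then publish the change."""
    database[key] = value
    change_log.put((key, value))


def apply_changes():
    """Step 2: a consumer replays published changes into the cache."""
    applied = 0
    while not change_log.empty():
        key, value = change_log.get()
        cache[key] = value
        applied += 1
    return applied


write("user:1:name", "alice")
stale = "user:1:name" not in cache  # True: the cache lags until the message is consumed
apply_changes()
print(stale, cache == database)     # True True
```

The cache is only eventually consistent: between the write and the consumer run, readers can still see the old value.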

Q: First question: regarding the single-threaded asynchronous replication mentioned above, have you considered changing it to multi-threaded replication to reduce latency, or upgrading to 5.6? Second question: besides performance optimization, database consistency is also important. Could you share something on that? Thank you.

A: Yes, the biggest cause of latency is single-threaded replication, so we are looking forward to parallel replication in 5.7; parallel replication in 5.6 is per-database and does not improve much. As for consistency, it depends on the scenario. If the requirements are very strict, it is best to use double 1 plus semi-synchronous replication, or switch to PXC.

Q: You said: "Focus on IO optimization. Databases, and MySQL in particular, are IO-intensive services; solving IO problems removes a lot of unnecessary trouble." Is there a good tool for locating or monitoring IO problems, and any good experience to share?

A: Monitoring is very important. If you have the capacity, write your own to fit your needs; if you do not want to invest development manpower, open source is fine, for example Xiaomi's Open-Falcon. As long as the monitoring is fine-grained enough, problems surface naturally.

Q: NoSQL is very popular right now: document-based, column-based, object-based and so on. I used MongoDB a while ago and felt that, with a well-designed and well-controlled data structure, it is more efficient than MySQL. What do you think?

A: On choosing a database, I hold to the "best fit for the scenario" view: every database has a scenario it suits best, and in that scenario choosing it is absolutely right. I have been looking at MongoDB too; the Hangzhou MongoDB user meetup recently organized by Alibaba Cloud was quite popular. I think MongoDB's biggest advantages are that it is schemaless and supports sharding, which saves developers and DBAs a lot of work. But the most important thing is that you have to be able to keep it under control; otherwise you are better off with MySQL.

If you reprint this article, please credit the source: reposted from the 26 points blog.