MySQL Big Data Query Performance Optimization Tutorial

Source: Internet
Author: User
MySQL performance optimization covers table optimization and column type selection. Table optimization can be broken down further: 1. separate fixed-length and variable-length fields; 2. separate frequently used fields from rarely used ones; 3. in one-to-many relationships, add redundant fields to the table whose data needs aggregate statistics.

I. Table optimization and column type selection

Table optimization:

1. Separate fixed-length and variable-length fields

For example, id INT always occupies 4 bytes, CHAR(4) always holds 4 characters, and TIME is also fixed length: every cell value in such a column has the same size.

Core, frequently used fields should be defined as fixed-length types and placed together in one table.

Variable-length fields such as VARCHAR, TEXT and BLOB are better placed in a separate table that is linked to the core table by the primary key.
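The split described above can be sketched as a small schema helper. This is a hypothetical illustration, not code from the source: `split_schema`, `is_variable_length` and the `user_columns` example are invented names, and the type check only recognizes the common variable-length types (VARCHAR, VARBINARY, and the *TEXT/*BLOB families).

```python
def is_variable_length(col_type):
    """Rough check for MySQL variable-length types (VARCHAR, VARBINARY, *TEXT, *BLOB)."""
    base = col_type.split("(")[0].split()[0].upper()
    return base in ("VARCHAR", "VARBINARY") or base.endswith(("TEXT", "BLOB"))


def split_schema(columns, key="id"):
    """Split a {column: type} mapping into a fixed-length core table and a
    variable-length side table; both keep the primary key column so they can be joined."""
    core = {key: columns[key]}
    side = {key: columns[key]}
    for name, col_type in columns.items():
        if name == key:
            continue
        target = side if is_variable_length(col_type) else core
        target[name] = col_type
    return core, side


# Hypothetical user table
user_columns = {
    "id": "INT UNSIGNED",
    "age": "TINYINT UNSIGNED",
    "gender": "CHAR(1)",
    "nickname": "VARCHAR(20)",
    "bio": "TEXT",
}
core, side = split_schema(user_columns)
print(core)
print(side)
```

The core table ends up with only fixed-length columns, so every row has the same size; the side table is read only when the variable-length data is actually needed.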

2. Separate frequently used fields from rarely used fields

This has to be analyzed against the specific business of the website: study how each field is queried and how often, and split the rarely queried fields out into a separate table.

3. In one-to-many relationships, add redundant fields to the table that needs aggregate statistics.

Consider the following scenario:

A forum has several boards, each containing n posts, and the home page shows each board's information together with the number of posts in that board.

How is this done?

If the board table has only its first two columns, you have to fetch the boards and then query the post table with SELECT COUNT(*) FROM post GROUP BY board_id to get the number of posts in each board. A redundant post-count column in the board table removes the need for this aggregate query on every page view.
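A minimal sketch of the redundant-counter approach, using Python's built-in sqlite3 as a stand-in for MySQL (the SQL itself is generic). The `post_count` column, the sample boards and the `add_post` helper are hypothetical. The key point is that the counter is updated in the same transaction as the insert, so the home page can read it directly instead of aggregating the post table.

```python
import sqlite3

# SQLite stands in for MySQL here; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE board (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        post_count INTEGER NOT NULL DEFAULT 0  -- redundant column
    );
    CREATE TABLE post (
        id INTEGER PRIMARY KEY,
        board_id INTEGER NOT NULL,
        title TEXT NOT NULL
    );
    """
)
conn.executemany("INSERT INTO board (id, name) VALUES (?, ?)", [(1, "MySQL"), (2, "Linux")])


def add_post(board_id, title):
    # Insert the post and bump the redundant counter in one transaction,
    # so the two values cannot drift apart.
    with conn:
        conn.execute("INSERT INTO post (board_id, title) VALUES (?, ?)", (board_id, title))
        conn.execute("UPDATE board SET post_count = post_count + 1 WHERE id = ?", (board_id,))


add_post(1, "Index basics")
add_post(1, "EXPLAIN walkthrough")
add_post(2, "Kernel tuning")

# With the redundant column: one read of the small board table.
from_counter = dict(conn.execute("SELECT id, post_count FROM board"))
# Without it: an aggregate over the whole post table on every page view.
from_group_by = dict(conn.execute("SELECT board_id, COUNT(*) FROM post GROUP BY board_id"))
print(from_counter, from_group_by)
```

The trade-off is extra work on every write (and on deletes, which would need a matching decrement) in exchange for a cheap read on the busiest page.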

II. Choice of column type

1. Field type priority (faster first)

Integer > DATE, TIME > ENUM, CHAR > VARCHAR > BLOB, TEXT

Integer: fixed length, with no locale (country/region) or character set differences. For example:

TINYINT 1,2,3,4,5 <--> CHAR(1) a,b,c,d,e

Both take 1 byte (assuming a single-byte character set), but ORDER BY is faster on the former. The reason is that the latter has to take the character set and its collation into account.
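To make the collation point concrete, here is a small Python sketch. It is only an analogy: `str.casefold` is a rough stand-in for a case-insensitive (`_ci`) collation, not MySQL's actual collation algorithm. Integers sort by plain numeric comparison, while the order of strings depends on which comparison rules are applied, and applying them costs extra work per comparison.

```python
# Integer codes: ordering is a plain numeric comparison.
codes = [3, 1, 5, 2, 4]
numeric_order = sorted(codes)

# One-character strings: the resulting order depends on the collation rules.
letters = ["c", "A", "e", "b", "D"]
binary_order = sorted(letters)                # code-point order, similar to a *_bin collation
ci_order = sorted(letters, key=str.casefold)  # rough stand-in for a case-insensitive *_ci collation
print(numeric_order, binary_order, ci_order)
```

The two string orders differ (`['A', 'D', 'b', 'c', 'e']` versus `['A', 'b', 'c', 'D', 'e']`), which is exactly why the server must consult a collation when sorting CHAR values.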

Date/time types: fixed length, fast to compute with, and space-saving. However, time zones have to be considered, and conditions such as WHERE t > '2018-08-08' are less convenient to write.

ENUM: can act as a constraint and is stored internally as an integer, but when it is used together with CHAR values, a conversion between string and number has to happen internally.

CHAR: fixed length; the character set and its (sorting) collation must be considered.

VARCHAR: variable length; character set conversion and collation must be considered, which makes it slower.

TEXT/BLOB: cannot use in-memory temporary tables, so sorting and similar operations can only be done on disk (this applies to the MEMORY engine used for internal temporary tables before MySQL 8.0).
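The ENUM behavior described above can be modeled in a few lines. This is a toy model, not MySQL code: `EnumColumn` is a hypothetical class that mirrors how an ENUM stores each value as its 1-based position in the member list and rejects unknown values the way strict SQL mode does.

```python
class EnumColumn:
    """Toy model of a MySQL ENUM column: each value is stored as its
    1-based position in the member list, not as the string itself."""

    def __init__(self, *members):
        self.members = list(members)

    def encode(self, value):
        # Mimics strict SQL mode, where a value outside the list is rejected.
        if value not in self.members:
            raise ValueError(f"{value!r} is not one of {self.members}")
        return self.members.index(value) + 1

    def decode(self, index):
        # String -> number on write, number -> string on read: the extra conversion step.
        if not 1 <= index <= len(self.members):
            raise ValueError(f"no member at index {index}")
        return self.members[index - 1]


gender = EnumColumn("male", "female")
stored = gender.encode("female")
print(stored, gender.decode(stored))
```

The constraint comes for free (only listed values are accepted), while every read and write pays for the string/number mapping.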

Note on choosing a date/time type: one expert's clear recommendation is to use INT UNSIGNED NOT NULL directly and store a Unix timestamp.
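A minimal sketch of the timestamp-as-integer approach, assuming the application converts values itself and always works in UTC. The helper names and the `created_at` column mentioned in the comment are hypothetical. The upper bound is the range of MySQL's INT UNSIGNED.

```python
from datetime import datetime, timezone

INT_UNSIGNED_MAX = 2**32 - 1  # upper bound of MySQL INT UNSIGNED


def to_stored_int(moment):
    """Convert an aware datetime to the integer stored in an INT UNSIGNED column."""
    if moment.tzinfo is None:
        raise ValueError("pass a timezone-aware datetime so the zone is explicit")
    value = int(moment.timestamp())
    if not 0 <= value <= INT_UNSIGNED_MAX:
        raise OverflowError("timestamp does not fit in INT UNSIGNED")
    return value


def from_stored_int(value):
    """Read the column back as a UTC datetime; convert to local time only for display."""
    return datetime.fromtimestamp(value, tz=timezone.utc)


cutoff = to_stored_int(datetime(2018, 8, 8, tzinfo=timezone.utc))
# A filter like WHERE created_at > '2018-08-08' becomes a plain integer comparison
# against cutoff (created_at is a hypothetical column name).
print(cutoff, from_stored_int(cutoff))
```

Because the stored value is zone-free, the time zone question moves entirely into the application, which is the point of the recommendation.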

For example, gender (taking UTF8 as the character set):

CHAR(1): 3 bytes, since one UTF8 character can take up to 3 bytes

ENUM('male', 'female'): stored internally as a number, which adds one conversion step

TINYINT: fixed length, 1 byte

2. Enough is enough: do not be generous (for example with SMALLINT or VARCHAR(N))

Reason: oversized types waste memory and hurt speed.

Take age as an example: TINYINT UNSIGNED NOT NULL can store values up to 255, which is enough. Using INT would waste 3 bytes per row.

Storing the same content in VARCHAR(300) as in a shorter VARCHAR takes the same space on disk, but VARCHAR(300) consumes more memory when tables are joined.
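The sizing advice above can be turned into two small calculations. This is a sketch with hypothetical helper names: the integer ranges and byte sizes follow MySQL's integer type table, and the VARCHAR figure is the worst case a fixed-length in-memory structure (such as a MEMORY temporary table, which stores VARCHAR at its declared length) would have to reserve per value.

```python
# (type name, storage bytes, largest unsigned value), per MySQL's integer type table
UNSIGNED_INT_TYPES = [
    ("TINYINT UNSIGNED", 1, 2**8 - 1),
    ("SMALLINT UNSIGNED", 2, 2**16 - 1),
    ("MEDIUMINT UNSIGNED", 3, 2**24 - 1),
    ("INT UNSIGNED", 4, 2**32 - 1),
    ("BIGINT UNSIGNED", 8, 2**64 - 1),
]


def smallest_unsigned_type(max_value):
    """Return the narrowest unsigned integer type (and its size) that holds max_value."""
    for name, size, upper in UNSIGNED_INT_TYPES:
        if 0 <= max_value <= upper:
            return name, size
    raise ValueError("value out of range for MySQL unsigned integers")


def varchar_max_bytes(n_chars, bytes_per_char):
    """Worst-case bytes for a VARCHAR(n_chars) value, including its length prefix
    (1 byte if the maximum fits in 255 bytes, otherwise 2)."""
    data = n_chars * bytes_per_char
    return data + (1 if data <= 255 else 2)


age_type = smallest_unsigned_type(150)
print(age_type)
print(varchar_max_bytes(20, 3), varchar_max_bytes(300, 3))  # utf8: up to 3 bytes per character
```

For age this picks TINYINT UNSIGNED (1 byte instead of INT's 4), and a utf8 VARCHAR(300) may need 902 bytes per value in such a structure versus 61 for VARCHAR(20), even when the actual strings are identical.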

3. Try to avoid NULL

Reason: NULL is not index-friendly and has to be marked with a special flag.

It actually takes more space on disk (MySQL 5.5 improved its handling of NULL, but queries remain inconvenient).
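One reason NULL makes queries inconvenient is SQL's three-valued logic. The sketch below uses Python's sqlite3 as a stand-in for MySQL (both follow the standard NULL semantics here); the `member` table and its data are hypothetical. The f-string only ever receives the fixed literal conditions in this demo.

```python
import sqlite3

# SQLite stands in for MySQL; both apply standard three-valued logic to NULL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, nickname TEXT)")
conn.executemany("INSERT INTO member (nickname) VALUES (?)", [("tom",), (None,), ("",)])


def count(where):
    # Demo only: `where` is always one of the fixed literals below.
    return conn.execute(f"SELECT COUNT(*) FROM member WHERE {where}").fetchone()[0]


eq_null = count("nickname = NULL")    # 0: comparing with NULL yields UNKNOWN, never TRUE
is_null = count("nickname IS NULL")   # 1: NULL needs its own operator
not_tom = count("nickname != 'tom'")  # 1: the NULL row silently drops out; only '' matches
print(eq_null, is_null, not_tom)
```

Declaring the column NOT NULL with a default such as '' avoids these special cases, so ordinary comparisons behave as expected.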

III. Index optimization strategy

1. Index types

1.1 B-tree Index

What is called a B-tree index is, broadly speaking, always built on a balanced tree, but the concrete implementation differs slightly from engine to engine; strictly speaking, for example, the NDB engine uses a T-tree.

Abstracting over the B-tree family, it can be understood as an "ordered, fast-lookup structure".
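The "ordered, fast-lookup structure" view can be illustrated with a sorted list and binary search. This is an analogy, not a B-tree implementation: `SortedIndex` is a hypothetical class that models only the sorted leaf level, without tree nodes or pages. What it shows is that, because keys are ordered, equality, range and left-prefix lookups all reduce to finding a start point and reading a contiguous run.

```python
import bisect


class SortedIndex:
    """Keys kept in sorted order: a sorted Python list standing in for the
    leaf level of a B-tree (no real tree nodes or pages are modeled)."""

    def __init__(self, keys):
        self.keys = sorted(keys)

    def equal(self, key):
        i = bisect.bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def between(self, low, high):
        # Binary-search the start and end, then read one contiguous run.
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return self.keys[lo:hi]

    def prefix(self, text):
        # Like x LIKE 'text%': all matches form a contiguous run starting at text.
        lo = bisect.bisect_left(self.keys, text)
        result = []
        for key in self.keys[lo:]:
            if not key.startswith(text):
                break
            result.append(key)
        return result


index = SortedIndex(["zoo", "helloworld", "apple", "help", "hello"])
print(index.between("b", "hf"))  # ['hello', 'helloworld', 'help']
print(index.prefix("hello"))     # ['hello', 'helloworld']
```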

1.2 Hash Index

MEMORY tables use hash indexes by default; in theory, a hash lookup takes O(1) time.

Question: since hash lookups are so efficient, why not always use hash indexes?

Answer:

1. The results of a hash function are effectively random. If the data is stored on disk, take a primary key id as an example: as id grows, the rows corresponding to the ids are scattered randomly on disk.

2. Range queries cannot be optimized.

3. Prefix matching cannot be used. In a B-tree, if a column holds the value "helloworld", the query x = 'helloworld' naturally uses the index, and x LIKE 'hello%' can use it too (leftmost-prefix matching).

4. Sorting cannot be optimized.

5. The engine must always go back to the table: the index only locates the data, and the row itself must then be fetched from the table.
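A short sketch of points 2 and 5 above, using a Python dict as a stand-in for a hash index. The table, the `price` column and the helper names are hypothetical. An exact match is a single probe (followed by a trip back to the table for the row), but a range query gets no help from the index: since a hash index keeps no value order, every key has to be inspected.

```python
# Hypothetical table: row id -> row
table = {
    1: {"id": 1, "price": 120},
    2: {"id": 2, "price": 80},
    3: {"id": 3, "price": 150},
    4: {"id": 4, "price": 95},
}


def build_hash_index(rows, column):
    """Map each column value to the ids of the rows holding it (like a hash index)."""
    index = {}
    for row_id, row in rows.items():
        index.setdefault(row[column], []).append(row_id)
    return index


def exact_lookup(index, rows, value):
    # One hash probe finds the row ids; the rows themselves still come from the table.
    return [rows[row_id] for row_id in index.get(value, [])]


def range_lookup(index, low, high):
    # A hash index keeps no value order, so every key must be inspected.
    inspected, hits = 0, []
    for value, row_ids in index.items():
        inspected += 1
        if low <= value <= high:
            hits.extend(row_ids)
    return sorted(hits), inspected


price_index = build_hash_index(table, "price")
print(exact_lookup(price_index, table, 120))
print(range_lookup(price_index, 90, 130))  # ([1, 4], 4): all 4 keys scanned for 2 hits
```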

2. Common misconceptions about B-tree indexes

2.1 Indexing every column that appears in WHERE conditions. For example:

WHERE cat_id = 3 AND price > 100 (query products in category 3 that cost more than 100 yuan).

Misconception: add separate indexes on cat_id and on price.

Why this is wrong: only the cat_id index or the price index will be used, because they are independent indexes and, for this query, only one of them can be used.
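By contrast, a single composite index on (cat_id, price) can answer the whole condition. The sketch below models such an index as a sorted list of `(cat_id, price, row_id)` tuples; the data and the `cat_price_above` helper are hypothetical. Because entries are sorted by cat_id first and price second, `cat_id = 3 AND price > 100` is one contiguous slice found by two binary searches.

```python
import bisect

# Hypothetical composite index on (cat_id, price): entries sorted by cat_id, then
# price, with the row id appended so duplicate (cat_id, price) pairs stay distinct.
entries = sorted([
    (2, 150, 11), (3, 80, 12), (3, 120, 13), (3, 100, 14),
    (3, 300, 15), (4, 90, 16), (3, 99, 17),
])


def cat_price_above(index, cat_id, min_price):
    """Row ids matching WHERE cat_id = ? AND price > ?, found as one contiguous slice."""
    # Skip entries with this cat_id and price <= min_price (inf sorts after any row id).
    lo = bisect.bisect_right(index, (cat_id, min_price, float("inf")))
    # (cat_id + 1,) sorts before every entry of the next category.
    hi = bisect.bisect_left(index, (cat_id + 1,))
    return [row_id for _, _, row_id in index[lo:hi]]


print(cat_price_above(entries, 3, 100))  # [13, 15]
```

With two separate single-column indexes, one of them would have to be chosen and the other condition checked row by row; the composite index narrows on both columns at once.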

2.2 Believing that once an index is built on multiple columns (a composite index), it helps whichever of those columns is queried

Why this is wrong: a multi-column index only takes effect when the query satisfies the leftmost-prefix requirement.

Take index (a, b, c) as an example (note the column order):
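The leftmost-prefix requirement can be sketched as a small function. This is a simplified model under stated assumptions: `usable_prefix` is a hypothetical helper that ignores optimizer features such as index condition pushdown and skip scan. It walks the index columns left to right, continues through equality conditions, stops after the first range condition, and stops at the first column the query does not constrain. The order of the conditions in the WHERE clause does not matter, which is why they are passed as sets.

```python
def usable_prefix(index_columns, equality, ranged=()):
    """Columns of a composite index that a WHERE clause can use, under a
    simplified leftmost-prefix rule."""
    used = []
    for column in index_columns:
        if column in equality:
            used.append(column)      # equality: keep going to the next column
        elif column in ranged:
            used.append(column)      # range: usable, but nothing after it is
            break
        else:
            break                    # gap: the rest of the index is unusable
    return used


idx = ["a", "b", "c"]
print(usable_prefix(idx, {"a", "b", "c"}))    # ['a', 'b', 'c']
print(usable_prefix(idx, {"a", "c"}))         # ['a']  (gap at b)
print(usable_prefix(idx, {"b", "c"}))         # []     (a is missing)
print(usable_prefix(idx, {"a", "c"}, {"b"}))  # ['a', 'b']  (c comes after a range)
```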

IV. Index experiment

For example: SELECT * FROM T4 WHERE c1 = 3 AND c2 = 4 AND c4 > 5 AND c3 = 2;

Which index columns are used?

EXPLAIN SELECT * FROM T4 WHERE c1 = 3 AND c2 = 4 AND c4 > 5 AND c3 = 2 \g

The result is as follows:

Note: key_len is 4.
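The EXPLAIN output itself is missing from the source, and T4's definition is not given, so the following only shows how a key_len value like this is counted. It is a sketch under MySQL's usual key_len rules: CHAR counts length times bytes per character, VARCHAR adds a 2-byte length prefix, and a nullable column adds 1 byte. The `Column` class, `key_len` helper and example schemas are hypothetical. A key_len of 4 could, for instance, correspond to four CHAR(1) latin1 NOT NULL columns or to a single INT NOT NULL column; which one applies here cannot be recovered.

```python
from dataclasses import dataclass

# Fixed sizes of a few numeric/date types as counted in EXPLAIN's key_len
FIXED_BYTES = {"TINYINT": 1, "SMALLINT": 2, "INT": 4, "BIGINT": 8, "DATE": 3}


@dataclass
class Column:
    col_type: str            # "CHAR", "VARCHAR", or a key of FIXED_BYTES
    length: int = 0          # character length for CHAR/VARCHAR
    bytes_per_char: int = 1  # e.g. 1 for latin1, 3 for utf8, 4 for utf8mb4
    nullable: bool = False


def key_len(columns):
    """Sum the bytes EXPLAIN reports for the index columns actually used."""
    total = 0
    for col in columns:
        if col.col_type == "CHAR":
            size = col.length * col.bytes_per_char
        elif col.col_type == "VARCHAR":
            size = col.length * col.bytes_per_char + 2  # length prefix
        else:
            size = FIXED_BYTES[col.col_type]
        if col.nullable:
            size += 1  # NULL flag byte
        total += size
    return total


latin1_chars = [Column("CHAR", 1, 1) for _ in range(4)]
utf8_chars = [Column("CHAR", 1, 3) for _ in range(4)]
print(key_len(latin1_chars), key_len(utf8_chars), key_len([Column("INT")]))
```

Comparing key_len against the sizes of the index columns is how you tell how many leading columns of a composite index a query actually used.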

Clustered and non-clustered indexes

Similarities and differences between MyISAM and InnoDB index files

MyISAM: a table (for example, news) is stored as two files, news.MYD and news.MYI; index files and data files are separate, which is called a non-clustered index. Both the primary and the secondary indexes point to the physical row (its location on disk).

InnoDB: indexes and data are stored together, so its indexes are clustered. Row data is stored directly in InnoDB's primary key index, and secondary indexes store a reference to the primary key.

Note: for InnoDB:

1. The primary key index stores not only the index values but also the row data in its leaves.

2. If there is no primary key, the first UNIQUE key whose columns are all NOT NULL is used instead.

3. If there is no such unique key either, the system generates an internal row ID as the key.

4. An index structure like InnoDB's primary key, which stores both the primary key values and the row data, is called a clustered index.
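The difference in lookup paths can be sketched with plain dicts and lists. This is a hypothetical, heavily simplified model (no pages or trees): each lookup function returns the row plus the number of structures touched. In the InnoDB-style layout a primary key lookup ends in the index leaf, while a secondary index lookup needs a second lookup in the clustered index; in the MyISAM-style layout every index lookup is followed by a read of the data file.

```python
# Hypothetical InnoDB-style layout: the primary index holds whole rows,
# the secondary index on name holds primary key values.
innodb_primary = {1: {"id": 1, "name": "tom"}, 2: {"id": 2, "name": "ann"}}
innodb_by_name = {"tom": 1, "ann": 2}

# Hypothetical MyISAM-style layout: rows live in a separate data file (a list),
# and every index, primary or secondary, stores the row's position in it.
myisam_data = [{"id": 1, "name": "tom"}, {"id": 2, "name": "ann"}]
myisam_primary = {1: 0, 2: 1}
myisam_by_name = {"tom": 0, "ann": 1}


def innodb_get_by_pk(pk):
    return innodb_primary[pk], 1  # one lookup: the row sits in the index leaf


def innodb_get_by_name(name):
    pk = innodb_by_name[name]     # secondary index -> primary key
    return innodb_primary[pk], 2  # second lookup in the clustered index


def myisam_get(index, key):
    position = index[key]              # any index -> physical position
    return myisam_data[position], 2    # then a read of the data file


print(innodb_get_by_pk(1), innodb_get_by_name("ann"), myisam_get(myisam_primary, 1))
```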

Clustered index

Advantage: querying rows by primary key is fast, with no need to go back to the table (the data is stored under the primary key node).

Disadvantage: irregular (non-sequential) inserts cause frequent page splits.
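The page-split effect can be illustrated with a toy simulation. This is a simplified model, not InnoDB's real split heuristics: `count_page_splits` is a hypothetical helper where pages hold a fixed number of sorted keys, a key larger than everything so far simply opens a fresh page (the sequential-insert case), and an insert into the middle of a full page splits it in half.

```python
import bisect


def count_page_splits(keys, capacity=4):
    """Insert keys into fixed-capacity sorted pages and count mid-page splits."""
    pages = [[]]
    splits = 0
    for key in keys:
        firsts = [page[0] for page in pages] if pages[0] else []
        i = max(bisect.bisect_right(firsts, key) - 1, 0)  # page whose range covers key
        page = pages[i]
        if len(page) < capacity:
            bisect.insort(page, key)
        elif i == len(pages) - 1 and key > page[-1]:
            pages.append([key])  # append-only case: open a fresh page, no split
        else:
            bisect.insort(page, key)  # page overflows: split it in half
            mid = len(page) // 2
            pages[i:i + 1] = [page[:mid], page[mid:]]
            splits += 1
    return splits


sequential = count_page_splits(range(1, 21))
irregular = count_page_splits(range(20, 0, -1))
print(sequential, irregular)
```

Ascending keys (such as an auto-increment primary key) never split a page in this model, while keys arriving out of order keep splitting full pages, which is why a sequential primary key suits a clustered index.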
