"High-performance MySQL" Reading note--schema and data type optimization

Source: Internet
Author: User
Tags aliases sorts

1. Slow Query

When a resource becomes inefficient, it should be understood why. Like the following possible causes:
1. Resources are overused and the margin is insufficient to work properly.
2. Resources are not properly configured
3. Resources are damaged or malfunctioning
Because of slow queries, too many query practices are too long to accumulate logically.
Is slow query the reason or the result? There is no way to know before you go into the investigation. Keep in mind that this query is working normally when it's normal. A query that requires filesort and creates a temporary table does not necessarily mean that there is a problem. Although eliminating filesort and temporal tables is generally a "best practice".

2.MySQL Data types

Smaller is usually better: in general, you should try to use the smallest data type that can store the data correctly. Smaller data types are typically faster because they consume less disk, memory, and CPU cache, and require less CPU cycles to process.
Simple: Operations of simple data types typically require less CPU cycles, such as Integer Fu Cao, because character sets and collation rules (collations) make character comparisons more complex than integer comparisons, using integer storage IP addresses.
Try to avoid null: Usually it is best to make a column not null unless you really need to store a null value. If the query contains nullable columns, it is more difficult for MySQL to optimize because nullable columns make indexing, index statistics, and value comparisons more complex.

Many MySQL data types can store the same type of data, except that the length and range of the storage are different, the accuracy is allowed, or the physical space required (disk and memory space) is different.

Alias

MySQL supports many aliases for compatibility, such as Integer, bool, and numeric, and they are just aliases, although they may be confusing, but do not affect performance. If you create a table with a data type alias, and then use Show create table to check, you will find that the MySQL report is a basic type, not an alias.

MySQL can specify widths for integer types, such as int (11), which makes no sense for most applications: It does not limit the legal scope of a value, but only specifies how many of the MySQL interactive tools (such as the MySQL command-line client) are used to display the number of characters. for storage and Computation, int (1) and int (20) are the same .

The character type varchar and char are two of the most important string types.
VARCHAR:

The varchar type is chosen to store variable-length strings, requiring 1 or 2 extra bytes to record the length of a string, which is more space-saving than a fixed-length type because it uses only the necessary space (for example, the shorter the string uses the less space). Because varchar rows are longer, it may make the rows longer than they were when you update, which leads to additional work.
The char type is fixed-length: MySQL always allocates enough space based on the length of the defined string.
Usage Scenarios for varchar:
The maximum length of a string column is much larger than the average length; the column is not updated so fragmentation is not a problem; a complex character set such as Utf-8 is used, each character is stored in a different number of bytes.

CHAR:
The char type is fixed-length, and MySQL always allocates enough space based on the length of the defined string. Char is suitable for storing very short strings, or all values are close to the same length. For example, char is ideal for storing password MD5 values because this is a fixed-length value. Char is also better than varchar for frequently changed data because the fixed-length char type is not prone to fragmentation.

Large font segment character type

Both blob and text are string data types designed to store very large data and are stored in binary and string mode, respectively. MySQL sorts blobs and text columns differently for other types: it sorts only the first max_sort_length bytes of each column rather than the entire string.
Try to avoid using text and blob types, and if it is unavoidable, there is a trick to actually use substring (column,length) to convert the column value to a string (also applicable in the ORDER BY clause) where the BLOB field is used. This makes it possible to use a temporary memory table. However, to ensure that the truncated substring is short enough to make the temporary table larger than max_heap_table_size or tmp_table_size, the MySQL will temporarily compare the memory to the MyISAM disk temporary table.

Time Type

The timestamp type holds the number of seconds since midnight January 1, 1970, and timestamp uses only 4 bytes of storage, so its range is much smaller than datatime: it can only be represented from 1970 to 2038. The From_unixtime function is provided to convert a Unix timestamp to a date and a Unix_timestamp function to convert the date to a timestamp.
Sometimes people will store the Unix timestamp as an integer value, which will not bring any benefit. The format of saving timestamps with integers is often inconvenient, so we do not recommend this.

Identifier (identifier)

Integers are usually the best choice for identifying columns because they are fast and can use Auto_increment, do not use enum and set types as identity columns, and try to avoid using string types as identity columns because they consume space and are usually slower than numeric types. Especially be careful when using strings as identity columns in the MyISAM table, because MyISAM uses compressed indexes on strings by default, which results in a much slower query.

Special type of data: IP Address field (IPV4)

They often use varchar (15) to store IP addresses, however, they are actually 32-bit unsigned integers, not strings. MySQL provides the Inet_aton() and Inet_ntoa() functions to convert an IP address between an integer and a four-segment representation.

3. Paradigm and inverse paradigm in the database

In a normalized database, each fact data appears and appears only once, whereas in the case of a de-normalized database, the information is redundant and may be stored in multiple places.

The advantages of normalization:

1) Normalized update operations are usually faster than inverse normalization.

2) When the data is well normalized, there is little or no duplication of data, so only less data needs to be modified.

3) A normalized table is usually smaller and better placed in memory, so it is faster to perform the operation.

4) Little extra data means less need for distinct or group by statements when retrieving list data.
The disadvantage of a paradigm-based design schema is that it usually needs to be correlated, and more associations may invalidate some index policies, for example, normalization might put columns in different tables, and those columns could belong to the same index in a table.


Inverse paradigm Schema Because all data is in a single table, it's good to avoid correlation. The disadvantage is the high cost of the update operation, the need to update multiple tables, as to whether this is a problem, the need to consider the frequency of updates and the length of the update, and the frequency of execution of the select query to compare .
The reason for redundancy of some data from another parent table to a child table is usually the need for sorting.
It is also useful to cache derived values. If you need to show how many messages each user has sent (as in many forums), you can perform an expensive subquery each time to calculate and display it, or you can build a num_messages column in the user table that updates the value whenever the user sends a new message.

4. Cache Tables and Summary tables

Caching tables and summary tables, calculating statistics in real time is a very expensive operation, because you either need to scan most of the data in the table, or the query statement can run only on certain indexes, and such a particular index generally has an impact on the update operation, so you generally do not want to create such an index.
When using cached tables and summary tables, you must decide whether to maintain data in real time or rebuild on a regular basis, which is better for your application, but regular rebuilds are not just a resource saver, but you can keep your tables from being fragmented and indexed in a fully sequential organization (which is more efficient).

5. Materialized view

Materialized views are actually pre-computed and stored on disk tables that can be refreshed and updated through a variety of policies. MySQL does not natively support materialized views. Use the Open Source Tool flexviews to implement your own materialized views. It consists of the following sections:

1) Change data capture function, can read the server's binary log and resolve the changes of the related rows.

2) A series of stored procedures that can help you create and manage the definition of a view

3) Some materialized view tools that can be applied to the database

6. Counter Table

If your app saves counters in a table, you may encounter concurrency problems when you update the counters. There is a tip: Save the counter in multiple rows, update the Count +1 to randomly select a row to update, calculate the value of the time, do a sum sum.

7. Speed The ALTER TABLE operation (table structure changes)

In general, most ALTER TABLE operations will cause the MySQL service to break (lock the table and rebuild the table). MySQL performs most of the modification of the table structure by creating an empty table with the new structure, identifying all the data from the old table to insert the new table, and then deleting the old table. This may take a long time to operate.
Not all ALTER TABLE operations cause the table to be rebuilt. In theory, MySQL can skip the steps to create a new table. The default value of the column actually exists in the. frm file of the table, so you can modify the file directly without altering the table itself. For example, use ALTER Comlum to change the default value of a column:
Alert Table TableName
Alert column col1 set default 5;
This statement modifies the. frm file directly and does not involve table data, so this operation is very fast.

"High-performance MySQL" Reading note--schema and data type optimization

Related Article

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.