MySQL data types and query efficiency


Use short columns rather than long ones. If you have a fixed-length CHAR column, do not make it longer than you really need. If the longest value you store in a column is 40 characters, define it as CHAR(40), not CHAR(255). If you can replace BIGINT with MEDIUMINT, your table becomes smaller (less disk I/O) and the values are processed faster in computations. If the column is indexed, the gain from shorter values is even more significant: not only do indexes improve query speed, but short index values are also handled faster than long ones.
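The arithmetic behind the MEDIUMINT-versus-BIGINT advice is easy to sketch. The byte sizes below are the storage requirements listed in the MySQL reference manual; the helper function itself is just illustrative:

```python
# Storage sizes (bytes) of MySQL integer types, per the MySQL
# reference manual; shows why MEDIUMINT beats BIGINT when the
# value range allows it.
SIZES = {"TINYINT": 1, "SMALLINT": 2, "MEDIUMINT": 3, "INT": 4, "BIGINT": 8}

def index_savings(rows, wide="BIGINT", narrow="MEDIUMINT"):
    """Bytes saved in the data file (and again in every index that
    includes the column) by choosing the narrower type."""
    return rows * (SIZES[wide] - SIZES[narrow])

print(index_savings(1_000_000))  # -> 5000000 (about 5 MB per million rows)
```

The same saving is paid again for each index on the column, which is why the text above notes that shorter values matter most for indexed columns.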
If you can choose the storage format for data rows, use the one that best fits your storage engine. For MyISAM tables, it is a good idea to use fixed-length columns instead of variable-length columns, for example by changing all VARCHAR character columns to CHAR. Weighing the pros and cons, the table will use more disk space, but if you can afford the extra space, fixed-length rows are processed faster than variable-length rows. This is especially true for tables that are modified frequently, because those are most susceptible to fragmentation.
· With variable-length rows, the table accumulates fragmentation after many DELETE and UPDATE operations, because the records differ in length. You must run OPTIMIZE TABLE regularly to maintain performance. Fixed-length rows do not have this problem.
· If a table crashes, one with fixed-length rows is easier to reconstruct. With fixed-length rows the start of each record can be detected, because those positions are multiples of the record length; with variable-length rows that is not necessarily so. This has nothing to do with query performance, but it can certainly speed up table repair.
Although converting a MyISAM table to fixed-length columns can improve performance, you should first consider the following questions:
· Fixed-length columns are faster but occupy more space. A CHAR(n) column always takes up n characters per value, even for short or empty values, because values are space-padded to the column width when stored. A VARCHAR(n) column occupies less space because it allocates only as many characters as the value needs, plus one or two bytes to record the value's length. The choice between CHAR and VARCHAR is therefore a trade-off between time and space: if speed matters most, use CHAR to get the performance advantage of fixed-length columns; if space matters most, use VARCHAR. In short, you can assume that fixed-length rows improve performance at the cost of space, but for some applications it may be worth implementing the table both ways and running tests to see which better meets your needs.
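The space comparison above can be modeled in a few lines. This is a minimal sketch assuming a single-byte character set: CHAR(n) always costs n bytes, while VARCHAR costs the value's length plus one length byte (two when n exceeds 255):

```python
# Rough per-value storage cost of CHAR(n) vs VARCHAR(n) under a
# single-byte character set, following the rules described above.
def char_cost(n, value):
    return n  # fixed: always n bytes, regardless of the value

def varchar_cost(n, value):
    length_bytes = 1 if n <= 255 else 2  # bytes used to record the length
    return len(value) + length_bytes

v = "alice"
print(char_cost(40, v), varchar_cost(40, v))  # -> 40 6
```

For mostly short values the VARCHAR column is far smaller; for values that always fill the column, CHAR costs nothing extra and avoids the length bookkeeping.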
· Even if you are willing to use a fixed-length type, sometimes you cannot. For example, a string longer than 255 characters cannot use a fixed-length CHAR type.
MEMORY tables currently always use fixed-length rows, so it does not matter whether you use CHAR or VARCHAR columns; both are handled as CHAR.
For InnoDB tables, the internal row storage format does not distinguish between fixed-length and variable-length columns (all rows use header pointers to the column values), so using fixed-length CHAR columns is not inherently faster than using variable-length VARCHAR columns. The main performance factor is the total amount of storage a row uses. Since CHAR occupies more space than VARCHAR on average, it is better to use VARCHAR to minimize the storage and disk I/O needed to process the rows.
For BDB tables, the difference between fixed-length and variable-length columns is small. You can try both and run some experimental tests to see whether there is a significant difference for your workload.
Define columns as NOT NULL where possible. This makes processing faster and requires less storage. It also sometimes simplifies queries, because you do not need to check for NULL as a special case.
Consider using ENUM columns. If a column has low cardinality (a limited number of distinct values), consider converting it to ENUM. ENUM values are processed faster because they are represented internally as numbers.
Use PROCEDURE ANALYSE(). Run PROCEDURE ANALYSE() to examine the columns in a table:
SELECT * FROM tbl_name PROCEDURE ANALYSE();
SELECT * FROM tbl_name PROCEDURE ANALYSE(16, 256);

Each column of output gives an optimization recommendation for the data type of the corresponding table column. The second example tells PROCEDURE ANALYSE() not to suggest ENUM types that would contain more than 16 values or take more than 256 bytes. Without such restrictions the output can be very long, and long ENUM definitions are hard to read.

Based on the PROCEDURE ANALYSE() output, you may find that your table can be changed to use more efficient data types. To change a column's type, use the ALTER TABLE statement.

Use OPTIMIZE TABLE on tables affected by fragmentation. Heavily modified tables, especially those containing variable-length columns, are susceptible to fragmentation. Fragmentation is bad because it leaves unused space ("holes") in the disk blocks that store the table; over time you must read more blocks to reach the valid rows, and performance drops. This happens with any variable-length rows, but it is especially acute for BLOB or TEXT columns, whose lengths vary widely. Running OPTIMIZE TABLE regularly keeps the table's performance from degrading. OPTIMIZE TABLE can be used with MyISAM and BDB tables, but it defragments only MyISAM tables. A defragmentation method that works with any storage engine is to dump the table with mysqldump, then drop it and reload it from the dump file:

% mysqldump --opt db_name tbl_name > dump.sql
% mysql db_name < dump.sql
Pack data into a BLOB or TEXT column. Using a BLOB or TEXT column to store packed data, which your application unpacks, lets you retrieve everything you need in a single operation instead of several. It also helps with values that are hard to represent in a standard row-and-column structure, and with values that change frequently.

One way to apply this is in applications that work with web forms: the application packs the form data into a single BLOB or TEXT column. For example, you can represent questionnaire replies as XML and store the XML strings in a TEXT column. Because the data must be encoded on the way in (and decoded when retrieved from the table), this adds client-side overhead, but it simplifies the data structure and means the table structure need not change whenever the questionnaire changes.
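The pack/unpack step described above might look like the following sketch, using Python's standard-library XML module; the element and field names are made up for illustration:

```python
# Sketch of packing questionnaire answers into one XML string for a
# TEXT column, and unpacking it again in the application.
import xml.etree.ElementTree as ET

def pack(answers):
    root = ET.Element("survey")
    for question, answer in answers.items():
        ET.SubElement(root, "answer", q=question).text = answer
    return ET.tostring(root, encoding="unicode")  # store this in a TEXT column

def unpack(xml_text):
    root = ET.fromstring(xml_text)
    return {el.get("q"): el.text for el in root.iter("answer")}

row = pack({"q1": "yes", "q2": "blue"})
print(unpack(row))  # -> {'q1': 'yes', 'q2': 'blue'}
```

The whole reply travels as one value, so one SELECT fetches it and one UPDATE rewrites it, at the cost of the encode/decode work on the client.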

On the other hand, BLOB and TEXT values can cause problems of their own, especially under heavy DELETE or UPDATE activity. Deleting such a value leaves a large "hole" in the table, and records written later may not match the hole's length (the OPTIMIZE TABLE discussion above suggests ways to deal with this).

Use a synthetic index. Synthetic index columns are useful in some circumstances. One approach is to compute a hash of another column's contents and store the hash in a separate column; you can then find rows by searching on the hash value. Note, however, that this technique works only for exact-match queries (hash values are useless with range operators such as < or >=). You can generate hash values with the MD5() function, with SHA1() or CRC32(), or with your own application logic. Keep in mind that numeric hash values can be stored very efficiently. Also, if the hash algorithm produces strings with trailing spaces, do not store them in CHAR or VARCHAR columns, which strip trailing spaces.
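When the hash is computed in application logic, Python's standard library can stand in for the SQL functions named above (hashlib for MD5/SHA1, zlib for CRC32). The text value here is just a placeholder; the comments note a plausible column type for each result:

```python
# Candidate hash values for a synthetic index column.
import hashlib
import zlib

text = "some very long document body ..."
md5_hex = hashlib.md5(text.encode()).hexdigest()    # 32 hex chars -> CHAR(32)
sha1_hex = hashlib.sha1(text.encode()).hexdigest()  # 40 hex chars -> CHAR(40)
crc = zlib.crc32(text.encode())                     # 32-bit int -> INT UNSIGNED

# Only exact-match lookups make sense on the hash column:
# e.g. WHERE hash_col = MD5('...'); range operators are meaningless.
print(len(md5_hex), len(sha1_hex), 0 <= crc < 2**32)  # -> 32 40 True
```

The CRC32 variant illustrates the point about numeric hashes: a 4-byte integer column indexes far more compactly than a 32- or 40-character string.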

A synthetic hash index is particularly useful for BLOB and TEXT columns: looking up a hash value is much faster than searching the BLOB column itself.

Avoid retrieving large BLOB or TEXT values when you do not need them. For example, a SELECT * query is not a good idea unless you are sure the WHERE clause will match only the rows you actually want; otherwise you may send huge values over the network to no purpose. This is another case where a BLOB or TEXT identifier stored in a synthetic index column helps: you can search the indexed column to determine which rows you want, then retrieve the BLOB or TEXT values from just those rows.

Move BLOB or TEXT columns into a separate table. In some circumstances it makes sense to move these columns into a second table, provided doing so lets you convert the original table to a fixed-length row format. This reduces fragmentation in the main table and gives you the performance advantage of fixed-length rows. It also lets you run SELECT * queries on the main table without sending large BLOB or TEXT values over the network.
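A minimal sketch of that split, using SQLite as a stand-in for MySQL (the table and column names are illustrative): the wide value lives in a side table keyed by the main table's id, so queries on the main table never touch it.

```python
# Keeping a large BLOB in a side table so SELECT * on the main
# table stays cheap. SQLite stands in for MySQL here.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE doc (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("CREATE TABLE doc_body (doc_id INTEGER PRIMARY KEY, body BLOB)")
db.execute("INSERT INTO doc VALUES (1, 'report')")
db.execute("INSERT INTO doc_body VALUES (1, ?)", (b"x" * 100_000,))

# A cheap query touches only the narrow table...
titles = db.execute("SELECT * FROM doc").fetchall()
# ...and the blob is fetched only for the row actually wanted.
body, = db.execute("SELECT body FROM doc_body WHERE doc_id = 1").fetchone()
print(titles, len(body))  # -> [(1, 'report')] 100000
```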

Load Data efficiently

In most cases, your focus is on optimizing SELECT queries, because they are the most common kind of query and optimizing them is not always easy. By contrast, loading data into the database is comparatively straightforward. Nevertheless, there are strategies that improve the efficiency of data-loading operations. The basic principles are as follows:

· Bulk loading is more efficient than single-row loading, because the key cache need not be flushed after each record; it can be flushed once at the end of the batch. The less often the key cache is flushed, the faster data loads.

· A table without indexes loads faster than one with indexes. When indexes are present, not only must each record be added to the data file, but every index must also be modified to reflect the new record.

· Shorter SQL statements are faster than long ones, because they involve less parsing on the server and are faster to send from the client to the server over the network.

Some of these factors may seem minor (especially the last), but if you are loading a lot of data, even small efficiency differences add up to a real performance difference. From these general principles we can draw several practical conclusions about how to load data quickly:

· LOAD DATA (in all its forms) is more efficient than INSERT because it loads rows in bulk: the server parses and interprets one statement rather than many, and the indexes need flushing only after all rows are processed, not once per row.
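The bulk-versus-single-row point carries over to client code as well. This sketch uses SQLite's executemany as an analogue of batch loading, which is a swapped-in stand-in, not LOAD DATA itself; in MySQL the same idea appears as LOAD DATA or a multi-row INSERT ... VALUES (...), (...), ...:

```python
# Batch insert as an analogue of bulk loading: one prepared statement
# reused for all rows, instead of parsing one INSERT per row.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE t (id INTEGER, name TEXT)")

rows = [(i, f"name{i}") for i in range(1000)]
db.executemany("INSERT INTO t VALUES (?, ?)", rows)  # one batch, 1000 rows
db.commit()

count, = db.execute("SELECT COUNT(*) FROM t").fetchone()
print(count)  # -> 1000
```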

· LOAD DATA without LOCAL is faster than LOAD DATA LOCAL, because without LOCAL the server reads the file directly from its own host, whereas LOCAL requires the client to send the file's contents to the server over the connection.
