MySQL Data Types and query efficiency
Do not use a long value when using short data columns. If you have a CHAR data column with a fixed length, do not let it exceed the actual length. If the maximum value stored in the data column is 40 characters, do not define it as CHAR (255), instead of CHAR (40 ). If you can replace BIGINT with MEDIUMINT, your data table will be smaller (less disk I/O), and the value processing speed will be faster during calculation. If the data column is indexed, the performance improvement caused by using a shorter value is more significant. Not only can indexes increase the query speed, but also the short index value is faster than the long index value.
If you can select the data Row Storage Format, you should use the one that is best suited to the storage engine. For MyISAM data tables, it is best to use a fixed-length data column instead of a variable-length data column. For example, let all character columns Replace the VARCHAR type with the CHAR type. Weigh the gains and losses, we will find that the data table uses more disk space, but if you can provide additional space, the processing speed of a fixed-length data row is faster than that of a variable-length data row. This is especially true for tables that are frequently modified, because in those cases, performance is more vulnerable to disk fragmentation.
· When a variable-length data row is used, because the record length is different, more fragments are required after multiple Delete and update operations. You must use optimize table to regularly maintain its performance. There is no such problem with fixed-length data rows.
· If a data table crashes, it is easier to reconstruct a table with a fixed row length. When a fixed-length data row is used, the start position of each record can be detected because these locations are multiples of the fixed record length, however, it is not necessary to use a variable-length data row. This is not a problem related to query processing performance, but it can definitely speed up data table repair.
Although converting a MyISAM data table to a data column with a fixed length can improve the performance, you must first consider the following issues:
· Data columns with a fixed length are faster, but occupy a large amount of space. Each value (even a null value) in the CHAR (n) column usually occupies n characters, because when it is stored in the data table, a space is added after the value. The VARCHAR (n) column occupies a small amount of space, because you only need to allocate the necessary number of characters for storing the value, plus one or two bytes to store the length of the value. Therefore, when selecting between CHAR and VARCHAR columns, it is actually a comparison of time and space. If speed is the main factor, you can use the CHAR data column to obtain the performance advantage of a fixed-length column. If the space is important, use the VARCHAR data column. All in all, you can think that a data row with a fixed length can improve the performance, although it occupies more space. However, for some special applications, you may want to implement a data table in two ways, and then run the test to determine which situation meets the needs of the application.
· Even if you are willing to use a fixed length type, sometimes you cannot use it. For example, strings longer than 255 characters cannot use a fixed length type.
Currently, MEMORY data tables use fixed-length data rows for storage. Therefore, it does not matter whether CHAR or VARCHAR columns are used. Both are processed as CHAR type.
For InnoDB data tables, the internal row storage format does not distinguish between fixed-length and variable-length columns (all data rows use a header pointer to the value of the data column). Therefore, in essence, using a fixed-length CHAR column is not necessarily easier than using a variable-length VARCHAR column. Therefore, the main performance factor is the total amount of storage used by data rows. Because CHAR occupies more space on average than VARCHAR, it is better to use VARCHAR to minimize the total storage capacity of data rows to be processed and disk I/O.
For BDB data tables, no matter the data columns with fixed or variable length are used, the difference is not big. You can try either of the two methods and run some experiment tests to check whether there are obvious differences.
Define a data column as not null ). This will make processing faster and require less storage. It also simplifies the query, because in some cases you do not need to check the NULL attribute of the value.
Consider using the ENUM data column. If the base of a data column you own is very low (the number of different values is limited), you can consider converting it into an ENUM column. ENUM values can be processed more quickly because they are expressed as numerical values internally.
Use procedure analyse (). Run procedure analyse () to view the columns in the data table: SELECT * FROM tbl_name procedure analyse (); SELECT * FROM tbl_name procedure analyse (16,256 );
Each output column provides optimization suggestions for the Data Type of the columns in the data table. The second example tells procedure analyse () not to recommend for ENUM types containing more than 16 or 256 bytes. Without such restrictions, the output information may be long; The ENUM definition is usually hard to read.
According to the procedure analyse () output information, you may find that you can modify your own data tables to use more efficient data types. If you decide to change the type of a data column, you need to use the alter table statement.
Use optimize table to OPTIMIZE data tables affected by fragments. Data tables that are greatly modified, especially those that contain variable-length data columns, are vulnerable to fragmentation. Fragment is terrible because it will lead to useless space (holes) formed by disk blocks used to store data tables ). Over time, in order to get valid data rows, you must read more blocks to reduce the performance. This will appear on any variable-length data row,
However, BLOB or TEXT columns are particularly prominent because their lengths are too different. Using optimize table normally prevents data tables from performance degradation. Optimize table can be used for MyISAM and BDB data tables, but defragments can only be used for MyISAM data tables. In any storage engine fragment method, the mysql tutorial dump is used to dump data tables, and then the dump files are used to delete and recreate those data tables:
% Mysqldump -- opt db_name tbl_name> dump. SQL % mysql db_name <dump. SQL
Package data into BLOB or TEXT data columns. Use BLOB or TEXT data columns to store packed data and unpack the data in the application, so that you can obtain any required information in a retrieval operation, you do not need to perform multiple searches. It is also helpful for data values that are difficult to express using standard data table structures and frequently changing data values.
Another way to solve this problem is to let applications that process Web forms package data into a certain data structure and insert it into a single BLOB or TEXT data column. For example, you can use XML to indicate questionnaire reply and store the XML strings in the TEXT data column. Data Encoding (decoding is required when retrieving data from a data table) increases client overhead, but simplifies the data structure, in addition, it eliminates the need to change the data table structure because of changes to the content of the questionnaire.
On the other hand, BLOB and TEXT values may cause some problems, especially when a large number of delete or update operations are performed. Deleting such a value leaves a large "hole" in the data table ", the records filled with these "holes" in the future may have different lengths (the optimize table discussed earlier provides some suggestions to solve this problem ).
Use a composite (synthetic) index. Composite Index columns are useful in some cases. One way is to create a hash value based on the content of other columns and store the value in a separate data column. Next, you can find the data row by searching the hash value. However, we should note that this technology can only be used for exact matching queries (hash values are useless for range search operators like <or> = ). You can use the MD5 () function to generate a hash value, or use SHA1 () or CRC32 (), or use your own application logic to calculate the hash value. Remember that numeric hash values can be stored efficiently. Similarly, if the hash algorithm generates strings with Trailing spaces, do not store them in the CHAR or VARCHAR columns. They will be affected by trailing spaces.
The merged hash index is particularly useful for BLOB or TEXT data columns. The search speed with the hash Identifier value is much faster than the search speed for the BLOB column itself.
Avoid retrieving large BLOB or TEXT values when unnecessary. For example, the SELECT * query is not a good idea, unless you can determine that the WHERE clause as the constraint will only find the required data rows. Otherwise, you may have no destination to transmit a large number of values on the network. This is an example of storing BLOB or TEXT identifiers in a merged index column. You can search for index columns, determine the required data rows, and then retrieve BLOB or TEXT values from qualified data rows.
Separates BLOB or TEXT columns from separate tables. In some environments, if you move these data columns to the second data table, you can convert the data columns in the original data table to a fixed-length data row format, which makes sense. This reduces the fragmentation in the master table and gives you the performance advantage of fixed-length data rows. It also enables you to run the SELECT * query on the master data table without transmitting a large number of BLOB or TEXT values over the network.
Efficiently load data
In most cases, you are concerned about the optimization of SELECT queries, because SELECT queries are the most common query types and it is not easy to optimize them. In contrast, the operations for loading data into the database tutorial are relatively direct. However, you can still use some policies to improve the efficiency of data loading operations. The basic principles are as follows:
· Batch loading is more efficient than loading a single row, because
After a record is loaded, the key cache does not need to be refreshed (flush); you can refresh the key cache at the end of this batch of records. The more frequently the key cache refreshes, the faster the data is loaded.
· Data Tables without indexes Load faster than those with indexes. If an index exists, you must not only add the record to the data file, but also modify the index to reflect the new record.
· Short SQL statements are faster than long SQL statements because they involve less server-side analysis processes and send them from the client to the server over the network.
Some of these factors seem to be secondary (especially the last one), but if you load a lot of data, even a small difference in efficiency will lead to a certain performance difference. We can draw several practical conclusions on how to quickly load data from the previous general principle:
· Load data (all forms) is more efficient than INSERT because it loads DATA rows in batches. The server only needs to analyze and interpret one statement, rather than multiple statements. Similarly, indexes are refreshed only after all data rows are processed, rather than once per row.
· Load data without LOCAL is faster than load data with LOCAL.