Abstract: This article introduces the optimization techniques for data tables. The main content is: select the table type and open as few tables as possible, the relationship between the locking table and the query speed and how to optimize the table to improve the query speed. Due to data entry and removal, fragments are easily re-generated in the table's data files. For databases with large tables, myisamchk maintenance programs should be frequently used to clear fragments, when optimizing a table, you must select a good time and try to optimize the table if no user is allowed to access the table.
Selecting the appropriate table type can also greatly optimize the retrieval speed and maximize the database performance.
Select a table Type
With MySQL, you can choose between the formats of 4 available tables from a speed point of view (version 23.5.
· Static MyISAM
This format is the simplest and safest format, and it is also the fastest in the disk format. The speed comes from the hard way data can be found on the disk. When there is an index and a static format, it is very simple, just multiply the length of the row by the number of rows. When scanning a table, it is easy to read constant records from each disk. Security comes from the fact that if your computer crashes when writing a static MyISAM file, myisamchk can easily point out where each row starts and ends, so it can usually reclaim all records, except for the partially written ones. Note: in MySQL, all indexes can always be rebuilt.
· Dynamic MyISAM
This format is a bit complicated, because each line must have a head to show how long it is. When a record changes for a long time, it can also end at more than one position. You can use OPTIMIZE table or myisamchk to organize a table. It may be a good idea to move dynamic columns into another table if you have static data that is accessed/changed as some VARCHAR or BLOB columns in the same table to avoid fragmentation.
· Compressing MyISAM
This is a read-only type, which is generated using the optional myisampack tool.
· Memory (HEAP)
This type of table is very useful for small/medium-sized search tables. Copying/creating a common query table (joined by a join) to a (maybe temporary) HEAP table may speed up the joining of multiple tables. Suppose we want to make the following join, it may take several times to use the same data.
SELECT tab1.a, tab3.a FROM tab1, tab2, tab3WHERE tab1.a = tab2.a and tab2.a = tab3.a and tab2.c != 0;
To speed up it, we can use tab2 and tab3 to create a temporary table, because we use the same column (tab1.a) for search. Here is the command for creating the table and selecting the result.
CREATE TEMPORARY TABLE test TYPE=HEAP SELECT tab2.a as a2, tab3.a as a3 FROM tab2, tab3 WHERE tab2.a = tab3.a and c = 0;SELECT tab1.a, test.a3 from tab1, test where tab1.a = test.a1;SELECT tab1.b, test.a3 from tab1, test where tab1.a = test.a1 and something;
Characteristics of static (fixed-length) tables
This is the default format. It is used when a table does not contain VARCHAR, BLOB, or TEXT columns.
Fill all CHAR, NUMERIC, and DECIMAL columns to the column width.
Very fast.
Easy to buffer.
It is easy to rebuild after a crash because the record is in a fixed position.
You do not need to be reorganized (using myisamchk) unless a large number of records are deleted and you want to return free disk space to the operating system.
It usually requires more disk space than dynamic tables.
Dynamic table features
This format is used if the table contains any VARCHAR, BLOB, or TEXT columns.
All string columns are dynamic (except those with less than 4 characters ).
Each record is preceded by a bitmap, indicating which column is NULL ('') for the string column, or which column is zero for the number column (which is different from the column containing the NULL value ). If the string column has a zero length after the white space is deleted, or the numeric column has a zero value, it is marked in the bid and not saved to the disk. A non-null string is stored as a Length Byte plus string content.
Generally, a table occupies more disk space than a fixed-length table.
Each record only uses the required space. If a record is larger, it is cut into multiple segments as needed, which leads to record fragmentation.
If you update a row with information that exceeds the length of the row, the row is segmented. In this case, you may have to always run myisamchk-r to improve performance. Use myisamchk-ei tbl_name for statistics.
It is not easy to reconstruct after a crash, because a record can be divided into multiple segments and a connection (fragment) can be lost.
The expected row length for dynamic size records is:
3+ (number of columns + 7) / 8+ (number of char columns)+ packed size of numeric columns+ length of strings+ (number of NULL columns + 7) / 8
The penalty for each connection is 6 bytes. A dynamic record is linked whenever the change increases the record. Each new Link contains at least 20 bytes, so the next increase may be in the same link. If not, there will be another link. You can use myisamchk-ed to check the number of links. All links can be deleted using myisamchk-r.
Features of a compressed table
A read-only table created using the myisampack utility. All customers with MySQL extension email support can retain a copy of myisampack for internal use.
The decompressed code exists in all MySQL distributions so that customers without myisampack can read tables compressed by myisampack.
Occupies a small disk space to minimize disk usage.
Each record is compressed separately (with a low access overhead ). The header of a record is a fixed length (1-3 bytes), depending on the maximum record in the table. Each column is compressed in different ways. Some compression types are:
There is usually a different table for each column.
Suffix blank compression.
Empty prefix compression.
Use a number with a value of 0 for 1-bit storage.
If the value of an integer column has a small scope, the column uses the smallest possible type for storage. For example, if all values are in the range of 0 to 255, a bigint column (8 bytes) can be stored as a TINYINT column (1 byte.
If a column has only one small set of possible values, the column type is changed to ENUM.
You can use the combination of the preceding compression methods for the column.
Records with a fixed or dynamic length can be processed, but BLOB or TEXT Columns cannot be processed.
You can use myisamchk to decompress the package.
MySQL supports different index types, but the general type is ISAM. This is a B-tree index and you can roughly calculate the size of the index file as (key_length + 4) * 0.67, the sum of all keys. (This is the worst case, when all keys are inserted in order .)
String indexes are empty compressed. If the first index is a string, it also compresses the prefix. If the string column contains many trailing spaces or a VARCHAR Column cannot use the full length, the blank compression will make the index file smaller. If many strings have the same prefix, prefix compression is helpful.
Features of MySQL memory tables
Heap tables exist only in memory, so if mysqld is turned off or crashed, they will be lost, but because they are very fast, they are useful anyway.
The HEAP table in MySQL uses a 100% dynamic hash without Overflow and has no problems related to deletion.
You can only access things by using equations using an index in the heap table (usually using the = Operator ).
The disadvantages of heap tables are:
· You need enough extra memory for all heap tables you want to use at the same time.
· You cannot search for a part of the index.
· You cannot search for the next entry in sequence (using this index as an order ).
· MySQL cannot calculate the approximate number of rows between two values. This is used by the optimizer to determine which index to use, but on the other hand, the disk does not even need to be searched.
Number of database tables
The disadvantage of creating a large number of database tables in the same database is that if you have many files in a directory, opening, closing, and creating operations will be slow. If you execute SELECT statements on many different tables, when the table cache is full, there will be a bit of overhead, because for each table that must be opened, the other one must be disabled. You can reduce this overhead by making the table buffer larger.
Why are there so many open tables?
When you run mysqladmin status, you will see something like this:
Uptime: 426 Running threads: 1 Questions: 11082 Reloads: 1 Open tables: 12
It may be confusing if you only have 6 tables.
MySQL is multi-threaded, so it can have many queries on the same table at the same time. To minimize the problem that two threads have different states on the same file, the table is opened independently by each concurrent process. This consumes some memory and an additional file descriptor for data files. The index file descriptor is shared among all threads.
Database Table-level locking
The previous content mainly focuses on making individual queries faster. MySQL also allows statements to be scheduled, which makes queries from several clients more cooperative, so that a single client will not be locked for too long. Changing the scheduling feature also ensures that specific queries are processed faster. Let's first take a look at the default scheduling policy of MySQL, and then let's see what options can be used to change this policy. For the purpose of discussion, assume that the client program that executes the search (SELECT) is the read program. Another client program that executes the DELETE, INSERT, REPLACE or UPDATE operation on the table is the Write Program.
The basic scheduling policies of MySQL are summarized as follows:
· Write requests should be processed in the order they arrive.
· Writing has a higher priority than reading.
1. A major problem is as follows:
· A customer issues a SELECT statement that takes a long time to run.
· Then another customer sends an UPDATE to the table in use. This customer will wait until the SELECT statement is complete.
· Another customer issues another SELECT statement on the same table. Because UPDATE has a higher priority than SELECT, the SELECT statement will wait for UPDATE to complete. It will also wait for the first SELECT to complete!
Possible solutions to this problem are:
· Try to make the SELECT statement run faster. You may have to create some summary tables to do this.
· Start mysqld with -- low-priority-updates. This gives all statements that update (modify) a table with a lower priority than the SELECT statement. In this case, the last SELECT statement in the previous case will be executed before the INSERT statement.
· You can use the LOW_PRIORITY attribute to give a specific INSERT, UPDATE, or DELETE statement with a lower priority.
· Specify a low value for max_write_lock_count to start mysqld so that the READ lock is given after a certain number of WRITE locks.
· By using the SQL command: SET SQL _LOW_PRIORITY_UPDATES = 1, you can specify all the changes from a specific thread to be completed with a low priority. See the set option syntax.
· It is important to specify a specific SELECT using the HIGH_PRIORITY attribute. See SELECT syntax.
· If you have questions about INSERT and SELECT, switch to the new MyISAM table because they support concurrent SELECT and INSERT.
· If you mix INSERT and SELECT statements, the INSERT attribute of DELAYED may solve your problem. INSERT syntax.
· If you have questions about SELECT and DELETE, the DELETE option of the LIMIT option can help you. See the DELETE syntax.
2. The role of insert delayed in the client
If other clients may execute lengthy SELECT statements and you do not want to wait until the insertion is complete, insert delayed is useful. The client that publishes insert delayed can continue execution more quickly, because the server simply inserts the row to be inserted.
However, we should be aware of the differences between normal INSERT and insert delayed performance. If insert delayed has a syntax error, an error is sent to the client. If it is normal, no information is sent. For example, the obtained AUTO_INCREMENT value cannot be believed when this statement is returned. The number of duplicates on the unique index is not counted. This is because the insert operation returns a status before the actual insert is complete. The other statement also indicates that if the row of the insert delayed statement is waiting in queue and the server crashes or is terminated (with kill-9), the row will be lost. This is not the case when a normal TERM is terminated. The server inserts these rows before exiting.
Implement the scheduling policy with the help of table locks. The client must first obtain the table lock whenever it needs to access the table. You can directly use lock tables to do this, but the LOCK manager of the server will automatically obtain the LOCK as needed. When the client ends processing the table, the table lock can be released. The directly obtained lock can be released using the unlock tables, but the server also automatically releases the lock.
The client that executes the write operation must have an exclusive lock on the table. During the write operation, the table is in an inconsistent state because the data record is being deleted, added, or changed, and the indexes on the table may need to be updated accordingly. If the table is constantly changing, allowing other clients to access the table may cause problems. It is obviously not good for two clients to write the same table at the same time, because this will soon make the table unavailable. It is not a good thing to allow the client to read the constantly changing table, because it may be being modified at the moment of reading the table, and the result is incorrect.
The client that executes the read operation must have a lock to prevent other clients from writing the table to ensure that the table does not change during the read process. However, this lock does not require exclusive access to read operations. This lock also allows other clients to read tables at the same time. Reading does not change the table. It is unnecessary to prevent other clients from reading the table.
MySQL allows several query modifiers to influence its scheduling policy. One of them is the LOW_PRIORITY keyword of the DELETE, INSERT, load data, REPLACE, and UPDATE statements. The other is the SELECT h_priority keyword of the SELECT statement. The third is the DELAYED keyword of the INSERT and REPLACE statements.
The LOW_PRIORITY keyword affects scheduling as follows. Generally, if the write operation of a table arrives when the table is being read, The Write Program is blocked until the read program is completed, because once a query starts, it cannot be interrupted. If another read request arrives when the Write Program waits, the read program is blocked because the default scheduling policy has a higher priority than the read program. When the first read program ends, the Write Program continues. When the write program ends, the second read program starts.
If the write request is LOW_PRIORITY, the write operation is not considered as having a higher priority than the read operation. In this case, if the second read request arrives when the Write Program waits, the second read operation is scheduled before the write operation. Write operations are allowed only when there are no other read requests. Theoretically, this type of scheduling change indicates that write with LOW_PRIORITY may be blocked forever. When the previous read request is being processed, the new request can be placed before LOW_PRIORITY write as long as the other read request arrives.
The HIGH_PRIORITY keyword of the SELECT query works similarly. It causes the SELECT statement to be inserted before a waiting write operation, even if the write operation has a normal priority.
The DELAYED modifier of INSERT acts as follows. When an insert delayed request arrives in the table, the server puts the corresponding row into a queue and immediately returns a status to the client program, so that the client program can continue to execute, even if these rows have not been inserted into the table. If the reader is reading the table, the row in the queue is suspended. When no data is read, the server starts to insert rows in the delayed row queue. The server stops from time to see if new read requests have arrived and waits. In this case, the delayed row queue is suspended and the read program is allowed to continue. When there are no other read operations, the server inserts the delayed row again. This process continues until the delayed row queue is empty.
This scheduling modifier does not appear in all MySQL versions. The following table lists the modifiers and the MySQL versions that support these modifiers. You can use this table to determine the functions of the MySQL version:
Version of the statement type
DELETE LOW_PRIOrITY3.22.5INSERT LOW+PRIOrITY3.22.5INSERT DELAYED3.22.15LOAD DATA LOW_PRIORITY3.23.0LOCK TABLES ... LOW_PRIORITY3.22.8REPLACE LOW_PRIORITY3.22.5REPLACE DELAYED3.22.15SELECT ... HIGH_PRIORITY3.22.9UPDATE LOW_PRIORITY3.22.5SET SQL_LOW_PRIORITY_UPDATES3.22.5
Table Optimization
For long-term use of a table, due to the deletion and insertion of records, fragments are generated in the table's data files. In the following situations, fragments are generated in particular:
· Use the VARCHAR type
The result of frequent use of variable-length columns is that the table is very prone to fragmentation. If space permits, use a fixed-length CHAR.
· Use BLOB and TEXT
This is especially true when performing a large number of DELETE or UPDATE operations. Deleting BLOB will leave a large blank in the table. In the future, you will need to fill in one record or multiple records of different sizes.
· Delete most of a table, which also leaves a large number of blank spaces in the table.
To eliminate the impact of table fragments on performance, you need to optimize the table. In chapter 7, we introduce how to optimize a table:
1. Use SQL statement OPTIMIZE
Optimize table tbl_name
2. Use the hotfix myisamchk or isamchk.