MySQL Best Practices

Source: Internet
Author: User
Tags table definition

MySQL best practices: 1. table Structure best practices 2. QUERY best practices table structure best practices: 1. the smaller the value, the better. TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, DECIMAL, and DOUBLE data types that occupy a small number of bytes should be selected to meet the requirements. As shown above, when TINYINT (1 byte) is used, SMALLINT (2 byte)/MEDIUMINT (3 byte) is never used. When there is no negative storage, try to use the UNSIGNED type. If feasible, use INT instead of FLOAT or DECIMAL. For example, when the storage price is set, the 100 * price is converted to an integer. The main purpose of this operation is to save storage space. The smaller the space occupied by data, the less time needed to query, thus saving time. 2. the DATE/TIMESTAMP/DATETIME (3 byte/4 byte/8 byte) principle is the same as above. Note: TIMESTAMP is gone from 1970 to 2038. DATETIME is from 1000 to 9999. Therefore, is the storage space saved? The value is not worth saving. It depends entirely on the business needs. 3. Avoid NULL extra BYTE consumption as much as possible. Increase the query complexity (vs. NULL). NULL is allowed only when NULL and NULL need to be distinguished. Otherwise, it is best to have a default NULL value. For numeric type, the default value is generally 0, or 0.0. This is different from NULL. For string type, NULL can be expressed as two types: ''and NULL. They are very different. Suppose you write a query with the col field being NULL: 1) If no NULL is found: select * from t where col = ''2) if there is NULL, select * from t where col is NULL or col = ''is considered a small case. When you join outer, you will suffer from the following: is col NULL because JOIN fails, or is col NULL because it has no value? Of course, this is a general principle, guideline. It depends on the application scenario. 4. The IP address is digitally stored in 15 bytes vs. 4 bytes (INET_ATON, INET_NTOA), saving the province and the province. MYSQL provides the ip-to-number and Digital-to-IP functions, so you can save it. 5. Static tables are faster (tables with a fixed length). Fields of the string type without VARCHAR, TEXT, or BLOB variable length are called static tables. Otherwise, it is a dynamic table. Disadvantage: a waste of space (the so-called static, that is, the size of the type, it will be allocated to you directly, no matter what you need .) Static tables should be rare now. When designing a table, it is the best if static separation is possible to meet the requirements. The advantage of a static table is that the query is fast, because when reading this record, it knows how far it jumps and can read the next record. Dynamic tables do not, because the length of any record is dynamic. 6. vertical Split advantages: reduce table complexity and number of fields (such as static and dynamic tables that can be separated) Disadvantages: excessive split leads to multiple joins. Lower performance depends on the business and data volume to balance the trade-offs. There is no absolute good or bad, and we have to adapt to local conditions. 7. Reduce the number of character sets, while reduce the IO for most of our businesses. GBK is enough. UTF8 is a waste. You think, GBK can only use 2 bytes for any word. Because UTF8 supports languages in many countries, it requires 2-4 bytes to store Chinese characters. Therefore, it is not necessary unless it is internationalized. 8. Moderate redundancy (space for time) the most direct objective of moderate redundancy is to reduce JOIN. Fields in another table that can be obtained through JOIN are stored again in the current table in the form of redundancy. Therefore, JOIN is not required. This is a common practice when the data volume reaches a certain level. 9. The statement that the primary key should not be too large (InnoDB) applies only to InnoDB. This involves the InnoDB data storage format. It stores data files in the form of a B-plus tree. That is, INNODB stores data files as index files. Therefore, each time you read a record, you must use the primary key to query the record. This is why an auto-increment primary key is required for table creation in InnoDB. Suppose the table T, field (id, owner, title, c1, c2, c3, c4 ). if you want to create an index on the owner field, two fields are stored in the corresponding index. One is the OWNER, and the other is the ID (primary key ), used to read the corresponding records in the data file. If you set the ID to BIGINT, it means that the 8 byte long field must be added for each index you create. Then, the index size is immediately increased. 10. The importance of growth ID (InnoDB) InnoDB data files are index files and primary key-based index files. Therefore, this is why a primary key is required. Even if you do not set it, it will silently Add a primary key to you. This is one of them. Second, is it growth? The reason for the increase is that when you add a new record, you only need to append it. If it is not increasing in order, when you insert a new record, first, it needs to find a suitable location, and then see if there is any space for it to be inserted. If there is no space for it, you have to move it back and place it in another place. If tens of millions of users do this, how slow is the insertion? If you don't want to talk about it, you can get it all over again. 11. CHAR vs. VARCHAR: Do not talk about encoding, because char or varchar will follow encoding. In fact, the length of a large segment value is clear and short (such as storing the md5 value). It is suitable for char (or binary in this case ). only the length of a value is short, that is, the length is long, and the length is short. You still need to use varchar. Char is the number of char you define, and it will directly allocate several of your char. the actual space used by varchar (n) is: n char + 1-2 byte. when it comes to space saving, this will indeed save a little bit. If there is only one VARCHAR in the entire table definition, char and varchar are basically different, unless you say that you define char (1 ). QUERY best practices: 1. do not use "SELECT *". Otherwise, it will read more, transmit more, and add unavoidable table scans. I will not talk nonsense. It is generally well known. 2. Do not like '% item %' but 'item % 'before %. This index cannot be used. Therefore, if you want to use indexes to speed up the query, do not add %. 3. Cardinaltiy (base) & selectiity (select ratio) Cardinality: the number of different values. For example, in table t, there are 100 records and the field owner has 100 values, but 10 of them are different. This is the Cardinality. Selectivity: 10/100 = 10% of this field. This concept is mainly used to determine whether this field is suitable for indexing. The larger the Cardinality, the higher the Selectivity field, and the more ideal the index creation object. Sometimes the database determines whether to use the index or scan the table based on this value. So people will use it if you have not created an index. In addition, indexes cannot be too many, but more indexes will slow down the update speed. 4. order by created the optimized time sorting of DESC is a common requirement in applications. I think, isn't this time growing by myself? Isn't it the same as ID auto-growth? Therefore, in order by, using a self-increasing primary key ID saves a file sort operation than using created. Faster. 5. count (1), count (*), count (owner) Difference count (1) is equivalent to count (*), equivalent to count (any not null field) count (owner): If the owner can be NULL, the number of these three conditions will be smaller. The smaller number is the number of those whose owner is null. 6. Don't join on different data types. Table A user_id serves as the foreign key of Table B, which is very common. Note that the user_id field type must be consistent in both tables. This saves unnecessary expenses, such as database type conversion for you. 7. Do not use full-text index. Currently, only MyISAM supports full-text index. In addition, it is not easy to use, and the user-defined nature is poor, so you just need to ignore it. If you really need full-text indexing, consider using Lucene, Solr, ElasticSearch, sphindexing, Groonga, and Xapian. All of them are experts, with complete functions and strong definability. 8. Limit n, m is slow, and most people turn pages with caution may rely on this. When the data volume is large, this is obviously very slow. It is recommended that, after the first query, remember the last ID of the current page. Then, when querying the next page, add this ID as a restriction and use limit pagesize. For example, if you think about it, you should be able to come up with some feasible strategies. In fact, when the data volume is large, you can change your perspective, such as continuing to write articles on limit n and m, or directly change the query method, such as using a search engine. 9. Multi-field indexing does not need to be said. The truth should be commonplace. Create index idx_col123 ON t (col1, col2, col3); rules: where col1 = ''and col2 ='' and col3 = ''where col1 ='' and col2 = ''where col1 ='' where col1 = ''where col1 ='' and col3 ='' (index used for col1, when col3 is used, a row is verified and filtered. You can see exactly what B Tree looks like. (Mysql should be a B + Tree, and the query logic is similar, with little difference) 10. Can a SELECT statement use multiple indexes? Yes. Merge optimization is available in a higher version of Mysql. This feature is supported. 11. Which of the following statements about JOIN vs. EXISTS is faster? 1) the final result is that the JOIN table size is related to one/one-to-one/two. 2) It should be clear that the EXIST has the advantage over JOIN in that first match returns, and JOIN is all match.3 that can match) the advantage of JOIN over EXIST is that you can select the execution sequence (join order) according to the actual situation. Before MySQL5.6, if the execution sequence of EXISTS in where is always from the outer channel, it seems to be smarter now. 4) When joining a large table in a small table, using EXISTS may be faster. A a join B B ON (. id = B. aid) WHERE. owner = 'aaa' and B. cat = 'bbb '; the execution sequence is as follows: (retrieve all records that meet the condition owner = 'aaa' in Table) JOIN = match the two for columns one by one (retrieve all records that meet the condition cat = 'bbb' in Table B) A a WHERE owner = 'aaa' and EXISTS (SELECT 1 from B where cat = 'bbb 'and. id = aid) execution sequence: for (retrieve all records that meet the condition owner = 'aaa' in Table) for (retrieve all records that meet the condition cat = 'bbb 'in Table B) check if. id = B. aid: If there is no index in the for clause, you must read all the data in table B. If an index exists, you only need to read a corresponding record. Based on the execution logic above, and the table size and association relationship, you can export which one is better, and test it several times to see what the execution plan is, and the general conclusion will be given.

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.