MySQL notes the schema and data type optimization

Last Update:2016-01-20 Source: Internet

Author: User

Tags aliases arithmetic uuid

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

Schema and data type optimization

1. Select the optimized data type

1). Smaller is usually better: smaller data types are usually faster because they consume less disk, memory, and CPU cache, and processing requires less CPU cycles.

2). Simple: Simple data type operations typically require less CPU cycles. For example, an integer is less expensive than a string operation because character sets and collation rules (collations) are more complex than the integer comparison of string comparisons. Here are two examples:

One is to use MySQL built-in types instead of strings to store the date and time, and the other is to use an integer storage IP address.

3). Avoid using null: it is generally preferable to specify column NOT NULL unless you really need to store null. Because null columns make indexes, index statistics, and value comparisons more complex. Nullable columns use more storage space, and special handling is required in MySQL.

2. mysql compatibility supports many aliases, such as Integer, bool, and so on. They are all aliases, and these aliases can be confusing, but do not affect performance. If you create a table with a strong alias for data and then check with show create TABLE, you will find MySQL

The base type is reported, not the alias.

3. Integer types: TINYINT, SMALLINT, Mediumint, INT, BIGINT, respectively, using 8,16,24,32,64 bit storage space. Their storage values range from n-1 power of 2 to n-1 of 2 to a power minus one.

1). The integer type has an optional unsigned attribute, which means that negative values are not allowed, which roughly increases the number of positive numbers to the line.

2). Signed and unsigned types use the same storage space and have the same performance, so you can choose the right type more realistically.

3). mysql can specify widths for positive types, such as int (11), but most applications do not make sense. For storage and Computation, int (1) and int (20) are the same.

4. Real type: Real numbers are digits with fractional parts. However, they are not just for storing fractional parts; You can also use decimal to store integers larger than bigint.

1). Float and double types support myopia calculations using standard floating-point arithmetic. The decimal type is used to store exact decimals.

2). Floating-point types typically use less space than decimal when they store values of the same type of range. Float uses 4 bytes of storage. Double takes 8 bytes and has a higher precision and wider range than float.

3). Because additional space and computational overhead are required, you should try to use decimal only when the decimal is calculated exactly. When the data volume is relatively large, consider using bigint instead of decimal to enlarge the corresponding value by N times.

5. String Type:

1). VARCHAR: It is more space-saving than a fixed-length type because it uses only the necessary space. VarChar saves space, so it also helps with performance. However, because the rows are side-length, it may be that the rows become longer than the original at update, which leads to additional work.

The following scenario is appropriate for use with varchar: The maximum length of the string is much larger than the average length, the column is not updated, so the fragmentation is not a problem; using a complex character set such as UTF-8, each character uses a different number of bytes.

In versions 5.0 or later, MySQL retains the trailing space when it is stored and retrieved. The InnoDB is more flexible, and it can store long varchar as blobs.

2). Char: fixed length, when storing a CHAR value, MySQL deletes all trailing spaces. Fixed-length char types are not prone to fragmentation, and for very short columns, char is more efficient than varchar in storage space, and Vachar has an extra byte for record length.

3). Remember that the length definition of a string is not the number of bytes, it is the number of characters. Multibyte character sets require more space to store individual characters.

4). Types similar to char and varchar also have binary and varbinary, which store binary strings. Byte codes and not characters are stored in binary strings.

The advantage of binary comparisons is not only in case sensitivity. MySQL compares binary strings by one byte at a time, and compares them based on the value of that byte. As a result, binary is much simpler than characters, so it's faster.

5). Blob and Text type: Both blob and text are string data types designed to store large data and are stored in binary and character mode, respectively. When the blob and text values are too large, InnoDB uses a dedicated "external" storage area for storage.

6. Date and Time type: The minimum time granularity that MySQL can store is seconds.

1). DATETIME: This type can hold a wide range of values, from 1001 to 9999, with a precision of seconds. It encapsulates the date and time into an integer formatted as YYYYMMDDHHMMSS. The default display format is "2008-02-16 22:37:08"

2). TIMESTAMP: Saves the number of seconds that are dependent on midnight January 1, 1970, which is the same as the Unix timestamp. Can only be represented from 1970 to 2038. Timestamp is more efficient because of its small footprint.

3). You can use the bigint type to store a microsecond-level timestamp.

7. Bit arithmetic: bit, SET

8. Select identifier (identifier, primary key)

1). When choosing the type of identity column, you need to consider not only the storage type, but also the type of MySQL that performs calculations and comparisons

2). Once you have selected a type, make sure that the same type is used in all associated tables. Using different data types can cause performance problems, even if there is no performance impact, implicit type conversions can also make it difficult to find errors when comparing operations.

3). You should choose the smallest data type if you can meet the range requirements for values and reserve future growth space.

4). Integer types are usually the best choice for identifying columns because they are fast and can be used auto_increment

5). If possible, avoid using string types as identity columns because they are very space-consuming and often slower than numeric types.

6). There is also a need to pay attention to a completely "random" string. For example: MD5 (), SHAI () or UUID () the resulting string. The new values generated by these functions are also arbitrarily distributed within a large space, which causes inserts and some SELECT statements to be slow

7). If you store UUID values, you should remove the "-" symbol, or, better yet, use the Unhex () function to convert a number with a UUID value of 16 bytes and store it in a binary (16) column. Retrieved and converted back.

9. Beware of automatically generated schema (table)

10. It is common to use varchar (15) to store IP addresses. However, they are actually 32-bit unsigned integers, not strings. Dividing a field into four segments with a decimal point is convenient for reading. Therefore, the IP address should be stored with an unsigned integer. MySQL offers Inet_aton ()

and the Inet_ntoa () function are converted between the two representation methods.

One of the pitfalls in MySQL schema design:

1). Too many columns

2). Too many associations

3). An almighty enumeration

4). Enumeration in Disguise

5). Null for non-invention: do not go to extremes because you do not apply null values, or you can use NULL as a practical matter

12. Advantages and disadvantages of the paradigm

Advantages:

1). Normalized update operations are usually faster than inverse normalization

2). Modify less data

3). Normalized tables are usually smaller and better placed in memory, so it's faster to perform operations

Disadvantage: It is often necessary to correlate queries, not only expensive, but also may invalidate some index policies

13. Advantages and disadvantages of anti-paradigm:

Pros: Good to avoid correlation, more efficient use of indexing strategies

Disadvantage: The advantage of paradigm is the disadvantage of anti-paradigm

14. Cache Tables and Summary tables:

1). Cache table: Stores tables that can be easily fetched from other tables in the schema (but with slow data acquisition)

2). Summary table: Saves a table that aggregates data by using the GROUP BY statement. Calculating statistics in real time is a very expensive operation.

4). When using cached tables and summary tables, you must decide whether to maintain the data in real time or rebuild it on a regular basis. Whichever is better depends on the application, but regular rebuilding does not just save resources, it can keep the table from being fragmented, and the index of the fully sequential organization.

15. Shadow table: Refers to a table created in the "back" of a real table. When the table operation is completed, the shadow table and the original table can be toggled through an atomic rename operation. Keep backups as much as possible to prevent problems with the new table.

16. Materialized views: Materialized views are actually pre-computed and stored on disk tables that can be refreshed and updated through a variety of policies. MySQL does not support materialized views and can be flexviews with open source tools.

Comparing traditional methods of maintaining summary tables and caching tables, flexviews can incrementally recalculate the contents of materialized views by lifting changes to the source table.

17. Counter table: Creating a stand-alone table storage counter is usually a good idea. Some tips for using counter tables

1). To prevent mutexes from affecting efficiency, you can add multiple records and randomly update a record. When statistics are total, all data is added.

2). You can also use the above method for updating counters based on time, but add one more step. The total number of days before each day is timed to be inserted into the counter table to delete those scattered statistical records.

18. Speed up the ALTER TABLE operation: The performance of the ALTER TABLE operation for MySQL is a problem for large tables. MySQL performs most of the modification of the table structure by creating an empty table with the new structure, identifying all the data from the old table to insert the new table, and then deleting the old table.

You can change the default value of a column by using the ALTER column operation. This statement modifies the. frm file directly and does not involve table data. So, this operation is very fast.

MySQL notes the schema and data type optimization

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More