Choosing MySQL column types
When you create a table, how do you decide which types to use? This section describes the factors to take into account before making a decision. The most "generic" column types are the string types. Any data can be stored as a string, because numbers and dates can both be expressed in string form. So why not define every column as a string and end the discussion here? Consider a simple example. Suppose you have values that look like numbers. They can be represented as strings, but should they be? What happens if they are?
For one thing, more space will probably be used, because numbers can be stored more efficiently than strings. You will also notice that query results differ because numbers and strings are handled differently. For example, numbers sort differently from strings. The number 2 is less than the number 11, but the string "2" is lexically greater than the string "11". You can work around the problem by using the column in a numeric context, for example by adding zero to it.
Adding zero to the column forces a numeric sort, but is that reasonable? Generally it is not. Treating the column as a number rather than a string has a couple of significant implications. It forces a string-to-number conversion for every value in the column, which is inefficient. In addition, turning the column into a calculated expression prevents MySQL from using any index on the column, which slows the query down further. Neither performance penalty occurs if the values are stored as numbers in the first place. The seemingly simple choice of one representation over another therefore affects storage requirements, query efficiency, and processing performance.
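The sorting difference and the add-zero workaround discussed above can be sketched as follows. The table name str_vals and the column val are hypothetical, chosen only for illustration.

```sql
-- Hypothetical table: numbers stored in a string column
CREATE TABLE str_vals (val VARCHAR(10));
INSERT INTO str_vals (val) VALUES ('2'), ('11'), ('100');

-- Lexical (string) sort returns '100', '11', '2'
SELECT val FROM str_vals ORDER BY val;

-- Adding zero forces a numeric sort and returns 2, 11, 100,
-- but each value is converted to a number and an index on val
-- cannot be used for the sort
SELECT val FROM str_vals ORDER BY val + 0;
```

Declaring val with an integer type in the first place would give the numeric order without the conversion.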
The preceding example shows that several issues come into play when you choose a column type:
What types of values will the column hold? This is an obvious question, but it must be asked. A value of any type can be represented as a string, but you are likely to get better performance by using a more appropriate type for numeric values (the same is true for date and time values). Even so, assessing the type of values you are working with is not necessarily trivial, especially when the data belong to someone else. If you are creating a table for others, it is extremely important to find out what kind of values the column will hold, and you must ask enough questions to obtain sufficient information for a good decision.
Do the column values fall within a particular range? If they are integers, are they always non-negative? If so, you can use an UNSIGNED type. If they are strings, are they always chosen from a fixed set of values? If so, ENUM or SET may be a suitable type. There is a tradeoff between the range of a type and the storage it uses. How "big" a type do you need? If the range of a number is limited, you can select a smaller type; if the range is essentially unlimited, you should select a larger one. Strings can be short or long, but you should not use CHAR(255) if the stored values always contain fewer than 10 characters.
What are the performance and efficiency issues? Some types can be processed more efficiently than others. Numeric operations are generally faster than string operations. Short strings can be compared faster than long strings and involve less disk overhead. Fixed-length types perform better than variable-length types.
How do you want values to be compared? String comparisons can be case-sensitive or case-insensitive. This choice also affects sorting, because sorting is based on comparison.
Do you plan to index the column? If so, your choice of column type is affected, because some MySQL versions do not allow certain types, such as BLOB and TEXT, to be indexed. In addition, some MySQL versions require indexed columns to be defined as NOT NULL, which prevents you from using NULL values.
Now let's look at these issues in more detail. First, note that although you want to make the best possible choice of column types when creating a table, a less-than-optimal choice is not a disaster. You can use ALTER TABLE to change the type to a better one. This can be as simple as replacing SMALLINT with MEDIUMINT after finding that the data contain larger values than originally expected. Sometimes the change is more complicated, for example changing a CHAR column to an ENUM with a specific set of allowed values. In MySQL 3.23 and later versions, you can use PROCEDURE ANALYSE() to obtain information about a table's columns, such as the minimum and maximum values and a suggested optimal type that covers the range of values in the column. This can help you determine that a smaller type can be used, which improves the performance of queries involving the table and reduces the space required to store it.
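A minimal sketch of both tools mentioned above, using the hypothetical names my_table and my_col. Note that MODIFY restates the whole column definition, so any attributes such as NOT NULL would need to be repeated. PROCEDURE ANALYSE() exists in the versions this section covers but was removed in MySQL 8.0, so the second statement is only an assumption about an older server.

```sql
-- Widen a column after discovering values too large for SMALLINT
ALTER TABLE my_table MODIFY my_col MEDIUMINT;

-- Ask the server to suggest column types based on the stored data
-- (older MySQL versions only; removed in MySQL 8.0)
SELECT * FROM my_table PROCEDURE ANALYSE();
```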
2.3.1 What type of value is stored in the column?
When choosing a column type, first consider what kind of values the column will hold, because this has the most obvious bearing on the type you select. In general, you store numbers in numeric columns, strings in string columns, and dates and times in date and time columns. If numeric values have a fractional part, use a floating-point column type rather than an integer type. There are exceptions, however, and the rule cannot be applied blindly. You need to understand the nature of your data to choose a type in a meaningful way. If you are storing your own data, you probably have a good idea of how to characterize it. If others ask you to create a table for them, however, choosing column types can be harder than it is for your own data. Ask enough questions to find out what kind of values the table will actually contain.
Suppose someone tells you that a column needs to record "rainfall". Is that a number? Or is it "mostly" numeric, that is, usually but not always coded as a number? For example, weather forecasts on the TV news generally include rainfall. Sometimes it is a number (such as "0.25" inches of rain), but sometimes it is a "trace" of rain, meaning "hardly any rain at all". That is fine for a weather forecast, but how do you store it in a database? You either need to quantify "trace" as a number so that rainfall can be recorded in a numeric column, or use a string so that the word "trace" can be recorded. You could also devise a more complicated arrangement with one numeric column and one string column, where one column is filled in and the other is NULL. This option should obviously be avoided if possible, because it makes the table harder to understand and queries more difficult to write. The usual approach is to store all rows in numeric form and convert them only for display purposes. For example, if any non-zero rainfall of less than 0.01 inch is considered a trace amount, you can display such values as the word "trace".
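One way to do the display-time conversion described above is with the IF() function. This is a sketch only; the weather table and its columns are hypothetical.

```sql
-- Hypothetical table: rainfall stored as a number of inches
CREATE TABLE weather (
  obs_date DATE NOT NULL,
  precip   FLOAT NOT NULL
);

-- Show small non-zero amounts as the word 'trace', other values as stored
SELECT obs_date,
       IF(precip > 0 AND precip < 0.01, 'trace', precip) AS precip_display
FROM weather;
```

Because the stored values stay numeric, sums and averages can still be computed directly on precip.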
For monetary calculations, you work with values that have dollars and cents parts. These look like floating-point values, but FLOAT and DOUBLE are subject to rounding errors and may be unsuitable except for records that need only approximate accuracy. Because people are sensitive about their money, it is better to use a type that provides exact accuracy. There are two options:
Represent money as a DECIMAL(M,2) type, choosing M as the maximum width suitable for the required range of values. This gives values with two decimal places of accuracy. The advantage of DECIMAL is that, in the MySQL versions discussed here, values are represented as strings and are not subject to rounding errors. The disadvantage is that string operations are less efficient than operations on values stored internally as numbers.
Represent all monetary values internally as cents using an integer type. The advantage is that calculations are done with integers, which is very fast. The disadvantage is that values must be converted on input or output by multiplying or dividing by 100.
Some data are clearly numeric, but you must still decide between a floating-point type and an integer type. Ask what the units are and what precision is needed. Is whole-unit precision sufficient, or do you need to represent fractional units? The answer helps you choose between integer and floating-point columns. For example, if you are recording weights, you can use an integer column if values are recorded to the nearest pound, and a floating-point column if you need to record fractional parts. In some cases you might even use multiple fields, for example if you want to record weight in pounds and ounces.
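The two representations of money described above might look like this. The table and column names are hypothetical, and DECIMAL(10,2) is just one possible width.

```sql
-- Option 1: exact decimal values with two decimal places
CREATE TABLE invoice_dec (
  id     INT UNSIGNED NOT NULL PRIMARY KEY,
  amount DECIMAL(10,2) NOT NULL        -- for example 19.99
);

-- Option 2: whole number of cents in an integer column
CREATE TABLE invoice_cents (
  id           INT UNSIGNED NOT NULL PRIMARY KEY,
  amount_cents INT NOT NULL            -- 1999 means 19.99
);

-- Multiply by 100 on input, divide by 100 on output
INSERT INTO invoice_cents (id, amount_cents) VALUES (1, ROUND(19.99 * 100));
SELECT id, amount_cents / 100 AS amount FROM invoice_cents;
```

With the integer form, sums and comparisons operate on whole cents, and the division by 100 is needed only when a value is displayed.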
Height is another kind of numeric data that can be represented in several ways:
A string such as "6-2" for a value like "6 feet 2 inches". This form is easy to read and understand (certainly more so than "74 inches"), but such values are difficult to use in mathematical operations such as summing or averaging.
One numeric field for feet and another for inches. This is somewhat easier to use in numeric operations, but two fields are harder to work with than one.
A single numeric field representing inches. This is the easiest form for a database to handle, and the least meaningful to people. Remember, however, that you do not have to display values in the same format you use to store them. MySQL functions can convert values to a more meaningful form for display, so this may well be the best way to represent height.
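As a sketch of the single-field approach, the following stores height in inches and reformats it only for display. The person table and its sample rows are made up for illustration.

```sql
-- Hypothetical table: height stored as total inches
CREATE TABLE person (
  name   VARCHAR(40) NOT NULL,
  height TINYINT UNSIGNED NOT NULL
);
INSERT INTO person (name, height) VALUES ('Al', 74), ('Bo', 65);

-- Arithmetic works directly on the stored values
SELECT AVG(height) FROM person;

-- Reformat as feet-inches for display: 74 becomes '6-2', 65 becomes '5-5'
SELECT name,
       CONCAT(FLOOR(height / 12), '-', MOD(height, 12)) AS height_ft_in
FROM person;
```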
If you need to store date information, do the values include a time? That is, will they ever need to include a time? MySQL does not provide a date type with an optional time part: DATE never includes a time, and DATETIME must include one. If the time really is optional, use a DATE column to record the date and a separate TIME column to record the time. Allow the TIME column to be NULL and interpret NULL as "no time".
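A table with an optional time part could be declared along these lines. The table name, column names, and sample values are hypothetical.

```sql
CREATE TABLE event_log (
  event_date DATE NOT NULL,
  event_time TIME NULL          -- NULL is interpreted as "no time"
);

INSERT INTO event_log (event_date, event_time) VALUES ('2000-06-01', '14:30:00');
INSERT INTO event_log (event_date, event_time) VALUES ('2000-06-02', NULL);

-- Rows for which no time was recorded
SELECT event_date FROM event_log WHERE event_time IS NULL;
```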
Deciding whether a time value is required is particularly important when two tables are joined in a master-detail relationship based on dates. Suppose you are conducting a study that involves testing subjects who come to your office. After a standard initial set of tests, you may run several additional tests on the same day, depending on the initial results. You might represent this information with a master-detail relationship: the subject identification information and the standard initial tests are stored in a master record, and the additional tests are stored as rows in a secondary detail table. The two tables are then joined on the subject ID and the test date.
In this situation you must decide whether the date alone is sufficient or whether you need both date and time. The answer depends on whether a subject might be tested more than once on the same day. If so, record the time (for example, the time at which the test procedure began), using either a DATETIME column or separate DATE and TIME columns, both of which must be filled in. Without a time value, a subject's detail records cannot be associated with the proper master record if the subject is tested twice in one day.
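The master-detail arrangement with separate DATE and TIME columns can be sketched as follows. All table and column names here are assumptions made for the example, not names from the study described above.

```sql
-- Hypothetical master table: one row per subject per testing session
CREATE TABLE test_master (
  subject_id INT UNSIGNED NOT NULL,
  test_date  DATE NOT NULL,
  test_time  TIME NOT NULL,             -- time the procedure began
  PRIMARY KEY (subject_id, test_date, test_time)
);

-- Hypothetical detail table: one row per additional test
CREATE TABLE test_detail (
  subject_id INT UNSIGNED NOT NULL,
  test_date  DATE NOT NULL,
  test_time  TIME NOT NULL,
  test_name  VARCHAR(30) NOT NULL,
  result     FLOAT
);

-- Joining on the date alone would mix up two sessions held on the same day;
-- including the time keeps each detail row with its own master row
SELECT m.subject_id, m.test_date, m.test_time, d.test_name, d.result
FROM test_master AS m
INNER JOIN test_detail AS d
   ON m.subject_id = d.subject_id
  AND m.test_date  = d.test_date
  AND m.test_time  = d.test_time;
```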
I have heard people say, "I don't need the time; I never test a subject twice on the same day." Sometimes they are right, but I have also seen some of these same people come back later, after entering data for subjects who were tested several times in one day, asking how to keep detail records from being mixed up with the wrong master record. Sorry, by then it's too late! Sometimes the problem can be addressed by adding a TIME column to the tables afterward. Unfortunately, existing records are difficult to fix unless there is some independent data source, such as the original paper records. Otherwise, there is no way to disambiguate the detail records so that they can be associated with the proper master records. Even with an independent source of information, the cleanup is messy and is likely to cause problems for applications already written to use the tables. It is best to explain the issue to the table owners and make sure the data are well characterized before the tables are created.
Sometimes the data are incomplete, which interferes with the choice of column type. If you are doing genealogical research, you need to record dates of birth and death, and sometimes all you can find out is the year in which someone was born or died, not the exact date. If you use a DATE column, you cannot enter a date unless you have the complete value. If you want to record whatever information you have, even if it is incomplete, you may have to keep separate year, month, and day fields. You can then enter the parts of the date that you know and set the missing parts to NULL. In MySQL 3.23 and later versions, a DATE value may also have a day part of 0, or month and day parts of 0. Such "fuzzy" dates can be used to represent incomplete date values.
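The separate-fields approach could be sketched like this, with hypothetical table and column names and a made-up sample row. The alternative of storing zero month or day parts in a DATE column is not shown, because in recent MySQL versions whether such values are accepted depends on the server's SQL mode (for example NO_ZERO_IN_DATE).

```sql
-- Hypothetical table: each part of the date may be NULL when unknown
CREATE TABLE ancestor (
  name        VARCHAR(60) NOT NULL,
  birth_year  SMALLINT UNSIGNED NULL,
  birth_month TINYINT UNSIGNED NULL,
  birth_day   TINYINT UNSIGNED NULL
);

-- Only the year of birth is known
INSERT INTO ancestor (name, birth_year, birth_month, birth_day)
VALUES ('example person', 1850, NULL, NULL);

-- People whose birth date is incomplete
SELECT name, birth_year
FROM ancestor
WHERE birth_month IS NULL OR birth_day IS NULL;
```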
2.3.2 Does the column value have a specific value range?
Once you have settled on a general category of column type, the range of values you want to represent helps narrow the choice to a specific type within that category. Suppose you want to store integers. If the values range from 0 to 1000, any type from SMALLINT to BIGINT can be used. If the values range up to 2,000,000, SMALLINT cannot be used, and the choice runs from MEDIUMINT to BIGINT. You then select one type from the possible range. You could, of course, simply choose the largest type (BIGINT in the examples above). In general, however, you should choose the smallest type that is large enough to hold the values you want to store. This minimizes the storage used by the table and gives the best performance, because smaller columns can usually be processed faster than larger ones. If you do not know the range of the values, you must either guess or use BIGINT to cope with the worst case. (If you guess and choose a type that turns out to be too small, the work is not wasted; you can use ALTER TABLE later to change the column to a larger type.)
In Chapter 1, we created a score table for the grade-keeping project, with a score column that records quiz and test scores. For simplicity, the table was created using the INT type, but we can now see that if scores lie in the range 0 to 100, TINYINT UNSIGNED is a better choice because it uses less storage. The range of the data also affects column type attributes. If the values are never negative, you can use the UNSIGNED attribute; otherwise, you cannot.
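For reference, the unsigned ranges of the smaller integer types are listed in the comments below, followed by a sketch of the change suggested above. It assumes a table named score with a column also named score that was created as INT; the real table from Chapter 1 may carry other attributes that would need to be restated.

```sql
-- Unsigned ranges and storage sizes:
--   TINYINT   UNSIGNED  0 to 255         (1 byte)
--   SMALLINT  UNSIGNED  0 to 65535       (2 bytes)
--   MEDIUMINT UNSIGNED  0 to 16777215    (3 bytes)
--   INT       UNSIGNED  0 to 4294967295  (4 bytes)

-- Scores from 0 to 100 fit in one byte instead of four
ALTER TABLE score MODIFY score TINYINT UNSIGNED;
```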
String types do not have a "range" in the way numeric columns do, but they do have a length, and you need to know the maximum length of the strings the column must hold. In the MySQL versions covered here, if the strings are shorter than 256 characters, you can use CHAR, VARCHAR, TINYTEXT, or TINYBLOB. If you need longer strings, you can use a TEXT or BLOB type, but CHAR and VARCHAR are no longer options. For string columns that represent a fixed set of values, consider the ENUM or SET column types. They can be good choices because they are represented internally as numbers. Operations on these types are performed numerically, so they are more efficient than other string types. They are also more compact than other string types, which saves space.
When describing the range of values you must handle, the best terms are "always" and "never" (as in "always less than 1000" or "never negative"), because they let you constrain the choice of column type more precisely. Be wary of using these terms before they have been confirmed, however, and pay particular attention to them when talking to other people about their data. When someone says "always" or "never", find out what they really mean. Sometimes people say that their data always have a particular property when what they really mean is "almost always".
For example, suppose you design a table for some people who tell you, "Our test scores are always from 0 to 100." Based on that description, you choose the TINYINT type and make it UNSIGNED because the values are always non-negative. Then you find out that they sometimes use -1 as a code meaning "the student was absent due to illness". They did not tell you about that. It may be acceptable to represent such values with NULL, but if not, you must record -1, and then you cannot use an UNSIGNED column (and you have to use ALTER TABLE to remedy the situation!). Discussions about such cases can sometimes be simplified by asking a simple question: have there ever been any exceptions? If an exception has occurred even once, you must take it into account. You will find that the people you discuss database design with often think that exceptions do not matter if they do not happen often. When you are creating a database, that is not true. The question to ask is not how often exceptions occur, but whether there are ever any exceptions. If there are, you must allow for them.
2.3.3 Performance and efficiency issues
The choice of column type affects query performance in several ways. If you keep in mind the general guidelines discussed in the following sections, you can choose column types that help MySQL process your tables efficiently.
1. Numeric versus string operations
Numeric operations are generally faster than string operations. Consider comparisons: two numbers can be compared in a single operation, whereas a string comparison may involve several byte-by-byte comparisons, and more of them as the strings get longer. If a string column has a limited number of values, use the ENUM or SET type to get the advantages of numeric operations. Both types are represented internally as numbers and can be processed more efficiently.
Also consider alternative representations for strings. Sometimes performance can be improved by representing string values as numbers. For example, to represent an IP number in dotted-quad notation, such as 192.168.0.4, you can use a string. Alternatively, you can convert the IP number to an integer by storing each part of the dotted quad in one byte of a four-byte INT UNSIGNED value. This saves space and speeds up lookups. However, representing IP numbers as INT values makes it difficult to perform pattern matches, such as searching for the numbers in a given subnet. You therefore cannot consider space alone; you must decide which representation is more suitable based on what you want to do with the values.
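The integer form of an IP number can be produced with MySQL's built-in INET_ATON() and INET_NTOA() functions, which the text above does not mention by name. The host table below is hypothetical.

```sql
-- Dotted quad to integer: 192*2^24 + 168*2^16 + 0*2^8 + 4
SELECT INET_ATON('192.168.0.4');     -- 3232235524
SELECT INET_NTOA(3232235524);        -- '192.168.0.4'

-- Hypothetical table storing addresses in a four-byte unsigned integer
CREATE TABLE host (
  ip   INT UNSIGNED NOT NULL,
  name VARCHAR(60) NOT NULL
);
INSERT INTO host (ip, name) VALUES (INET_ATON('192.168.0.4'), 'example-host');

-- Exact lookups compare integers rather than strings
SELECT name FROM host WHERE ip = INET_ATON('192.168.0.4');
```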
2. Smaller types versus larger types
Smaller types can be processed faster than larger types. For one thing, they take less space and involve less disk activity. For strings, processing time is directly related to string length. In general, smaller tables can be processed faster because query processing requires less disk I/O. For columns of fixed-length types, choose the smallest type that can hold the required range of values. For example, do not choose BIGINT if MEDIUMINT will do, and do not choose DOUBLE if you need only FLOAT precision. With variable-length types you may still be able to save space. A BLOB value uses 2 bytes to record the length of the value, whereas a LONGBLOB uses 4 bytes. If the stored values are never as long as 64 KB, using BLOB saves 2 bytes per value (the same considerations apply to the TEXT types).
3. Fixed-length versus variable-length types
Fixed-length types are generally faster than variable-length types:
With variable-length columns, records differ in size, so performing many deletes and updates causes more fragmentation in the table. You need to run OPTIMIZE TABLE periodically to maintain performance. Fixed-length columns do not have this problem.
Tables with fixed-length rows are easier to reconstruct after a table crash, because the starting position of each record can be determined. This is not the case for variable-length rows. This is not a performance issue for query processing, but it certainly speeds up the table repair process.
If a table contains variable-length columns, converting them to fixed-length columns improves performance, because fixed-length records are easier to process. Before attempting this, consider the following:
Using fixed-length columns involves a tradeoff: they are faster but take more space. Each value in a CHAR(n) column always occupies n bytes (even an empty string), because values shorter than n are padded with trailing spaces when stored in the table. VARCHAR(n) columns take less space, because only as many bytes are allocated as each value needs, plus one byte per value to record its length. Choosing between CHAR and VARCHAR columns is therefore a tradeoff between time and space. If speed is the main concern, use CHAR columns to get the performance advantage of fixed-length columns. If space is critical, use VARCHAR columns.
You cannot convert just one variable-length column; you must convert them all. In addition, you must convert them all at the same time with a single ALTER TABLE statement. Otherwise, the conversion has no effect.
Sometimes a fixed-length type cannot be used even if you want one. For example, there is no fixed-length type for strings longer than 255 characters.
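Converting every variable-length column in a single statement might look like the sketch below. The member table is hypothetical, and the behavior described above applies to the table format of the MySQL versions this section covers; other storage engines may handle row formats differently.

```sql
-- Hypothetical table with two variable-length columns
CREATE TABLE member (
  last_name  VARCHAR(30) NOT NULL,
  first_name VARCHAR(30) NOT NULL
);

-- Convert all variable-length columns at once, in one ALTER TABLE
ALTER TABLE member
  MODIFY last_name  CHAR(30) NOT NULL,
  MODIFY first_name CHAR(30) NOT NULL;

-- Check the resulting column definitions
SHOW COLUMNS FROM member;
```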
4. Indexable types
Indexes speed up queries, so choose column types that can be indexed.
5. NULL and NOT NULL
If a column is defined as NOT NULL, it can be processed faster, because MySQL does not have to check the column's values during query processing to see whether they are NULL. It also saves one bit per row in the table. Avoiding NULL in a column can make queries simpler, because NULL does not have to be treated as a special case, and simpler queries can generally be processed faster.
The performance guidelines sometimes conflict. For example, in terms of how quickly MySQL can locate rows, fixed-length rows containing CHAR columns are faster than variable-length rows containing VARCHAR columns. On the other hand, they also take more space, which leads to more disk activity. From that point of view, VARCHAR may be faster. As a rule of thumb, you can assume that fixed-length columns improve performance even though they take more space. For a particularly critical application, you may want to implement a table both ways, fixed-length and variable-length, and run some tests to determine which is faster for your application.
2.3.4 Comparison of values
String types can be compared and sorted in a case-sensitive or case-insensitive manner, depending on how they are defined. Table 2-14 shows each case-insensitive type and its equivalent case-sensitive type. Some types (CHAR and VARCHAR) are binary or non-binary depending on whether the keyword BINARY is given in the column definition. For other types (BLOB and TEXT), the "binary-ness" is implicit in the type name.
Note that the binary (case-sensitive) types differ from the non-binary (case-insensitive) types only in comparison and sorting. Any string type can contain any kind of data. In particular, the TEXT types can store binary data perfectly well, despite the word "text" in the type name. If you want a column to be usable for both case-sensitive and case-insensitive comparisons, use a non-binary type. Then, whenever you want a case-sensitive comparison, use the BINARY keyword to force a string to be treated as a binary string value. For example, if my_col is a CHAR column, you can compare it in different ways:
my_col = "ABC"          not case-sensitive
BINARY my_col = "ABC"   case-sensitive
my_col = BINARY "ABC"   case-sensitive
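The same three comparisons can be tried on string literals, as in the sketch below. The results in the comments assume the connection uses a case-insensitive default collation, which is the usual default but is an assumption about the server configuration.

```sql
SELECT 'abc' = 'ABC';            -- 1: case-insensitive comparison
SELECT BINARY 'abc' = 'ABC';     -- 0: compared byte by byte
SELECT 'abc' = BINARY 'ABC';     -- 0: either operand can be made binary
```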
If you have string values that you want to sort in some non-lexical order, consider using an ENUM column. ENUM values sort according to the order in which the enumeration values are listed in the column definition, so you can make the values sort in any order you want.
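A short sketch of ENUM sort order, using a hypothetical shirt table:

```sql
-- Values are listed in the order they should sort
CREATE TABLE shirt (
  size ENUM('small', 'medium', 'large') NOT NULL
);
INSERT INTO shirt (size) VALUES ('large'), ('small'), ('medium');

-- Sorted by position in the ENUM list: small, medium, large
-- (a lexical sort of the same strings would give large, medium, small)
SELECT size FROM shirt ORDER BY size;
```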
2.3.5 Do you plan to index the column?
Indexes allow queries to be processed more efficiently. Choosing indexes is a topic for Chapter 4, but the general principle is to index the columns used in the WHERE clause to select rows. If you want to index a column or include it in a multiple-column index, your choice of type may be constrained. In MySQL releases earlier than 3.23.2, indexed columns must be defined as NOT NULL, and BLOB and TEXT types cannot be indexed. These restrictions were removed in MySQL 3.23.2, but if you are using an earlier version and cannot or do not want to upgrade, you must live with them. You may be able to work around them in the following cases:
If you can designate some value as special, you can treat it as though it meant the same thing as NULL. For a DATE column, you can designate "0000-00-00" to mean "no date". In a string column, you can designate the empty string to mean "missing value". In a numeric column, you can use -1 if the column normally holds only non-negative values.
BLOB and TEXT types cannot be indexed, but if the strings never exceed 255 characters, you can use the equivalent VARCHAR column type and index that. Use VARCHAR(255) BINARY for BLOB values and VARCHAR(255) for TEXT values.
2.3.6 How column type choices are interrelated
Do not assume that column type choices are independent of one another. For example, the range of a numeric type is related to its storage size: as the range increases, more storage is required, which affects performance. As another example, consider the implications of choosing AUTO_INCREMENT to create a column that holds unique sequence numbers. That single choice has several consequences involving the column type, indexing, and the use of NULL:
AUTO_INCREMENT is a column attribute that should be used only with integer types. This limits your choices to TINYINT through BIGINT.
An AUTO_INCREMENT column should be indexed, so that the current maximum sequence number can be determined quickly without a full scan of the table. In addition, to prevent sequence numbers from being reused, the index must be unique. This means the column must be defined as a PRIMARY KEY or as a UNIQUE index.
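If your MySQL version is earlier than 3.23.2, indexed columns cannot contain NULL values, so the column must be defined as NOT NULL.
All of this means that an AUTO_INCREMENT column cannot be defined simply by adding the AUTO_INCREMENT attribute to an arbitrary type; it must also have an integer type, a unique index, and, in older versions, the NOT NULL attribute.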
Another consequence of using AUTO_INCREMENT is that, because it is intended for generating a sequence of positive values, it is best to define the AUTO_INCREMENT column as UNSIGNED.
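Putting these consequences together, an AUTO_INCREMENT column might be declared as in the following sketch. The table and column names are hypothetical, and INT is only one of the integer types that could be used.

```sql
CREATE TABLE member_seq (
  member_id INT UNSIGNED NOT NULL AUTO_INCREMENT,  -- integer, unsigned, not NULL
  name      VARCHAR(40) NOT NULL,
  PRIMARY KEY (member_id)                          -- unique index on the column
);

-- The sequence number is generated when member_id is omitted
INSERT INTO member_seq (name) VALUES ('first'), ('second');
SELECT member_id, name FROM member_seq;
```

A UNIQUE index on member_id could be used in place of the PRIMARY KEY.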