"MySQL Database" chapter fourth Interpretation: Schema and data type optimization (Part I)

Source: Internet
Author: User
Preface:

The cornerstone of high performance is good logical and physical design, with the schema designed around the queries the system will execute.

This chapter focuses on MySQL database design and introduces the differences between database design in MySQL and in other relational database management systems.

Schema:

A schema is a collection of database objects such as tables, views, stored procedures, and indexes. To tell different collections apart, each one needs its own name. By default, one user corresponds to one collection: the schema name equals the user name and serves as that user's default schema. This is why a schema often looks like a user name.

Think of the database as a warehouse with many rooms. Each schema is a room, and each table is a locker in a room. A user is the owner of a schema, and a user mapped to the database holds the keys to the schemas (rooms) it is allowed to operate on. SQL Server, Oracle, and MySQL treat schemas differently; in MySQL, a schema is synonymous with a database.

4.1 Choosing Optimal Data Types

Principles:

1. Smaller is usually better: use the smallest data type that can correctly store the data. Smaller types take less disk, memory, and CPU cache, and need fewer CPU cycles to process, so they are faster. Make sure the type still covers the full range of the data, though; having to enlarge it later is painful.

2. Simple is good: simple types need fewer CPU cycles. Use MySQL's built-in types to store dates and times, and integers to store IP addresses. Integers are cheaper than characters, because character sets and collations make character comparison more complex.

3. Avoid NULL where possible: it is usually best to declare columns NOT NULL.

*) A nullable column uses more storage space and needs special handling inside MySQL.

*) NULL makes indexes, index statistics, and value comparisons more complex. When a nullable column is indexed, each index entry needs an extra byte.

Exception: InnoDB stores NULL with a single bit, so it is space-efficient for sparse data (where many values are NULL). This does not apply to MyISAM.
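The advice in principle 2 to store IP addresses as integers can be sketched in Python. This is an illustration only: the helper names ip_to_int and int_to_ip are made up for this example, and inside MySQL the built-in INET_ATON() and INET_NTOA() functions perform the same conversion.

```python
import ipaddress


def ip_to_int(dotted: str) -> int:
    """Return the unsigned 32-bit integer encoded by a dotted-quad IPv4 string."""
    return int(ipaddress.IPv4Address(dotted))


def int_to_ip(value: int) -> str:
    """Return the dotted-quad string for an unsigned 32-bit integer."""
    return str(ipaddress.IPv4Address(value))


addr = "192.168.0.1"
packed = ip_to_int(addr)
print(packed)             # 3232235521
print(int_to_ip(packed))  # 192.168.0.1
# The text form needs up to 15 characters; the integer fits in 4 bytes (INT UNSIGNED).
print(len(addr), "characters as text vs 4 bytes as an integer")
```

The integer form is smaller and is compared as a number, without any character set or collation involved.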

4.1.1 Integer types

Integers (whole numbers)

TINYINT (8 bits of storage space), SMALLINT (16), MEDIUMINT (24), INT (32), BIGINT (64)

1. Range of stored values: from -2^(N-1) to 2^(N-1)-1, where N is the number of bits of storage space.

2. UNSIGNED: an optional attribute that disallows negative values and roughly doubles the upper limit of positive values: TINYINT UNSIGNED stores 0~255, while TINYINT stores -128~127.

3. Signed and unsigned types use the same storage space and have the same performance.

You can specify a width for an integer type, such as INT(11). For most applications this is meaningless: it does not limit the legal range of values, but only specifies how many characters interactive tools reserve for display. For storage and computation, INT(1) and INT(20) are identical.
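The ranges above follow directly from the number of storage bits. A minimal Python sketch of that arithmetic (the function name integer_range is an illustrative choice, not part of MySQL):

```python
# Storage size in bits of each MySQL integer type.
INTEGER_BITS = {"TINYINT": 8, "SMALLINT": 16, "MEDIUMINT": 24, "INT": 32, "BIGINT": 64}


def integer_range(bits, unsigned=False):
    """Return (min, max) for an integer stored in the given number of bits."""
    if unsigned:
        return 0, 2 ** bits - 1
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


for name, bits in INTEGER_BITS.items():
    print(name, integer_range(bits), integer_range(bits, unsigned=True))
```

Both variants of a type cover the same number of distinct values; UNSIGNED only shifts the range so that it starts at 0.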

4.1.2 Real numbers: numbers with a fractional part

FLOAT and DOUBLE: MySQL uses DOUBLE as the type for internal floating-point calculations.

DECIMAL: stores exact decimals; the arithmetic is implemented by the MySQL server itself. DECIMAL(18,9) holds 18 digits, 9 of them after the decimal point, and takes 9 bytes (4 for the digits before the decimal point, 4 for the digits after it, and 1 for the decimal point itself).

Use DECIMAL only when exact results for fractional numbers are needed, such as financial data, because it costs extra space and computation.

When the data volume is large, consider using BIGINT instead: multiply the currency amount by the power of ten that matches the number of decimal places you need to store.
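The BIGINT approach can be sketched as follows. This is a sketch that assumes four decimal places are enough for the application; the function names to_minor_units and from_minor_units are illustrative.

```python
from decimal import Decimal


def to_minor_units(amount: str, places: int = 4) -> int:
    """Scale a decimal amount by 10**places and return it as an exact integer."""
    scaled = Decimal(amount) * (10 ** places)
    if scaled != scaled.to_integral_value():
        raise ValueError(f"{amount} has more than {places} decimal places")
    return int(scaled)


def from_minor_units(value: int, places: int = 4) -> Decimal:
    """Convert the stored integer back to a decimal amount."""
    return Decimal(value) / (10 ** places)


# Binary floating point cannot represent 0.1 or 0.2 exactly.
print(0.1 + 0.2 == 0.3)  # False
# The scaled integers add up exactly.
print(to_minor_units("0.1") + to_minor_units("0.2") == to_minor_units("0.3"))  # True
print(to_minor_units("12.3456"))  # 123456
print(from_minor_units(123456))   # 12.3456
```

The application has to apply the same scale factor on every read and write, which is the price paid for cheaper integer storage and arithmetic.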

Floating point:

Recommendation: specify only the type, not the precision. Precision specifiers are non-standard in MySQL, and they can cause MySQL to silently choose a different type or to round values when storing them.

For the same range of values, floating-point types use less space than DECIMAL: FLOAT uses 4 bytes and DOUBLE uses 8 bytes (with higher precision and a larger range).

4.1.3 String Type

VARCHAR and CHAR:

Assumption: the discussion below applies to the InnoDB and MyISAM engines. VARCHAR and CHAR are the two most important string types.

Disk storage: a storage engine may store a value on disk differently from how it stores it in memory, so the MySQL server may have to convert the format of values it retrieves from the engine.

VARCHAR

1. Stores variable-length strings and saves space, because it uses only as much space as it needs. The exception is a table created with ROW_FORMAT=FIXED, where every row is stored at a fixed length.

2. Uses 1 or 2 extra bytes to record the string length: 1 byte if the column's maximum length is <= 255 bytes, otherwise 2 bytes. With the latin1 character set, a VARCHAR(10) column needs up to 11 bytes of storage, and a VARCHAR(1000) column needs up to 1002 bytes, because 2 bytes are needed to store the length.

3. Saving storage space helps performance, but an update may make a row longer than it was, which causes extra work.

Suitable situations:

1) The maximum length of the column is much larger than the average length; 2) the column is rarely updated, so fragmentation is not a concern; 3) the column uses a complex character set such as UTF-8, where each character takes a variable number of bytes.
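The length-prefix rule for VARCHAR can be written down as a small Python calculation. This sketch reproduces only the arithmetic quoted above; the function name varchar_max_storage_bytes is illustrative, and real on-disk sizes also depend on the storage engine and row format.

```python
def varchar_max_storage_bytes(max_chars: int, bytes_per_char: int = 1) -> int:
    """Return the maximum bytes a VARCHAR(max_chars) value can occupy, including its length prefix."""
    max_data_bytes = max_chars * bytes_per_char
    # 1 length byte if the data can never exceed 255 bytes, otherwise 2.
    length_prefix = 1 if max_data_bytes <= 255 else 2
    return max_data_bytes + length_prefix


print(varchar_max_storage_bytes(10))    # 11   (latin1, 1 byte per character)
print(varchar_max_storage_bytes(1000))  # 1002 (latin1, 1 byte per character)
# With a character set that needs up to 3 bytes per character, VARCHAR(100)
# can exceed 255 bytes, so it needs the 2-byte prefix.
print(varchar_max_storage_bytes(100, bytes_per_char=3))  # 302
```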

CHAR

1. Fixed length: MySQL allocates space according to the declared length. It strips all trailing spaces when storing a value and pads with spaces when the value is shorter than the declared length.

2. More space-efficient for very short values: a CHAR(1) column holding only Y and N takes 1 byte, whereas a VARCHAR(1) takes 2 bytes because of the extra length byte.

Suitable situations:

1) Very short strings; 2) values that are all nearly the same length; 3) frequently changed data, since fixed-length storage is less prone to fragmentation.

Trailing spaces and storage:

Trailing spaces are removed when a CHAR value is stored. How the data is actually stored depends on the storage engine; the Memory engine supports only fixed-length rows, so it allocates the maximum possible space for each value.

BINARY and VARBINARY: store binary strings, which hold bytes rather than characters. A BINARY value that is too short is padded with \0 (a zero byte, not a space), and the padding is not stripped on retrieval.
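The different handling of trailing spaces and padding can be imitated in Python. This is a simulation of the behavior described above, not MySQL code, and the function names are made up for the illustration.

```python
def store_char(value: str) -> str:
    """Imitate CHAR storage: trailing spaces are stripped, leading spaces are kept."""
    return value.rstrip(" ")


def store_binary(value: bytes, length: int) -> bytes:
    """Imitate BINARY(length) storage: short values are padded with zero bytes."""
    if len(value) > length:
        raise ValueError("value is longer than the column")
    return value.ljust(length, b"\x00")


print(repr(store_char(" string1")))   # ' string1'
print(repr(store_char("string2  ")))  # 'string2'
print(store_binary(b"a", 3))          # b'a\x00\x00'
# The padded value is not equal to the original bytes.
print(store_binary(b"a", 3) == b"a")  # False
```

The last line shows why padded BINARY values need care in comparisons: the zero bytes are part of the stored value.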

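Generosity is unwise: VARCHAR(5) and VARCHAR(100) need the same space to store 'hello', but the longer column can consume much more memory, because MySQL often allocates fixed-size blocks of memory to hold values internally, for example when sorting or using in-memory temporary tables.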

BLOB and TEXT: large data

BLOB and TEXT store large data as binary strings and character strings, respectively. They form two families of data types: the character types are TINYTEXT, TEXT, MEDIUMTEXT, and LONGTEXT, and the corresponding binary types are TINYBLOB, BLOB, MEDIUMBLOB, and LONGBLOB. The only difference between the families is that BLOB types store binary data with no collation or character set, whereas TEXT types have a character set and collation.

MySQL handles each BLOB and TEXT value as a separate object, and storage engines often store them specially. When a value is too large, InnoDB uses a dedicated external storage area: the row then holds a 1~4 byte pointer, and the actual value is stored externally.

MySQL sorts BLOB and TEXT columns differently from other types: it sorts only the first max_sort_length bytes of each column rather than the full value. It also cannot index the full length of these columns, nor use such indexes to eliminate sorting.

If the Extra column of the EXPLAIN execution plan contains "Using temporary", the query uses an implicit temporary table.

Use ENUM instead of a string type

When defining the list of values, an enumeration with 1~255 members needs 1 byte of storage and one with 256~65,535 members needs 2 bytes. There can be at most 65,535 members. An ENUM column can hold only one of its members; in this respect it differs from the otherwise similar SET type.

An ENUM column stores a predefined set of distinct, fixed strings. MySQL stores the values compactly, in 1 or 2 bytes depending on the number of values in the list. Internally it saves each value as an integer (starting from 1) that represents its position in the list, and it keeps a "lookup table" with the number-to-string mapping in the table's .frm file. Converting the integer back to a string requires a lookup, which has a cost.

If you store a number into an ENUM column, the number is treated as an index value, and the stored value is the enumeration member at that index. Defining ENUM members that look like numbers is unwise, because it easily causes confusion. ENUM values are sorted according to the order of the list in the column definition, that is, by their index numbers. For example, for ENUM("a", "b"), "a" sorts before "b", but for ENUM("b", "a"), "b" sorts before "a". The empty string sorts before non-empty strings, and NULL sorts before all other enumeration values. To prevent unexpected results, define the ENUM list in alphabetical order. You can also use ORDER BY CONCAT(col) to sort by the string value instead of the index number.

Define the members in the desired sort order when the table is created. The biggest drawback of ENUM is that the string list is fixed: adding or removing a string requires ALTER TABLE. A "lookup table" with an integer primary key avoids joining on string-based values.
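The way ENUM sorts by index number rather than by string value can be imitated in Python. This is a simulation under the rules stated above; enum_sort is a made-up helper, not a MySQL function.

```python
def enum_sort(values, members):
    """Sort values the way an ENUM column would: by 1-based position in the member list."""
    index = {member: position for position, member in enumerate(members, start=1)}
    return sorted(values, key=lambda value: index[value])


rows = ["a", "b", "a"]
print(enum_sort(rows, ("a", "b")))  # ['a', 'a', 'b']
print(enum_sort(rows, ("b", "a")))  # ['b', 'a', 'a']
# Sorting by the string value ignores the definition order.
print(sorted(rows))                 # ['a', 'a', 'b']
```

With the list defined as ("b", "a"), the same rows come back in a different order, which is why defining the list alphabetically avoids surprises.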

4.1.4 Date and time

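DATETIME: large range of values, from year 1001 to 9999, with a precision of one second. It stores the date and time packed into an integer in YYYYMMDDHHMMSS format, independent of time zone, and uses 8 bytes.

By default, MySQL displays DATETIME values in a sortable, unambiguous format, such as 2008-01-02 22:33:44.

TIMESTAMP: range from 1970 to 2038. It stores the number of seconds elapsed since midnight, January 1, 1970 (GMT), depends on the time zone, and uses 4 bytes.

FROM_UNIXTIME() converts a Unix timestamp to a date, and UNIX_TIMESTAMP() converts a date to a Unix timestamp.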

If an INSERT does not specify a value for the first TIMESTAMP column, MySQL sets it to the current time; by default, the first TIMESTAMP column is also updated whenever the row is updated. TIMESTAMP columns are NOT NULL by default. Use TIMESTAMP whenever possible, because it is more space-efficient than DATETIME.

To store timestamps with microsecond resolution, you can use BIGINT, use DOUBLE to hold the fractional part of the second, or use MariaDB instead of MySQL.
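The 2038 limit of TIMESTAMP and the BIGINT workaround for sub-second resolution can be checked with a few lines of Python. This sketch assumes the 4-byte value behaves like a signed 32-bit count of seconds; the helper names are illustrative.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
MAX_TIMESTAMP_SECONDS = 2 ** 31 - 1  # largest signed 32-bit value


def utc_from_seconds(seconds: int) -> datetime:
    """Convert seconds since the epoch to a UTC datetime (like FROM_UNIXTIME, but in UTC)."""
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def to_microseconds(moment: datetime) -> int:
    """Return whole microseconds since the epoch, suitable for a BIGINT column."""
    delta = moment - EPOCH
    return delta.days * 86_400_000_000 + delta.seconds * 1_000_000 + delta.microseconds


print(utc_from_seconds(MAX_TIMESTAMP_SECONDS).isoformat())  # 2038-01-19T03:14:07+00:00

moment = datetime(2008, 1, 2, 22, 33, 44, 123456, tzinfo=timezone.utc)
print(to_microseconds(moment))  # 1199313224123456
```

Integer arithmetic is used in to_microseconds so that no precision is lost to floating-point rounding.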

4.1.5 Bit types

BIT: before MySQL 5.0, BIT was a synonym for TINYINT. In MySQL 5.0 and later it is a different data type with its own characteristics.

BIT(1) defines a field containing a single bit, BIT(2) stores 2 bits, and so on; the maximum length is 64 bits.

Behavior varies by storage engine. MyISAM packs all the BIT columns together for storage, so 17 separate BIT columns need only 17 bits (which MyISAM rounds up to 3 bytes). Other engines, such as Memory and InnoDB, store each BIT column as the smallest integer type large enough to hold its bits, so they save no storage space.

MySQL treats BIT as a string type, not a numeric type. When you retrieve a BIT(1) value, the result is a string whose content is the binary value 0 or 1, not the ASCII character "0" or "1". If you retrieve the value in a numeric context, the string is converted to a number. This is confusing for most applications, so BIT is best avoided.

SET

When you create the table, specify the list of allowed values for the SET column: column_name SET('value 1', 'value 2', 'value 3', ..., 'value n'), where 'value n' is the nth value in the list. Trailing spaces in these values are removed by the system. The elements of a stored value are displayed in the order in which they were defined, and a repeated element is stored only once.

Its basic form is the same as that of the ENUM type. The value of a SET column can be one element of the list or a combination of several elements; when several elements are used, they are separated by commas. A SET can have at most 64 members, and the storage size depends on the number of members:

A set with 1~8 members takes 1 byte. A set with 9~16 members takes 2 bytes. A set with 17~24 members takes 3 bytes. A set with 25~32 members takes 4 bytes. A set with 33~64 members takes 8 bytes.
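The storage sizes listed above amount to one bit per member, rounded up to the next allowed size of 1, 2, 3, 4, or 8 bytes. A short Python sketch of that rule (set_storage_bytes is an illustrative name):

```python
def set_storage_bytes(member_count: int) -> int:
    """Return the bytes needed by a SET column with the given number of members."""
    if not 1 <= member_count <= 64:
        raise ValueError("a SET can have between 1 and 64 members")
    needed = (member_count + 7) // 8  # one bit per member, rounded up to whole bytes
    for size in (1, 2, 3, 4, 8):
        if needed <= size:
            return size


for count in (1, 8, 9, 16, 17, 24, 25, 32, 33, 64):
    print(count, set_storage_bytes(count))
```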

If you need to store many true/false values, consider merging the columns into a single SET column, which MySQL represents internally as a packed set of bits and which therefore uses storage efficiently. MySQL also provides functions such as FIND_IN_SET() and FIELD() that make SET columns easy to use in queries.

Cons: changing the definition of the column is expensive and requires ALTER TABLE, and in general indexes cannot be used for lookups on SET columns.

Bitwise operations on integer columns:

An alternative to SET: use an integer as a packed set of bits. For example, 8 bits can be packed into a TINYINT and manipulated with bitwise operators. Defining named constants for each bit in application code simplifies this, but such queries are harder to write and harder to understand.
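Packing flags into an integer with named constants might look like this in application code. The constant names and the idea of a permissions column are hypothetical, chosen only for this example.

```python
# Named constants for each bit (hypothetical permission flags).
CAN_READ = 1 << 0    # 1
CAN_WRITE = 1 << 1   # 2
CAN_DELETE = 1 << 2  # 4


def grant(perms: int, flag: int) -> int:
    """Set a flag bit."""
    return perms | flag


def revoke(perms: int, flag: int) -> int:
    """Clear a flag bit."""
    return perms & ~flag


def has(perms: int, flag: int) -> bool:
    """Check whether a flag bit is set."""
    return (perms & flag) == flag


perms = grant(grant(0, CAN_READ), CAN_DELETE)
print(perms)                  # 5, small enough for a TINYINT UNSIGNED
print(has(perms, CAN_WRITE))  # False
print(has(perms, CAN_DELETE)) # True
print(revoke(perms, CAN_DELETE))  # 1
```

The stored value 5 says nothing by itself about which permissions it encodes, which is the readability cost mentioned above.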

4.1.6 Choosing identifiers

Identity column: an auto-increment column

1) You do not have to insert a value manually; the system supplies a default sequence value; 2) it does not have to be the primary key; 3) but it must be a key (that is, indexed);

4) a table can have at most one; 5) its type must be numeric; 6) the step can be changed, for example by setting auto_increment_increment=3.

When choosing the type of an identifier column:

Consider not only the storage type but also how MySQL performs calculations and comparisons on that type. Use the same type in all related tables, and make sure the types match exactly, including properties such as UNSIGNED.

Tips:

1. Integer types: integers are usually the best choice, because they are fast and work with AUTO_INCREMENT.

2. ENUM and SET types: suitable for storing fixed information.

3. String types: avoid them where possible, because they take much more space and are usually slower than integer types. Be especially careful with MyISAM tables, which use packed (compressed) indexes for strings by default, making lookups much slower.

1) Be careful with completely "random" strings, such as those produced by MD5(), SHA1(), or UUID(). Each new value is arbitrarily distributed over a large space, which slows INSERT and some SELECT queries. Inserted values are written to random locations in the index, so INSERT is slow (page splits, random disk access, and clustered index fragmentation). SELECT is slow because logically adjacent rows are scattered across disk and memory. Random values also make caches less effective for all types of queries, because they defeat the locality of reference that caching relies on.

Clustered index: the order of the index matches the physical order in which the rows are stored. Because rows can have only one physical order, a table can have only one clustered index. It is usually the primary key: when you define a primary key, the system creates the clustered index on it by default.

Non-clustered index: the logical order of the index entries is not tied to the physical order of the rows. A table can have more than one non-clustered index, created on different columns as needed.

2) When storing UUID values, remove the dashes, or better, use UNHEX() to convert the UUID to a 16-byte number and store it in a BINARY(16) column. Retrieve it in hexadecimal format with the HEX() function.

Values generated by UUID() differ from those generated by a cryptographic hash function such as SHA1(): UUID values are unevenly distributed and are somewhat sequential, though they are still not as good as a monotonically increasing integer.
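The space saved by storing a UUID as 16 raw bytes can be seen with Python's standard uuid module. This mirrors what UNHEX() and HEX() do in MySQL; the UUID below is only a sample value, and the helper names are illustrative.

```python
import uuid


def uuid_to_bytes(text: str) -> bytes:
    """Drop the dashes and pack the 32 hex digits into 16 bytes (like UNHEX on the dash-free text)."""
    return uuid.UUID(text).bytes


def bytes_to_hex(raw: bytes) -> str:
    """Format the stored bytes as uppercase hexadecimal (like HEX)."""
    return raw.hex().upper()


text = "6ccd780c-baba-1026-9564-5b8c656024db"  # sample value
raw = uuid_to_bytes(text)
print(len(text))          # 36 characters as text
print(len(raw))           # 16 bytes in a BINARY(16) column
print(bytes_to_hex(raw))  # 6CCD780CBABA102695645B8C656024DB
```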

Beware of automatically generated schemas:

They can cause serious performance problems, such as very large VARCHAR columns or join columns of different types.

An ORM can store any type of data in any type of back-end data store, which usually means it is not designed to pick the best data types. Some store each property of each object in a separate row, or even use timestamp-based versioning, so that multiple versions of a single property exist. This is a trade-off.

4.1.7 Special type data: null

