Summary of Mysql database knowledge (based on the summary of the data) after graduation, I have been working on server development for more than three years, after graduation, I rarely want to spend some spare time at school to summarize my knowledge. now, I also want to summarize some of the key knowledge. today I am taking a lot of time, first, we will summarize the open source relational database mysql that has been used for more than three years. The knowledge points can be divided into the following points:
I. basic knowledge
II. SQL optimization and indexing
III. database specifications and suggestions
IV. database design
V. database architecture
I. basic knowledge
Knowledge points mainly include: data type common function character set transaction isolation level lock mechanism
(1) data type
Value types: TINYINT, SMALLINT, MEDIUMINT, INT, BIGINT, and DECIMAL
Time type -- DATE, DATETIME, TIMESTAMP
String type-CHAR, VARCHAR, TEXT
Special types-ENUM and SET
String case sensitive -- VARBINARY
Note:
* All variable-length strings use the VARCHAR type, similar to the state and finite-type fields. They also use strings that can clearly represent the actual meaning, instead of digits such as INT;
* A fixed-length string uses the CHAR type. all single characters use the CHAR type instead of the VARCHAR type;
* When the number of characters may exceed 20000, you can use the TEXT type to store character-class data. All TEXT fields must be split from the original table and separate with the original table primary key to form another table for storage;
* DATETIME or TIMESTAMP can be used for fields that need to be accurate to the time (year, month, day, hour, minute, and second). However, please note the range that can be expressed separately and whether the TIMESTAMP feature is required, use the TIMESTAMP type instead of DATETIME as much as possible to reduce the occupation of data storage space;
* You only need to use the DATE type for all fields accurate to the day, instead of TIMESTAMP or DATETIME;
* Fields of the auto-incrementing sequence can only use INT or BIGINT type, and must be clearly identified as UNSIGNED (keyword: UNSIGNED) unless a negative number does exist, the BIGINT type is used only when the value of this field exceeds 4.2 billion, and the auto-increment field must be part of the primary key or primary key;
(2) Common functions
Including: mathematical functions string functions date operation function type conversion functions
* Mathematical functions:
ABS (value): returns the absolute value of value. for example, ABS (-101). The return value is 101;
FLOOR (value): Remove the decimal number of value, for example, FLOOR (2013.8). The return value is 2013;
MOD (N, M): returns the remainder of N divided by M, for example, MOD (), and returns 3;
ROUND (value): returns the rounding value of the value, for example, FLOOR (2013.8). The return value is 2014;
ROUND (value, num): returns the rounded value of the num decimal value, for example, FLOOR (2013.867, 1). The return value is 2013.9;
RAND (): returns a random number;
* String functions
String connection function: MySQL string connection method. use the CONCAT () or CONCAT _ WS () function. Syntax: CONCAT (string1, string2 ,...) CONCAT _ WS (separator, string1, string2 ,..)
* String length statistics
LENGTH (string) # returns the number of bytes occupied by string.
CHAR_LENGTH (string) # returns the number of characters in string.
Counting the number of characters does not distinguish Chinese characters, letters, or numbers. it also has nothing to do with character sets. if the number of characters is counted, the types of characters are Chinese characters, letters, or numbers, and the character set.
Note:
All our MySQL databases will adopt UTF-8 encoding, so one Chinese character occupies three bytes; the letters or numbers used in the English input method occupy one byte.
* String functions
String judgment function
IF (exp1, exp2, exp3): IF exp1 is true, exp2 is returned; IF exp1 is false, exp3 is returned;
IFNULL (exp1, exp2): If exp1 is not null, exp1 IS returned; otherwise, exp2 IS returned;
NULLIF (exp1, exp2): If exp1 = exp2, NULL is returned; otherwise, exp1 is returned;
* String replacement function
LTRIM (exp1): removes spaces starting with a string in exp1
RTRIM (exp1): removes spaces at the end of the string in exp1
TRIM (exp1): removes spaces at the beginning and end of exp1.
TRIM (exp2, exp1): removes exp2 from the exp1 string.
* String lookup function
SUBSTRING_INDEX (exp1, delim, count): exp1 is a string, delim is a Segmentation symbol, and count represents the first few style symbols, for example: SUBSTRING_INDEX ('yto56 .com.cn ','. ', 1), return: yto56
SUBSTRING (exp1, pos, len): exp1 is the string, pos is the position, len is the length, for example: SUBSTRING ('yto56 .com.cn ',), return: yto56.com
LOCAL (substr, str): Locate the first position of substr in str, for example, LCOAL (yto56.com.cn ','. '). 6 is returned.
* Other string functions
UPPER (exp1): converts the string exp1 to uppercase;
LOWER (exp1): converts the string exp1 to lowercase;
* Date operation functions
Date format conversion functions:
Convert the string to the date type, DATE_FORMAT () or STR_TO_DATE (),
The format of the two functions is as follows:
DATE_FORMAT (expr1, format)
STR_TO_DATE (expr1, format)
Commonly used date format YYYY-MM-DD HH: MM: SS corresponds to format % Y-% m-% d % H: % I: % S
* Date calculation functions
CURDATE (): returns the date of the day, format: YYYY-MM-DD
NOW (): returns the current date and time in the format of YYYY-MM-DD HH: MM: SS
DATE_ADD (date, INTERVAL expr unit): expr expression, unit is the unit, for example: DATE_ADD ('2017-08-20 ', INTERVAL-1 DAY), return: 2013-08-19
* Date calculation functions
MONTY (date): returns the number of months of the date, for example, MONTY ('1970-08-20 '). return value: 8.
DAY (date): returns the number of days of the date, for example, DAY ('2017-08-20 '), and returns: 20
YEAR (date): returns the number of years of the date, for example, YEAR ('1970-08-20 '). The return value is 2013.
TO_DAYS (date): The number of days corresponding to the date, for example, TO_DAYS ('2017-10-07 '). The return value is 2008.
WEEK (date): returns the day of the WEEK corresponding to the date, for example, WEEK ('2017-08-20 '), and 2
DATEDIFF (date1, date2): returns the number of days difference for the date1-date2, for example: DATEDIFF ('2017-08-20 ', '2017-08-18 13:30:14'), returns: 2
* Type conversion functions
CAST (expr AS type): convert expr to a certain type
CONVERT (expr, type): CONVERT expr to a certain type
CONVERT (expr USING transcoding_name): CONVERT the expr character set to a character set
* Character set
For wireless application users, we recommend that you use the utf8 variant, that is, utf8mb4, which is stored in four bytes. In other cases, the character set UTF8 is used by default, and the character set proofreading rule utf8_general_ci is:
One Chinese character occupies 3 bytes
The storage space is 3 bytes for a symbol, character, and number in the Chinese input method.
The storage space is 1 byte for a symbol, character, and number in an English input method.
It is not recommended to set the validation rule utf8_bin, but the data type VARBINARY.
N interpretation in CHAR (N) or VARCHAR (N)
MySQL database defines the length of N, not the number of bytes, but the number of characters, for example: create tabel gl_user (username VARCHAR (40 )); the username can be up to 40 characters in length.
(3) transaction isolation level
Transactions have four features: Atomiocity, Consistency, Isolation, and Durability.
* An Atomic transaction is the logical unit of work of the database. all operations contained in the transaction are either performed, or, the result of no consistent transaction execution must be that the database changes from a consistent state to another consistent state.
* Isolation the execution of a transaction cannot be disturbed by other transactions. That is, the operations and data used within a transaction are isolated from other transactions, and the transactions executed concurrently do not interfere with each other.
* Once a transaction is successfully committed, modifications to data in the database are persistent. Other operations or faults should not affect the execution result.
(4) transaction isolation level
* The read uncommited select statement allows dirty reads. that is, the SELECT statement reads data modified by other transactions but not committed.
* Read commited select statements cannot be READ repeatedly, that is, the same query statement is executed twice in the same transaction. if the period between the first and second queries is, when the data queried by other transactions is modified and committed, the data read twice is inconsistent.
* Repeatable read select can be READ repeatedly, that is, the same query statement is executed twice in the same transaction, and the data obtained is always consistent.
* The only difference between SERIALIZABLE and repeatable is that the normal SELECT statement is changed to SELECT by default .... Lock in share mode. That is, it adds sharing to the data involved in the query statement, blocking other transactions from modifying real data.
(5) lock mechanism
In MySQL databases, there are currently two major transaction engines: NDB distributed storage engine and InnoDB storage engine.
NDB distributed storage engine
Only transaction isolation level is supported: read commited, row lock is provided
InnoDB storage engine
Four isolation levels are supported. the default isolation level is repeatable read.
The implementation of the lock is achieved through the index, and the UPDATE or DELETE without an index will be upgraded to the table lock.
Isolation level, that is, data visibility is achieved through MVCC
II. SQL optimization and indexing
Knowledge points mainly include: index principle index function optimized sequence subquery optimized page optimization left join incorrect writing index overwrite technical index creation criteria
(1) indexing principles
In fact, it is similar to a dictionary. the dictionary directory is equivalent to an index, and the specific content of a directory is similar to the records in the table;
(2) functions of indexes:
* Deduplication
* Accelerated search and positioning
* Avoid sorting
* Covering indexes
(3) Order of optimization
Business Optimization is preferred, followed by design and program optimization, and finally database and SQL optimization.
SQL, index, and database optimization are the processes that need to be constantly rebalance with data growth/data distribution
(4) subquery optimization
Original SQL statement
Select id from orders where orderid not in (select orderid from T_EXP_OP_RECORD_TAKING_SUCCESS WHERE AUX_OP_CODE IN ('new', 'update', 'delete') AND EXP_TYPE = 10 AND OP_CODE IN (310,311 )) LIMIT 10000;
The optimized SQL statement SELECT ID FROM T_EXP_OP_RECORD_TAKING_SUCCESS WHERE AUX_OP_CODE NOT IN ('new', 'update', 'delete') OR EXP_TYPE! = 10 OR OP_CODE not in (310,311) LIMIT 10000;
To sum up and reduce the number of full table scans and reduce the number of data reads back to the table by 10000, only the table is scanned. if the number of records meeting the conditions reaches 10000, scanning is stopped.
(5) incorrect left join statement
Correct syntax: select m. columnname ......, N. * columnname ..... FROM left_table m left join right_table n on m. columnname_join = N. columnname_join and n. columnname = xxx where m. columnname = xxx and m ......
Interpretation: M. columnname_join = N. columnname_join in the ON clause join condition is used to connect the equivalent records in two tables, but does not affect the number of record sets. If a record in the left_table table cannot be found in the right_table table, the record is still displayed in the result set, the values of the right_table columns displayed in the query result set must be replaced by NULL;
In the ON clause connection condition, the table right_table. columnname = XXX is used to control whether the right_table table has column values that meet the requirements or whether it is displayed in the query column with NULL replacement, without affecting the number of record sets;
The WHERE clause controls whether the record meets the query requirements. if the record does not meet the query requirements, it is filtered out. one of the common mistakes is to select m. columnname ......, N. * columnname ..... FROM left_table m left join right_table n on m. columnname_join = N. columnname_join and n. columnname = xxx and m. columnname = XXX;
(6) index coverage Technology
The WHERE condition of the SQL statement covered by the index is part of the index. the ORDER or GROUP field of the SQL statement is the value returned by a part of the index or the metadata column is calculated as part of the index. after overwriting, after all operations are completed using indexes and covered by indexes, you do not need to access the metadata example idx_uid_gmtcreate (uid, gmt_create) SELECT uid, gmt_create FROM tablename WHERE uid> 100;
(7) index creation principles
The number of indexes in a single table cannot exceed 5
The number of fields in a single index cannot exceed 5
Use prefix indexes for strings. the prefix index length cannot exceed 10 characters.
The maintenance of index structures and statistical information requires high costs, and the index reuse rate should be improved as much as possible.
Filtering rate of each field in the composite index
The ordered information of composite index fields in all SQL statements, which mainly refers to clauses such as WHERE, GROUP BY, and ORDER.
The order of each field in the composite index is extremely important.
III. database specifications and suggestions
Knowledge points include: naming rules, field definition specifications, application account permission SELECT * usage specifications, adding function usage specifications to fields, paging query specifications, data type conversion specifications, DML statements available for indexing, product R & D and launch process suggestions
(1) naming rules
* Index name:
MySQL index names are case-sensitive, which may cause performance problems when the optimizer selects indexes. The index naming rules are full uppercase or all lowercase letters. The index names start with idx _ or IDX, the end is _ u (or _ U) or _ n (or _ N), where: _ U or _ u indicates a unique index, and _ n or _ N indicates a normal index, the default index name is in lowercase.
* Object naming case:
Improve the efficiency of developers, reduce R & D costs, and avoid program failures. database objects: database name, table name, stored procedure name, custom function name, trigger name, View name, and task scheduling name are all case-insensitive, but we recommend that you use lower case letters
We recommend that you use lower-case letters for database name, table name, and field name, and use underscores to separate them.
The database name, table name, and field name cannot exceed 40 characters. Name and description
The database name, table name, and field name cannot use the MySQL reserved word temporary database. The table name must be prefixed with tmp and suffixed with a date.
The backup database and table must be prefixed with bak and suffixed with a date.
* Field definition specifications
The field property is forcibly defined as not null.
The default value of the field attribute of the string type is space, that is''
Numeric field property default value is 0 time type field property default value is '2017-01-01 08:00:01'
If it is a field used by the date of birth, the default value is space, that is, the data constraint of the ''field is not only unique, but other constraints are guaranteed by the application.
(2) Application account permissions
Normal permissions in order to reduce data security risks, only four types of Database permissions are granted to the application account: SELECT, INSERT, UPDATE, and DELETE for read-only applications, the read-only account permission should be granted to SELECT;
Special permissions given that we have EVENT and PROCEDURE, we may need to grant relevant permissions. the granted permissions are: EXECUTE;
(3) usage specification of SELECT *
Basic principle: when "*" is used in an unnecessary query to list all fields and group by or order by exists, it is forbidden to use SELECT * to retrieve all fields at a time. For JOIN statements connected to tables, SELECT * is not allowed for query. If you do not need to retrieve the text field from a table that contains the TEXT field, you cannot use SELECT * for query.
Detailed description:
1> SELECT * is not allowed for group by or order by operations to ensure that MySQL can use the latest optimized sorting algorithm.
2> The JOIN statement does not allow the use of SELECT *. it is used to prevent the number of tables to be retrieved for queries that can be completed only by indexing.
3> If you do not need to retrieve the TEXT field from the TEXT field table, SELECT * is not allowed because the TEXT field is stored in a physical location different from the normal record, will cause a large number of io operations
4> SELECT * increases CPU, IO overhead, and occupies more network bandwidth, and also affects performance.
5> to avoid program bugs caused by adding or deleting fields without modifying related SQL and program code, disable SELECT *
(4) Add function usage specifications to fields
Basic principle: do not use any function to convert the type or format of a filter field that appears in the WHERE condition; the correct method is to convert the passed-in comparison value to the column type.
Incorrect syntax: SELECT uid, gmt_create FROM gl_user where upper (username) = UPPER ('yto56. com ');
Correct syntax: SELECT uid, gmt_create FROM gl_user WHERE username = UPPER ('yto56. com ');
(5) paging query specifications
Basic principle: all paging query statements must carry sorting conditions, unless explicitly required by the business side, do not use any sorting to randomly display data. Detailed description:
1> general paging statement writing (start: number of starting records, page_offset: number of records per page): select id, username FROM gl_user WHERE username like '% @ 163.com' order by m. gmt_create LIMIT start, page_offset;
2> multi-table Join paging statement. if the filtering condition is on a single table, you need to paging the table first, and then JOIN the statement with low performance: select m. username, P. rolename FROM gl_user m inner join gl_role p on m. ID = P. userid WHERE username like '% @ 163.com' order by m. gmt_create LIMIT start, page_offset; high-performance statement: select m. username, P. rolename FROM (select id, username FROM gl_user WHERE username like '% @ 163.com' order by m. gmt_create LIMIT start, page_offset) M, gl_role p where m. ID = P. userid;
In this way, the premise is that the records in the associated tables are matched one by one. Otherwise, the number of records returned may be less than or more than the page_offset value.
(6) data type conversion specification
Basic principle: in all Query Where conditions, the data type exactly the same as that of the filter field must be used to prevent any implicit type conversion, avoid Query execution plan errors due to data type mismatch, resulting in performance problems.
Detailed description:
1> type conversion is not allowed for all Where condition fields. if you need to convert the type, you can only convert the filter value instead of the conversion field.
2> The most common implicit type conversion is common between the time type and the string type. we recommend that all time type fields be passed in as the time type in iBatis, you can also use a string to pass in and then use the time function to convert the string to a valid time format, as shown in the following figure: SELECT * FROM member WHERE gmt_create = DATE_FORMATE ('2017: 02: 03 ', '% Y-% m-% d % H: % I: % S ');
3> in table connection Query, if the data types at both ends of the connection condition are inconsistent, you must ensure that the connection condition data type of the driver table is converted to the data type consistent with that of the driver table.
(7) DML statements have indexes available
* DML statements refer to INSERT, UPDATE, and DELETE statements. This document describes the last two operations.
(2) The InnoDB storage engine locks through indexes, and no index will lead to table locks being upgraded.
(3) upgrade DML to table-level locks and apply exclusive locks, which will block other services in the same table.
(8) product R & D and launch process suggestions
IV. database design
Knowledge points include: field type storage engine primary key selection primary and secondary table design big field separation cold and hot data separation
(1) field type
The field value should be digitalized as much as possible, and the data type should be integer as much as possible
The length of the storage space of the field. try to use a short and compliant data type.
Use TIMESTAMP as much as possible for the time type, except for DATETIME for the date of birth
Try not to use data types: TEXT and BLOB
Try to use VARCHAR instead of CHAR for string type
Replace FLOAT and DOUBLE types with the DECIMAL type
(2) storage engine
* InnoDB storage engine:
Supports transaction support for row locks and table locks. strong data security. BP supports caching metadata and index data simultaneously. MVCC supports B + Trees. only the adaptive index HASH supports full-text retrieval;
* MyISAM storage engine:
KB only caches index data and only supports table locks. it does not support transactions and supports full-text retrieval;
* MEMORY storage engine:
Transactions are not supported in the cache memory of metadata and indexes. only table locks are supported. data persistence is not supported. B + tree and HASH indexes are supported;
* Sphinx supports full-text indexing:
Supports Chinese word segmentation and supports the creation of full and incremental indexes. supports distributed deployment;
(3) primary key selection
(4) primary and secondary table design
The acceptance data of the front-end database is used as an example.
T_exp_op_record_sgnature
T_exp_op_record_sgnature_deal
T_exp_op_record_sgnature_fail
T_exp_op_record_sgnature_success
T_exp_op_record_sgnature_success_02
T_exp_op_record_sgnature_success_03
T_exp_op_record_sgnature_unsuccess
T_exp_op_record_sgnature_success1
T_exp_op_record_sgnature_update
The maximum number of summary fields is 56, and the minimum value is 54. the number of identical fields is 54. The value of each table is the same. 52 identical field values exist in one table, the secondary table records eight types of expanded data.
(5) large field separation
The length of a VARCHAR field is smaller than 255 bytes and 1 byte is marked.
The length of a VARCHAR field exceeds 255 bytes and is marked as 2 bytes.
TEXT field, marked as 1-4 bytes, 8 bytes as pointer
The value of a TEXT field is stored in an invisible column.
Summary: large fields are generally stored in a single table, and VARCHAR is used as an alternative to TEXT.
(6) cold/hot data separation
Suitable for cold/hot separation scenarios:
Table data fields have a large number of fields that are infrequently accessed and partially frequently accessed.
The data fields in the table have frequently changed fields and a large number of fields that are rarely modified.
The data fields in the table are displayed at a low volume or at a low frequency.
How to separate: Store frequently accessed and infrequently accessed data in two tables and associate them with the primary key ID of a table.
Store frequently modified and rarely modified fields in different tables
Improves data access speed and data modification speed, and reduces IO operations
Note: the cold/hot separation scenario must be based on business characteristics and actual data operations
V. database architecture
Knowledge points include: Dual-master replication architecture, one-master-Multiple-slave architecture, distributed database, pseudo-distributed database
(1) dual-master replication architecture
(2) one-master multi-slave architecture
(3) distributed database
(4) pseudo-distributed database architecture