[Mysql] database usage specifications, mysql database specifications
I. MySQL Problems
The optimizer does not support complex SQL statements.
Poor Support for SQL standards
The large-scale cluster solution is not mature, mainly refers to Middleware
ID generator, global auto-increment ID
Asynchronous logical replication, data security issues
Online DDL
Incomplete HA solution
The backup and recovery solutions are still complex and dependent on external components.
Too little information is displayed to users, making troubleshooting difficult
Numerous Branches, making it hard to choose
Ii. database environment Introduction
Generally, Internet companies' databases are divided into five database environments:
Dev: Development environment, development can read and write, can modify the table structure; commonly used 163 of database tables; developers can modify the table structure, can modify the data in it; but need to ensure that it does not affect other development colleagues
Qa: Test environment. developers can read and write data. developers can use tools to modify the table structure.
Sim: Simulate the environment, develop readable and writable, and use the web Platform; when initiating a launch request, it will be preexecuted in this environment first, this environment can also be used to deploy online drills or stress tests and read and write data.
Real: Production database slave database (quasi-Real-Time Synchronization), read-only environment, not allowed to modify data, not allowed to modify the table structure; for online problem search, data query, etc.
Online: Online environment. Developers are not allowed to perform database operations directly in the online environment. If you need to perform operations, You must contact the DBA to perform the operations and make corresponding records.
Machines in these environments must have clear permission division, separate read/write accounts, and have a degree of identification to differentiate specific services. For example, if the user name is w_wap, r_wap can tell that the read/write account is a wap application.
Iii. Database Development specifications
The development specifications include the basic naming and constraints, field design specifications, index specifications, and usage specifications.
Normative meaning
Ensure online database schema specifications
Reduce the probability of a problem
Convenient automated management
Long-term adherence to specifications is a win-win situation for developers and DBAs.
Constraints
Select UTF8 as the table character set. To store emoj expressions, use UTF8mb4 (supported after MySQL 5.5.3)
The storage engine uses InnoDB
Use varchar varbinary as long as possible
Images and files are not stored in the database
- Add comments when designing a table
The data volume of a single table is limited to less than 0.1 billion. The physical size of a single table cannot exceed 10 GB, and the average length of a row cannot exceed 8 KB.
Do not perform database stress testing online
Prohibit direct database connection from test and development environments
create database test_crm default character set=utf8;
Basic naming rules
Reserved Words are not allowed for database name, table name, and field name
The database name, table name, field name, and index name are separated by lowercase letters.
The database name, table name, field name, and index name must not be too long. It cannot contain more than 32 characters. Use as few characters as possible to express the purpose of the table.
Temporary database and temporary table names must be prefixed with tmp and suffixed with date
- The backup database and table must be prefixed with bak and suffixed with a date.
The database name, table name, field name, and index name use nouns as database names. Only English is used, and Chinese Pinyin is not required.
The database name must contain letters, lowercase letters, and 3-7 letters.
If the database name contains multiple words, the words are separated by underscores (_).
Table sharding Specification
Partition Table prohibited
Split large fields and low access frequency fields to separate cold and hot data
HASH is used for table sharding. The table name suffix uses the decimal number, and the subscript starts from 0.
Table sharding by seasonal time must comply with YYYY [MM] [DD] [HH] format
Adopt appropriate database/table sharding policies
Field Specification
All fields are defined as not null, unless you really want to save NULL, but I cannot think of the need to use Null
The smaller the field type, the better. UNSIGNED is used to store non-negative integers. There are not many negative numbers in actual use.
Use TIMESTAMP to store the time, and use unsigned int to store IPv4 addresses instead of CHAR (15). This method can only store IPv4 addresses, but cannot store IPv6 addresses.
Use VARCHAR to store variable-length strings. Of course, note that M in varchar (M) refers to the number of characters rather than the number of bytes;
Use DECIMAL instead of FLOAT and DOUBLE to store precise floating point numbers
Try not to use BLOB TEXT
Use TINYINT to replace the ENUM type and convert the character to a number.
Do not store plaintext passwords in Databases
Use VARBINARY to store case-sensitive variable strings
Index Specification
The number of fields in a single index cannot exceed 5, and the number of indexes in a single table cannot exceed 5. The index design follows the B + Tree index leftmost prefix matching principle.
Select a column with a high degree of discrimination as the index.
- Use a prefix index for a string. The prefix index length cannot exceed 8 characters.
Prefix indexes are recommended. If necessary, you can add pseudo columns and create indexes.
The index can cover 80% of the major queries, which is not comprehensive and solves the main contradiction of the problem.
Appropriate indexes should be created for DML, order by, and group by fields.
Avoid implicit conversion of Indexes
Avoid redundant Indexes
- Primary Key: The table must have a primary key. Columns with frequent updates are not used; string columns are not selected; UUID MD5 HASH is not used; non-null unique keys are used by default, we recommend that you select Auto-incrementing or poster
Important SQL statements must be indexed: WHERE condition columns of UPDATE and DELETE statements; fields of ORDER BY, GROUP BY, and DISTINCT; fields of Multi-table JOIN
Covering indexes for core SQL statements
Do not create an index on a low base column, such as "gender"
Mathematical operations and function operations are not performed on index Columns
Try not to use the external foreign key. The foreign key is used to protect the integrity of the reference and can be implemented on the business side. operations on the parent and sub-tables affect each other to reduce the availability. INNODB itself limits online DDL.
Do not make a forward % query, such as like "% AB"
Do not use negative queries, such as not in/like "cannot use indexes, resulting in full table Scan
In the implicit conversion example, the field is defined as varchar, but the input value is an int, which will cause a full table scan, requiring the program to perform a type check.
Field: remark varchar (50) NOT Null
mysql>SELECT id, gift_code FROM gift WHERE deal_id = 640 AND remark=115127;1 row in set (0.14 sec)mysql>SELECT id, gift_code FROM pool_gift WHEREdeal_id = 640 AND remark=‘115127’; 1 row in set (0.005 sec)
SQL specifications
Make the explain pre-compiled statement and pass only parameters, which is more efficient than passing SQL statements and reduces the SQL Injection probability.
Full use of prefix Indexes
Try not to use stored procedures, triggers, functions, and so on, so that the database can do what it is best.
Avoid using JOIN for large tables. The MySQL optimizer is too simple to optimize the join policy.
Avoid performing mathematical operations and other computing tasks in the database
SQL merge mainly refers to merging multiple values in DML to reduce interaction with the database.
Reasonable paging, especially large Paging
The UPDATE and DELETE statements do not use LIMIT, which may cause inconsistency between the master and slave nodes.
Replace or with in. The in value cannot exceed 1000.
Disable order by rand ()
SQL statements avoid using temporary tables
Use union all instead of union
The program should handle SQL exceptions.
Prohibit a single SQL statement from updating multiple tables at the same time
- When reading data, SELECT only the required columns. Do not SELECT * each time to avoid serious random read problems, especially reading some TEXT/BLOB columns.
Generally, the performance of subqueries is poor. We recommend that you convert it into JOIN statements.
When multi-table join queries are performed, the types of joined fields should be as consistent as possible, and all associated fields must have indexes.
When querying a multi-Table connection, use a table with a small result set (note: this refers to the filtered result set, which is not necessarily a small full table data volume) as the driving table.
When multiple tables are joined and sorted, the sorting field must be in the driving table. Otherwise, the index cannot be used in the sorting column.
Use multiple composite indexes instead of multiple independent indexes. For example, do not create independent indexes for columns with a base (Cardinality) that is too small (for example, the total number of unique values of this column is less than 255 ).
For SQL statements similar to the paging function, we recommend that you first associate with the primary key and then return the result set, which is much more efficient.
Iv. DBA standard version Selection
MySQL Community edition, the largest user group
MySQL Enterprise Edition, paid
Percona Server with many new features
MariaDB version, few domestic users
We recommend that you select the MySQL Community edition> Percona Server> MariaDB> MySQL Enterprise Edition.
Main Content
SQL audit, DDL audit, and operation time, especially OnlineDDL
High-risk operation check, data backup before Drop
Log analysis mainly refers to MySQL slow logs and error logs
Data backup solution
Online DDL
Native MySQL needs to lock the table when executing DDL statements, and the business cannot write data during the lock table, which has a great impact on the service. MySQL has poor support for this aspect.
Pt-online-schema-change is recommended.
The advantages of using pt-online-schema-change include:
Restrictions on Using pt-online-schema-change include:
It takes a long time to change the table (compared to modifying the table directly using alter table)
The modified table must have a unique key or primary key.
Concurrent Modification on the same port cannot be too many
MySQL cluster Solution
MHA is recommended first:AvailableOne master, multiple slaves, OrDual-master, multi-slaveIn this mode, MHA or MMM can be used to manage the entire cluster. The latest MHA also supports the GTID mode of MySQL 5.6.
MHA has obvious advantages:
Open-source, Perl-based development, clear code structure, and easy secondary development;
The solution is mature. During failover, MHA will make strict judgments to minimize data loss and ensure data consistency;
Provides a general framework for custom development based on your own situation, especially for judging and switching operation steps;
Supports binlog server to improve binlog transfer efficiency and further reduce the risk of data loss.
However, MHA has some limitations:
Ssh trust needs to be interconnected between nodes, which is a challenge for some companies' security systems, because if a node is cracked by hackers, other nodes will suffer;
The provided scripts also need to be further supplemented and improved. Of course, the general usage is still enough.
Splitting Problems
Solve the problem of excessive single-host writing pressure and capacity
Vertical Split and horizontal split
Moderate splitting. Do not over-split
It is best to have an intermediate layer to control the splitting logic, otherwise the splitting will be too costly to manage.
Data backup
Full backup VS Incremental Backup
Hot Standby VS Cold Standby
Physical backup VS logical backup
Delayed backup
Full binlog backup
Recommended method:
The main tasks are as follows:
Centralized backup policy scheduling management
Xtrabackup hot standby
Backup result statistical analysis
Backup data consistency check
Distributed File System storage and backup
Reasons for the Distributed File System Used in the backup system:
Solve Storage Allocation Problems
Solve the problem of low storage NFS backup Efficiency
Centralized storage management
Better data reliability