First, the database design
1. Table Structure Design
-The self-increment column in the table (Auto_increment property) is recommended to use the bigint type -the preferred non-null unique key, followed by the selection of the self-increment column or the generator does not use the updated columns, try not to select string columns, not using the UUID MD5 hash - Tinytint or smallint types are recommended for fields with very few selectivity in the business -the IP address field in the business uses the INT type -the creation time field of the row data must be in a large, business-active table Create_ Time and last updated date field Update_time -all fields in the table must be not NULL properties, the business can define default values as needed -store exact floating-point numbers with decimal (do not use floating-point types)
2. Index Design
-Avoid redundant indexes: To avoid indexing the same field, the establishment of the index needs to be evaluated based on the SQL statement accessed -one query, one table can only use one index, and no single index for each query criteria field -the number of indexes per table is not more than 7. Number of single indexed fields not exceeding 5-not indexed on null column -not indexed on low cardinality columns, such as "gender" -compound indexed field sort, most differentiated fields are placed in front -core SQL Overrides index - Using prefix indexes on strings
3. Character set and check set
-the character set of the database and table must be consistent, and the character set of all tables must be the same, only utf8; all tables in the database have a uniform checksum set -the character set of the primary and the database must be consistent
4. Other requirements
-It is not recommended to use foreign keys, temporary tables, views, custom functions, stored procedures, and triggers on the-SSD hard disk, the number of single-table data rows cannot exceed 50 million or the storage space must not be greater than 30GB -sas hard disk, The number of single-table data rows cannot exceed 20 million or the storage space must not be greater than 15GB
Ii. SQL Authoring
1. Select
The-select statement must specify a specific field name, which is forbidden to be written as "SELECT *"
2. DML
The-insert statement must specify a specific field name and not be written as Insert VALUES (...). Form
3, multi-table joint query
-Multi-table connection queries are recommended for aliases, and in the select list, use aliases to refer to fields, database. Tabular, such as "Select A.cid from IKNOW_QB. tblreply a WHERE ..." -production systems, It is not recommended to connect more than 3 tables (including 3 tables) in a single query -in production systems, it is strongly not recommended to use outer associations, including left Outer Association, right Outer association, and full Outer association -in queries with multiple table joins, the driver table needs to select a table with a smaller result set - Prohibit writing SQL statements that are nested in multi-layered subqueries, and recommend rewriting the format of table sequential joins
4. Business
-Insert| in the transaction update| delete| The number of rows in the Replace statement operation is controlled at 2000, and the number of arguments in the in list in the WHERE clause is controlled in the $ -batch operation data, need to control the transaction interval time, the necessary sleep, the specific value is given by the DBA, and the program must have interrupt processing capacity -for INSERT operations on tables with auto_increment attribute fields, concurrency needs to be controlled within 200/s -sql level/transaction level/master database The table storage engine type is consistent, mixed use of the storage engine can cause master-slave data inconsistency or master-slave synchronization interrupt -For a read-only query that is not sensitive to synchronization latency, it must be placed on the library and, for synchronous latency-sensitive read-only queries, can be placed on the main library for execution
5, Table scanning method:
-select| update| delete| Replace must have a WHERE clause, and the condition of the WHERE clause must use an index lookup -a full table scan on a large table is strongly not recommended in a production database, but a full table can be scanned for static tables below 5000 rows - Large table full table scan and full table export (dump) in the business is recommended to be placed in the backup library or offline read the library in the -where clause to prohibit the use of only the full fuzzy like conditions to find (such as "%aj%"), there must be other query conditions
6. Sorting and Grouping
-Query with distinct, order by, and GROUP BY clauses with intermediate result set limited to 10000 rows
7. Other requirements
-The size limit of a single SQL statement is less than 5MB -the intermediate result set and final result set of the SQL statement in the production database must be limited to 5MB -the SQL statement in the production database disables the use of hints, such as Force Index,ignore Index, Straight_join,sql_no_cache -Prohibit use of the full-text search feature- disable event function -do not use or manipulate MySQL library and test library in the program. Disallow creation of test or libraries starting with test -Disable the use of user-defined variables in MySQL -do not perform calculations such as real-time statistics or summaries of the business in the online database, and then use other tools or offline backup libraries to complete - Reduce the number of interactions with the database INSERT ... On DUPLICATE KEY update REPLACE to , insert IGNORE, insert into VALUES (), (), () UPDATE ... WHERE ID in (a,b,c,...) -Do not use negative queries, such as not in,!=, do not --do not perform mathematical operations and function operations on indexed columns -do not use% leading queries, such as like "%ABC"
Three, MySQL related features introduction
1, MySQL processing characteristics of SQL
-sql request processing can only use one kernel -no SQL compile cache, SQL stored procedures are hard -parse-the index does not support operation comparison -most of the time a query can only use one index -not support hash jion ( MARIADB currently supported) -threading-based external service model (high number of connections, severe degradation of performance)
2. mysql Supported storage size
-Single tablespace 64T, each table has only one table space, that is, each single table maximum 64T -innodb Logfile plus up to 512G -each row size limit 65535 byte -up to 1027 fields per table
3. mysql Production reference index
-Single instance preferably not more than 1T, except the surrounding log, the maximum is not recommended more than 5T -general OLTP single table recommended maximum not more than 10G -usually in the case of a buffer hit: Select can reach 3-6w/s Insert in the case of a clustered index continuous can be to 2w-3w/s in the case of a clustered index discontinuity, it is possible that the 200-300/S update data in memory can be achieved 3k/s Delete data can reach 1k/s in memory, possibly less
Iv. audit of the establishment of the table
V. Capacity Assessment
1. Overview of capacity evaluation
all databases on-line: New cluster, new database, new table, all need to advance the capacity assessment, to prevent the subsequent capacity problems and on-line business to adjust, expand, migration and other operations, thereby affecting the online business. Capacity includes: Access volume (read/write), Data and growth, disk space capacity.
2. Table capacity
table capacity is mainly evaluated from the number of records, average length, growth, read and write, total size of the table. For OLTP tables, it is recommended that you do not exceed 2000W rows of data, with a total size of 15G or less. Traffic: Single-table read and write volume within 1600/s.
For millions of tables with single-table data, the length of each row should not be too long, not to be placed in the same table as the text, blob, and other field types. (MySQL data page size 16K, the longer each row of records, the fewer records per page of data stored, so when the data retrieved, will generate more IO)
3. Instance capacity
MySQL is a threading-based service model, so in some high-concurrency scenarios, the single instance does not take full advantage of the server's CPU resources, but the throughput will be stuck in the MySQL layer, especially for the mysql5.5 version. Great optimizations were made in MySQL version 5.6, and the Percona version has the thread pool, which can adequately deal with the excessive consumption of CPU context switches in high concurrency scenarios.
Single instance QPS throughput is generally controlled within 20000/s, and write volumes also need to be considered from the library latency issue, for the mysql5.6 version you can consider the sub-library and then sub-table, take full advantage of the 5.6 version of the library-level multi-threaded replication, thereby increasing the throughput of writes.
4. Disk space
servers typically host multiple DB instances, so before each instance goes live, it is necessary to evaluate the data volume of each instance and the growth of several major tables in 1-2 years, and estimate the amount of data to be as accurate as possible to each field. For businesses that are not growing at a particularly fast rate (doubling in half a year), it is recommended that the volume of data for 1-2 years will ultimately be within 70% of the disk usage. At the same time, for some data to grow faster, consider using a large slow disk for data archiving.
MySQL access specification and capacity evaluation