Common database-related issues

MySQL

Why use an auto-increment column as the primary key

If we define a primary key (PRIMARY KEY), InnoDB uses it as the clustered index. If no primary key is explicitly defined, InnoDB uses the first unique index whose columns are all NOT NULL as the clustered index. If there is no such unique index, InnoDB uses a built-in 6-byte row ID as an implicit clustered index (this ROWID increases as rows are written, and unlike Oracle's ROWID it cannot be referenced; it is purely implicit).
The data records themselves are stored in the leaf nodes of the primary index (a B+ tree). The records within each leaf node (one memory page or disk page) must be kept in primary key order, so whenever a new record is inserted, MySQL places it in the appropriate node and position according to its primary key. When a page reaches its fill factor (15/16 by default in InnoDB), a new page (node) is opened.
If the table uses an auto-increment primary key, each new record is appended sequentially after the current position in the index node, and when a page is full, a new page is opened automatically.
If you use a non-sequential primary key (such as a national ID number or a student number), each inserted key is essentially random, so each new record lands somewhere in the middle of an existing index page, and MySQL has to move data to put the record in the right position. The target page may even have been flushed to disk and evicted from the cache, in which case it must be read back from disk, which adds a lot of overhead. The frequent data movement and page splits also cause heavy fragmentation and a less compact index structure, so the table later has to be rebuilt with OPTIMIZE TABLE to re-pack the pages.
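The effect can be illustrated with a toy simulation. It is only a sketch under stated assumptions, not InnoDB's real page management: each leaf page holds a hypothetical maximum of 16 keys, an insert past the right edge of the last page opens a new page without moving anything, and any other overflow splits the page in half. Sequential keys never split a page, while the same keys in random order cause many splits and leave pages partly empty.

```python
import bisect
import random

PAGE_CAPACITY = 16  # hypothetical number of records per leaf page

def simulate_inserts(keys, capacity=PAGE_CAPACITY):
    """Insert keys into ordered, fixed-size leaf pages; return (splits, page_count)."""
    pages = [[]]  # each page is a sorted list; pages are kept in key order
    splits = 0
    for key in keys:
        # Locate the page whose key range should contain this key.
        first_keys = [page[0] for page in pages[1:]]
        idx = bisect.bisect_right(first_keys, key)
        page = pages[idx]
        bisect.insort(page, key)
        if len(page) <= capacity:
            continue
        if idx == len(pages) - 1 and page[-1] == key:
            # Appending past the end of the last page: open a new page, move nothing.
            pages.append([page.pop()])
        else:
            # Inserting into the middle of a full page: split it and move half the rows.
            mid = len(page) // 2
            pages.insert(idx + 1, page[mid:])
            del page[mid:]
            splits += 1
    return splits, len(pages)

keys = list(range(1000))
sequential = simulate_inserts(keys)
shuffled = keys[:]
random.Random(42).shuffle(shuffled)
randomized = simulate_inserts(shuffled)
print("sequential (splits, pages):", sequential)  # (0, 63): every page filled completely
print("random     (splits, pages):", randomized)
```

With sequential keys every page ends up full; with random keys the split pages are only partly filled, so the same 1000 rows occupy more pages, which is the fragmentation described above.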

Why indexes improve query efficiency

Index entries are stored in sorted order.
Because they are sorted, finding a value through an index does not require scanning every index record.
In the extreme case, an index lookup is as efficient as a binary search, approaching log2(N) comparisons.
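To make the log2(N) figure concrete, the sketch below counts comparisons for a linear scan versus a binary search over the same sorted keys. A plain sorted Python list stands in for the ordered index; this is an illustration of the principle, not of how MySQL walks its B+ tree pages.

```python
def linear_search(sorted_keys, target):
    """Scan every entry until the target is found; return (found, comparisons)."""
    comparisons = 0
    for key in sorted_keys:
        comparisons += 1
        if key == target:
            return True, comparisons
    return False, comparisons

def binary_search(sorted_keys, target):
    """Halve the search range at each step; return (found, comparisons)."""
    lo, hi = 0, len(sorted_keys) - 1
    comparisons = 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if sorted_keys[mid] == target:
            return True, comparisons
        if sorted_keys[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return False, comparisons

keys = list(range(1_000_000))
print(linear_search(keys, 999_999))  # (True, 1000000)
print(binary_search(keys, 999_999))  # at most 20 comparisons, since log2(1e6) is about 19.9
```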

Differences between B+ tree indexes and hash indexes

A B+ tree is a balanced multi-way tree: every leaf node is at the same distance from the root, nodes on the same level are linked to each other by pointers, and the keys are kept in order.

A hash index applies a hash function to each key to produce a hash value. A lookup does not have to walk from the root down to a leaf as in a B+ tree; a single hash computation locates the entry, but the entries are unordered.

Advantages of the hash index:

Equality queries. Here a hash index has a clear advantage, provided there are not many duplicate key values; with many duplicates, hash collisions make the hash index inefficient.

Scenarios where a hash index does not apply:

Range queries are not supported
Sorting by the index is not supported
The leftmost-prefix matching rule of composite indexes is not supported

In general, the B+ tree index structure suits most scenarios, but in the following case a hash index has the advantage.

In a HEAP (MEMORY engine) table, if a column has very little repetition (that is, high cardinality) and is queried mainly by equality, with no range queries and no sorting, it is particularly well suited to a hash index, for example:

SELECT id, name FROM `table` WHERE name = 'Li Ming';  -- equality query only
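The trade-off can be sketched with two Python structures under illustrative assumptions: a dict plays the hash index and a sorted list plays the ordered leaves of a B+ tree. The rows and function names are hypothetical; the point is only which structure can answer which kind of query efficiently.

```python
import bisect

# Hypothetical rows: (row_id, name)
rows = [(1, "Li Ming"), (2, "Wang Fang"), (3, "Zhang Wei"), (4, "Li Ming")]

# Hash index: name -> row ids. Equality lookups are O(1) on average, but keys have no order.
hash_index = {}
for row_id, name in rows:
    hash_index.setdefault(name, []).append(row_id)

# Ordered index (a stand-in for B+ tree leaves): (name, row_id) pairs sorted by name.
ordered_index = sorted((name, row_id) for row_id, name in rows)
ordered_names = [name for name, _ in ordered_index]

def equality_lookup(name):
    """WHERE name = ?: a single hash probe."""
    return hash_index.get(name, [])

def range_lookup(low, high):
    """WHERE name >= low AND name < high: needs key order, so it uses the ordered index."""
    start = bisect.bisect_left(ordered_names, low)
    end = bisect.bisect_left(ordered_names, high)
    return [row_id for _, row_id in ordered_index[start:end]]

print(equality_lookup("Li Ming"))  # [1, 4]
print(range_lookup("L", "X"))      # [1, 4, 2]
```

Answering the range query from hash_index alone would mean checking every key, which is why hash indexes support neither range queries nor index-based sorting.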

InnoDB, the most common engine, uses B+ tree indexes by default. It monitors how the indexes on each table are used, and if it judges that a hash index would be more efficient, it automatically builds one in memory in the "adaptive hash index" buffer (the adaptive hash index is enabled by default in InnoDB and is controlled by the innodb_adaptive_hash_index variable). Based on the observed search patterns, MySQL builds the hash index on a prefix of the index key. If a table fits almost entirely in the buffer pool, a hash index can speed up equality queries.

Note: Under some workloads, the speedup from hash index lookups far outweighs the extra cost of monitoring index searches and maintaining the hash table. Under high load, however, the read/write lock that guards the adaptive hash index can become a point of contention, for example with highly concurrent joins. LIKE operations and % wildcards also do not benefit from the adaptive hash index. In such cases you may want to turn the adaptive hash index off.

Differences between B-trees and B+ trees

B-tree: every node, internal or leaf, stores keys together with their data, and all the nodes together make up the tree; the child pointers of leaf nodes are null.

B+ tree: the leaf nodes together contain every key, along with pointers to the records for those keys, and the leaves are linked to one another in key order. All non-leaf nodes serve only as an index: each one holds just the largest (or smallest) key of each of its child subtrees. (By contrast, the non-leaf nodes of a B-tree also hold data that a search may be looking for.)

Why are B+ trees better suited than B-trees to operating-system file indexes and database indexes in practice?

Lower disk read/write cost. The internal nodes of a B+ tree hold no pointers to the actual record data, so they are smaller than B-tree internal nodes. If all the keys of one internal node are stored in the same disk block, that block can hold more keys, so each read brings more keys into memory and the number of I/O operations drops accordingly.
More stable query performance. A non-leaf node does not point to record content; it is only an index over the keys stored in the leaves. Every key lookup must therefore follow a path from the root to a leaf, all lookups have the same path length, and every record takes about the same effort to find.
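A back-of-the-envelope calculation shows why keeping data out of internal nodes matters. InnoDB's default page size is 16 KB; the 8-byte key, 6-byte child pointer and 1 KB row size below are illustrative assumptions, so the exact numbers are estimates rather than measured InnoDB figures.

```python
PAGE_SIZE = 16 * 1024  # InnoDB default page size (16 KB)
KEY_SIZE = 8           # assumption: BIGINT primary key
POINTER_SIZE = 6       # assumption: size of a child-page pointer
ROW_SIZE = 1024        # assumption: average row size (1 KB)

def fanout(entry_size):
    """How many entries of entry_size bytes fit in one page."""
    return PAGE_SIZE // entry_size

# B+ tree internal entry: key + child pointer only.
bplus_fanout = fanout(KEY_SIZE + POINTER_SIZE)
# B-tree internal entry: key + child pointer + the row data itself.
btree_fanout = fanout(KEY_SIZE + POINTER_SIZE + ROW_SIZE)

# Rows a 3-level B+ tree can address: two internal levels, then full leaf pages.
rows_per_leaf = fanout(ROW_SIZE)
bplus_rows_height3 = bplus_fanout ** 2 * rows_per_leaf

print(bplus_fanout, btree_fanout, bplus_rows_height3)  # 1170 15 21902400
```

Under these assumptions a three-level B+ tree reaches roughly 21.9 million rows in three page reads, while an internal node that also carries row data branches only 15 ways.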

MySQL composite indexes

A composite index is an index on two or more columns. MySQL uses the columns of a composite index from left to right, and a query can use just part of the index, but only a leftmost prefix of it. For example, with the index KEY index (a, b, c), lookups on (a), (a, b) and (a, b, c) can use the index, but lookups on (b, c) cannot. The index is most effective when the leftmost column is compared against a constant.
Adding columns to an index narrows the search further, but one two-column index is not the same as two single-column indexes. A composite index works like a phone book: entries are sorted first by surname, then by given name within the same surname. The phone book is useful if you know the surname, more useful if you know both surname and given name, but useless if you only know the given name.
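The phone-book analogy can be sketched with a sorted list of tuples standing in for a composite index on (last_name, first_name). The data and function names are hypothetical: a leftmost-prefix lookup binary-searches to the first match and then scans forward, while a lookup on the second column alone has to check every entry.

```python
import bisect

# Hypothetical composite index on (last_name, first_name); the last field is the row id.
entries = sorted([
    ("Li", "Ming", 1), ("Li", "Na", 2), ("Wang", "Fang", 3),
    ("Wang", "Ming", 4), ("Zhang", "Wei", 5),
])

def prefix_seek(prefix):
    """Leftmost-prefix lookup: binary-search to the first match, then scan forward."""
    n = len(prefix)
    i = bisect.bisect_left(entries, prefix)  # a shorter tuple sorts before its extensions
    matches = []
    while i < len(entries) and entries[i][:n] == prefix:
        matches.append(entries[i][-1])
        i += 1
    return matches

def first_name_scan(first_name):
    """No leftmost column: matches are scattered, so every entry must be checked."""
    return [entry[-1] for entry in entries if entry[1] == first_name]

print(prefix_seek(("Li",)))           # [1, 2]
print(prefix_seek(("Wang", "Ming")))  # [4]
print(first_name_scan("Ming"))        # [1, 4]
```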

When not to create indexes (or to create fewer)

Tables with very few records
Tables that are frequently inserted into, deleted from, or updated
Columns with heavily repeated, evenly distributed values. If a table has 100,000 rows and column A has only two values, T and F, each appearing about 50% of the time, an index on A generally does not speed up queries.
Columns that are frequently queried together with a main column whose index is already highly selective
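One common way to quantify "heavily repeated" is selectivity: the number of distinct values divided by the number of rows. This is a general rule of thumb rather than a formula MySQL itself applies; the two columns below are hypothetical and simply reproduce the T/F example.

```python
def selectivity(values):
    """Distinct values divided by row count; values near 1 mean an index narrows searches well."""
    return len(set(values)) / len(values)

flag_column = ["T", "F"] * 50_000  # the two-valued column A from the example
id_column = list(range(100_000))   # a unique column, for comparison

print(selectivity(flag_column))  # 2e-05
print(selectivity(id_column))    # 1.0
```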

MySQL partitioning

I. What is table partitioning?

Table partitioning means splitting a database table into smaller, easier-to-manage parts according to certain rules. Logically there is still only one table, but underneath it is made up of multiple physical partitions.

II. How partitioning differs from splitting tables

Splitting tables: breaking one table into several different tables according to some rule, for example splitting users' order records into multiple tables by time.

The difference is that a partitioned table is still logically a single table, whereas splitting turns one table into several tables.

III. What are the benefits of partitioning?

Partitioned table data can be spread across different physical devices, making efficient use of multiple pieces of hardware, and a partitioned table can store more data than a single disk or file system can hold.
Faster queries. When the WHERE clause includes the partitioning condition, only one or a few partitions need to be scanned. Aggregates such as SUM and COUNT can likewise be computed per partition and the partial results combined.
Easier maintenance. For example, to delete a large batch of data, you can simply remove the whole partition.
Avoiding certain bottlenecks, such as contention on a single InnoDB index or inode lock contention in the ext3 file system.
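The query and maintenance benefits can be sketched with a dictionary of per-year buckets standing in for RANGE partitions on a year column. The order rows and helper names are hypothetical, and real partition pruning is done by the MySQL optimizer, not by application code.

```python
from collections import defaultdict

# Hypothetical order rows: (order_id, year, amount)
orders = [(1, 2021, 30), (2, 2022, 50), (3, 2022, 20), (4, 2023, 10)]

# One bucket per year stands in for RANGE partitions on the year column.
partitions = defaultdict(list)
for order in orders:
    partitions[order[1]].append(order)

def sum_for_year(year):
    """WHERE year = ?: only the matching partition is read (partition pruning)."""
    rows = partitions.get(year, [])
    return sum(amount for _, _, amount in rows), len(rows)  # (total, rows scanned)

def sum_all():
    """SUM over the whole table: compute a partial sum per partition, then combine."""
    return sum(sum(amount for _, _, amount in rows) for rows in partitions.values())

def drop_year(year):
    """Bulk delete: discard a whole partition instead of deleting row by row."""
    partitions.pop(year, None)

print(sum_for_year(2022))  # (70, 2)
print(sum_all())           # 110
drop_year(2021)
print(sum_all())           # 80
```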

IV. Limitations of partitioned tables

A table can have at most 1024 partitions (MySQL 5.6.7 raised this limit to 8192).
In MySQL 5.1, the partitioning expression must be an integer or an expression that returns an integer. MySQL 5.5 added support for partitioning on non-integer columns.
If the table has a primary key or unique indexes, every column used in the partitioning expression must be part of the primary key and of every unique index. In other words, either the table has no primary key or unique index, or each of them includes all the partitioning columns.
FOREIGN KEY constraints cannot be used with partitioned tables.
Partitioning applies to all of a table's data and indexes: you cannot partition only the data and not the indexes, or only the indexes and not the data, nor partition only part of the table's data.
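The primary-key/unique-key rule can be expressed as a small check. This is a hypothetical helper for illustration, not something MySQL exposes; MySQL enforces the rule itself when you create or alter the table.

```python
def violating_keys(partition_columns, unique_keys):
    """Return the unique keys (primary key included) that do not contain every
    column used in the partitioning expression."""
    required = set(partition_columns)
    return [key for key in unique_keys if not required.issubset(key)]

# Hypothetical table partitioned by created_year.
print(violating_keys(["created_year"], [("id",), ("id", "created_year")]))
# [('id',)]: a PRIMARY KEY (id) alone would be rejected
print(violating_keys(["created_year"], [("id", "created_year")]))
# []: PRIMARY KEY (id, created_year) satisfies the rule
```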

V. How can I tell whether the current MySQL build supports partitioning?

Run SHOW VARIABLES LIKE '%partition%'; and check the result.

VI. What partition types does MySQL support?

RANGE partitioning: divides rows into ranges of values. For example, you can partition a table by year.
LIST partitioning: assigns rows to partitions according to predefined lists of values. Unlike RANGE, whose partitions cover contiguous ranges, LIST partitions match discrete values.
HASH partitioning: computes a hash of one or more columns and distributes rows to partitions by hash value. For example, you can partition a table by its primary key.
KEY partitioning: an extension of HASH partitioning in which MySQL itself generates the hash key.
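The way each type picks a partition for a row can be sketched as follows. RANGE uses VALUES LESS THAN bounds, LIST uses VALUES IN sets, and HASH takes MOD of an integer expression by the number of partitions; KEY partitioning uses MySQL's own internal hash and is not modelled here. The function names and example partition layouts are hypothetical.

```python
import bisect

def range_partition(value, less_than):
    """RANGE: index of the first partition whose VALUES LESS THAN bound exceeds value."""
    idx = bisect.bisect_right(less_than, value)
    if idx == len(less_than):
        raise ValueError(f"no partition for value {value}")  # MySQL would reject the row
    return idx

def list_partition(value, value_lists):
    """LIST: index of the partition whose VALUES IN list contains value."""
    for idx, values in enumerate(value_lists):
        if value in values:
            return idx
    raise ValueError(f"no partition for value {value!r}")

def hash_partition(value, num_partitions):
    """HASH: MOD(expr, num_partitions) of an integer expression."""
    return value % num_partitions

# PARTITION BY RANGE (year): p0 < 2020, p1 < 2021, p2 < 2022
print(range_partition(2019, [2020, 2021, 2022]))  # 0
print(range_partition(2020, [2020, 2021, 2022]))  # 1
# PARTITION BY LIST (region_id): p0 IN (1, 3), p1 IN (2, 4)
print(list_partition(4, [{1, 3}, {2, 4}]))        # 1
# PARTITION BY HASH (id) PARTITIONS 4
print(hash_partition(10, 4))                      # 2
```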

The four transaction isolation levels

SERIALIZABLE: prevents dirty reads, non-repeatable reads, and phantom reads.
REPEATABLE READ: prevents dirty reads and non-repeatable reads.
READ COMMITTED: prevents dirty reads.
READ UNCOMMITTED: the lowest level; none of these anomalies is prevented.

About MVCC

MySQL's InnoDB storage engine implements MVCC (Multi-Version Concurrency Control), a concurrency control protocol based on multiple versions of the data, as opposed to lock-based concurrency control (LBCC). The biggest benefit of MVCC is that reads take no locks, so reads and writes do not conflict. In OLTP applications with many reads and few writes, avoiding read-write conflicts is very important and greatly increases the system's concurrency, which is why almost all current RDBMSs support MVCC.

LBCC: Lock-Based Concurrency Control.
MVCC: Multi-Version Concurrency Control, a concurrency protocol based on multiple versions of the data. Because purely lock-based concurrency control offers low concurrency, MVCC improves on it, mainly by increasing the concurrency of read operations.

In MVCC, read operations fall into two categories:

Snapshot read: reads a visible version of the record (possibly a historical version) without taking any lock, not even a shared S lock, so it does not block other transactions' writes.
Current read: reads the latest version of the record and locks the returned records so that other transactions cannot modify them concurrently.
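A greatly simplified model of snapshot reads is sketched below. Real InnoDB read views track active transaction IDs and walk undo-log version chains; here, as an assumption for illustration, a read view is just the set of transaction IDs committed when the view was taken, and a reader's own uncommitted writes are ignored. It also shows why REPEATABLE READ (one view per transaction) and READ COMMITTED (a new view per statement) can return different results.

```python
class VersionedRow:
    """A row with a version chain (oldest first); each version records its writer."""
    def __init__(self):
        self.versions = []  # list of (txn_id, value)

    def write(self, txn_id, value):
        self.versions.append((txn_id, value))

    def snapshot_read(self, read_view):
        """Return the newest version written by a transaction in the read view (no locks)."""
        for txn_id, value in reversed(self.versions):
            if txn_id in read_view:
                return value
        return None

committed = set()  # ids of committed transactions

def new_read_view():
    """Capture which transactions have committed at this moment."""
    return frozenset(committed)

row = VersionedRow()
row.write(1, "v1")
committed.add(1)

view = new_read_view()          # taken at the first read of a transaction
row.write(2, "v2")              # transaction 2 has not committed yet
print(row.snapshot_read(view))  # v1: uncommitted data is invisible (no dirty read)

committed.add(2)
print(row.snapshot_read(view))             # v1: REPEATABLE READ keeps using its first view
print(row.snapshot_read(new_read_view()))  # v2: READ COMMITTED takes a new view per statement
```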

Advantages of row-level locking:

Few lock conflicts when many threads access different rows.
Only a small amount of change needs to be undone on rollback.
A single row can be locked for a long time.

Disadvantages of row-level locking:

Uses more memory than page-level or table-level locking.
Slower than page-level or table-level locking when used on a large part of the table, because many more locks must be acquired.
Significantly slower than other locks if you frequently run GROUP BY over most of the data or must frequently scan the whole table.
With higher-level (coarser) locks, you can also tune the application more easily by supporting different lock types, because their locking cost is lower than that of row-level locks.

A simple MySQL trigger example

CREATE TRIGGER <trigger name> -- every trigger must have a name of at most 64 characters; naming rules are basically the same as for other MySQL objects.
{BEFORE | AFTER} -- when the trigger fires: before or after the event.
{INSERT | UPDATE | DELETE} -- which event fires the trigger: an insert, an update, or a delete.
ON <table name> -- a trigger belongs to a table and is activated when an INSERT, UPDATE, or DELETE is performed on that table. Before MySQL 5.7.2, you cannot define two triggers with the same timing and event on the same table.
FOR EACH ROW -- the execution granularity: the trigger body runs once for every affected row, not once for the whole statement.
<trigger SQL statement> -- the SQL the trigger executes. It can be any valid statement, including a compound BEGIN ... END statement, but it is subject to the same restrictions as stored functions.

What is a stored procedure?

Simply put, a stored procedure is a set of SQL statements that can implement fairly complex logic, similar to a method in Java.

PS: A stored procedure is somewhat like a trigger, since both are sets of SQL statements, but a stored procedure is called explicitly and is more powerful, while a trigger is invoked automatically when some event occurs.

Features

Input and output parameters, variable declarations, and control statements such as IF/ELSE, CASE, and WHILE, so stored procedures can implement complex logic.
The general features of functions: modularity, encapsulation, and code reuse.
Speed: compilation and optimization happen only on the first execution, and later calls run directly without repeating those steps.

DROP PROCEDURE IF EXISTS `proc_adder`;

DELIMITER ;;
CREATE DEFINER=`root`@`localhost` PROCEDURE `proc_adder`(IN a INT, IN b INT, OUT sum INT)
BEGIN
    -- Routine body: treat NULL inputs as 0, then add them
    DECLARE c INT;  -- example local variable (not used below)
    IF a IS NULL THEN SET a = 0;
    END IF;
    IF b IS NULL THEN SET b = 0;
    END IF;
    SET sum = a + b;
END
;;
DELIMITER ;

SET @b = 5;
CALL proc_adder(0, @b, @s);
SELECT @s AS sum;

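-- tab1 is assumed to exist; this minimal definition makes the example self-contained
CREATE TABLE tab1 (
    tab1_id VARCHAR(11)
);

CREATE TABLE tab2 (
    tab2_id VARCHAR(11)
);

DROP TRIGGER IF EXISTS t_ai_on_tab1;

DELIMITER ;;
CREATE TRIGGER t_ai_on_tab1
AFTER INSERT ON tab1
FOR EACH ROW
BEGIN
    -- copy each newly inserted tab1 row into tab2
    INSERT INTO tab2 (tab2_id) VALUES (NEW.tab1_id);
END
;;
DELIMITER ;

INSERT INTO tab1 (tab1_id) VALUES ('0001');
SELECT * FROM tab2;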
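The FOR EACH ROW behaviour of the t_ai_on_tab1 trigger above can be modelled in a few lines. This toy Table class is hypothetical and only mimics the timing: one INSERT statement that writes several rows fires the AFTER INSERT trigger once per row, with the new row passed in the way NEW is in SQL.

```python
class Table:
    """Toy table that fires AFTER INSERT, FOR EACH ROW triggers."""
    def __init__(self, name):
        self.name = name
        self.rows = []
        self.after_insert_triggers = []  # callables that receive the NEW row

    def insert(self, *new_rows):
        """One INSERT statement that may insert several rows."""
        for new in new_rows:
            self.rows.append(new)
            for trigger in self.after_insert_triggers:
                trigger(new)  # FOR EACH ROW: runs once per inserted row

tab1 = Table("tab1")
tab2 = Table("tab2")
# Copy NEW.tab1_id into tab2, like t_ai_on_tab1.
tab1.after_insert_triggers.append(lambda new: tab2.rows.append({"tab2_id": new["tab1_id"]}))

tab1.insert({"tab1_id": "0001"}, {"tab1_id": "0002"})
print(tab2.rows)  # [{'tab2_id': '0001'}, {'tab2_id': '0002'}]
```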

MySQL optimization

Turn on the query cache to speed up repeated queries (note that the query cache was deprecated in MySQL 5.7.20 and removed in MySQL 8.0).
Use EXPLAIN on your SELECT queries to analyze performance bottlenecks in the query or the table structure. EXPLAIN's output also shows how your indexes and primary key are used and how the table is searched and sorted.
Use LIMIT 1 when you only need one row: the MySQL engine stops searching after it finds one matching row instead of continuing to look for more.
Index the columns you search on.
Use ENUM instead of VARCHAR: if a column such as gender, country, ethnicity, state, or department has a limited, fixed set of values, use ENUM rather than VARCHAR.
Use prepared statements. Much like stored procedures, prepared statements are SQL statements executed on the server side, and they bring both performance and security benefits. Values bound to a prepared statement are sent separately from the SQL text, which protects your application from SQL injection attacks.
Split tables vertically.
Choose the right storage engine.
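The injection-protection point can be demonstrated with Python's built-in sqlite3 module, used here as a stand-in for a MySQL driver (placeholder syntax differs between drivers). The table and the hostile input are hypothetical; the contrast is between pasting a value into the SQL text and binding it as a parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("Li Ming",), ("Wang Fang",)])

user_input = "x' OR '1'='1"  # hostile input

# Unsafe: the input is pasted into the SQL text and changes the query's logic.
unsafe = conn.execute(f"SELECT id FROM users WHERE name = '{user_input}'").fetchall()

# Safe: the value is bound to a placeholder, so it is only ever treated as data.
safe = conn.execute("SELECT id FROM users WHERE name = ?", (user_input,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```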

Differences between KEY and INDEX

KEY is part of the database's physical structure and has two meanings and roles: a constraint (enforcing structural integrity and the rules of the database) and an index (supporting queries). Keys include the primary key, unique keys, foreign keys, and so on.
INDEX is also part of the database's physical structure, but it only supports queries. It is stored in a directory-like structure in its own tablespace (in MySQL, the InnoDB tablespace). Indexes are classified into prefix indexes, full-text indexes, and so on.

What are the differences between MyISAM and InnoDB in MySQL?

Differences:

InnoDB supports transactions; MyISAM does not. In InnoDB, every SQL statement is by default wrapped in its own transaction and committed automatically (autocommit), which affects speed, so it is better to put multiple SQL statements between BEGIN and COMMIT to form a single transaction.
InnoDB supports foreign keys; MyISAM does not. Converting an InnoDB table that contains foreign keys to MyISAM will fail.
InnoDB uses a clustered index: the data file is tied to the index, the table must have a primary key, and lookups through the primary key are very efficient. However, a secondary index lookup takes two steps: first find the primary key, then fetch the data through the primary key. The primary key should therefore not be too large, because a large primary key makes every other index large too. MyISAM uses non-clustered indexes: the data file is separate, and the index stores pointers to the data file. Its primary key index and secondary indexes are independent of each other.
InnoDB does not store the exact number of rows in a table, so SELECT COUNT(*) FROM table (without a WHERE clause) requires a full table scan. MyISAM keeps the table's row count in a variable, so the same statement only has to read that variable and is fast.
InnoDB did not support full-text indexes before MySQL 5.6, while MyISAM did, and MyISAM's full-text queries were more efficient; InnoDB has supported FULLTEXT indexes since MySQL 5.6.
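The clustered versus non-clustered difference can be sketched with plain dictionaries standing in for the index trees. The data and names are hypothetical; the sketch shows the two-step InnoDB secondary lookup (secondary index to primary key, then clustered index to row) against MyISAM's single index lookup followed by a direct fetch. It also shows why a large primary key inflates secondary indexes: every secondary entry stores a copy of it.

```python
# InnoDB-style clustered index: the rows themselves live in the primary-key index.
clustered = {
    1: {"id": 1, "email": "a@example.com"},
    2: {"id": 2, "email": "b@example.com"},
}
secondary_email = {"a@example.com": 1, "b@example.com": 2}  # stores the primary key

def innodb_lookup(email):
    pk = secondary_email[email]  # first lookup: secondary index -> primary key
    return clustered[pk]         # second lookup: clustered index -> row

# MyISAM-style: rows sit in a separate data file; every index stores a row pointer.
data_file = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": "b@example.com"},
]
myisam_email_index = {"a@example.com": 0, "b@example.com": 1}  # value -> row position

def myisam_lookup(email):
    return data_file[myisam_email_index[email]]  # index lookup, then a direct fetch

print(innodb_lookup("b@example.com") == myisam_lookup("b@example.com"))  # True
```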

How to choose:

If you need transactions, choose InnoDB; if you do not, you can consider MyISAM.
If most tables are almost read-only, consider MyISAM; if both reads and writes are frequent, use InnoDB.
MyISAM is harder to recover after a system crash, so consider whether that is acceptable.
Since MySQL 5.5, InnoDB has been MySQL's default engine (previously it was MyISAM), which shows that its advantages are clear. If you are unsure which one to use, use InnoDB; at the very least it will not be a bad choice.

Considerations when designing database tables

I. Reasonable field names and field definitions

Leave out fields that are not closely related
Give fields names that follow a convention and have a clear meaning (do not mix English and pinyin, and avoid meaningless names such as a, b, c)
Avoid abbreviations in field names where possible (most abbreviations do not make the field's meaning clear)
Do not mix upper and lower case in field names (for readability, join multiple English words with underscores)
Do not use reserved words or keywords as field names
Keep field names and types consistent
Choose numeric types carefully
Leave enough room in text fields

II. Handling special system fields, and recommendations after the design is complete

Add a deletion flag (for example, together with the operator and the deletion time)
Build a versioning mechanism
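One common interpretation of these two recommendations is a soft-delete flag plus a version column used for optimistic locking. The sketch below assumes that interpretation and uses Python's sqlite3 module as a stand-in for MySQL; the account table and its columns are hypothetical. An update succeeds only if the row still carries the version the caller read, and deleted rows are flagged rather than removed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE account (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL DEFAULT 0,
        version INTEGER NOT NULL DEFAULT 0,    -- incremented on every update
        is_deleted INTEGER NOT NULL DEFAULT 0  -- soft-delete flag
    )
""")
conn.execute("INSERT INTO account (id, balance) VALUES (1, 100)")

def update_balance(account_id, new_balance, expected_version):
    """Optimistic update: succeeds only if the row still has the version we read."""
    cur = conn.execute(
        "UPDATE account SET balance = ?, version = version + 1 "
        "WHERE id = ? AND version = ? AND is_deleted = 0",
        (new_balance, account_id, expected_version),
    )
    return cur.rowcount == 1

def soft_delete(account_id):
    """Mark the row as deleted instead of removing it."""
    conn.execute("UPDATE account SET is_deleted = 1 WHERE id = ?", (account_id,))

print(update_balance(1, 150, expected_version=0))  # True: version goes from 0 to 1
print(update_balance(1, 200, expected_version=0))  # False: another writer already bumped it
soft_delete(1)
print(conn.execute("SELECT balance, version, is_deleted FROM account").fetchone())  # (150, 1, 1)
```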

III. Reasonable table structure

Multi-type fields: check whether a field in a table can be broken down into smaller, independent parts (for example, people can be divided into men and women).
Multi-valued fields: the table can be split into three tables (typically two entity tables plus a link table), which makes retrieval and sorting more orderly and ensures data integrity.

IV. Other recommendations

Store large data fields (for example, a description field) in a separate table so they do not hurt performance.
Use VARCHAR instead of CHAR, because VARCHAR allocates its length dynamically while CHAR has a fixed length.
Create a primary key for every table; a table without a primary key suffers in query performance and in index definition.
Avoid NULL columns and set default values instead (for example, a default of 0 for INT columns); this noticeably improves the efficiency of index queries.
Indexes are best built on unique, non-null fields. Too many indexes slow down later inserts and updates, so create them according to actual needs.

Redis

Redis's single-threaded model

"Single-threaded" means that the network request module uses a single thread (so there is no need to worry about concurrency safety): one thread handles all network requests, while other modules still use multiple threads.

Why Redis is fast

The vast majority of requests are pure in-memory operations (very fast)
Being single-threaded avoids unnecessary context switches and race conditions
Non-blocking I/O through I/O multiplexing

Internal implementation of Redis

Internally, Redis uses epoll together with a simple event framework of its own. Reads, writes, closes, and connects on epoll are turned into events, and Redis relies on epoll's multiplexing so that no time is wasted waiting on I/O. These three factors are not independent of each other. The first matters most: if requests were time-consuming, a single thread would deliver poor throughput and performance. Redis chose the right technical solution for its particular scenario.
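The single-threaded, multiplexed design can be sketched with Python's selectors module, whose DefaultSelector uses epoll on Linux. This is a toy, not Redis's own event library: the SET/GET text protocol is hypothetical (it is not RESP), and socket pairs stand in for client connections. One thread serves every ready connection, and each command runs to completion before the next, so the shared dictionary needs no locks.

```python
import selectors
import socket

def handle_ready(sel, store):
    """One pass of a single-threaded event loop: serve every socket that is readable."""
    for key, _ in sel.select(timeout=1):
        conn = key.fileobj
        data = conn.recv(1024)
        if not data:  # peer closed the connection
            sel.unregister(conn)
            conn.close()
            continue
        parts = data.decode().split()
        # Each command runs to completion before the next one, so no locks are needed.
        if len(parts) == 3 and parts[0] == "SET":
            store[parts[1]] = parts[2]
            conn.sendall(b"OK\n")
        elif len(parts) == 2 and parts[0] == "GET":
            conn.sendall((store.get(parts[1], "(nil)") + "\n").encode())
        else:
            conn.sendall(b"ERR\n")

sel = selectors.DefaultSelector()  # epoll on Linux
store = {}
clients = []
for _ in range(2):  # two client connections, one server thread
    server_end, client_end = socket.socketpair()
    server_end.setblocking(False)
    sel.register(server_end, selectors.EVENT_READ)
    clients.append(client_end)

clients[0].sendall(b"SET greeting hello")
handle_ready(sel, store)
reply_set = clients[0].recv(1024)
clients[1].sendall(b"GET greeting")
handle_ready(sel, store)
reply_get = clients[1].recv(1024)
print(reply_set, reply_get)  # b'OK\n' b'hello\n'
```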

Redis and thread safety

Redis effectively relies on thread confinement: it confines work to a single thread, which naturally avoids thread-safety issues. However, compound operations that depend on several Redis commands still need locking, and that lock may need to be a distributed one.
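A common single-instance Redis locking pattern is to acquire with SET using the NX and PX options and a unique token, and to release with a compare-and-delete (usually run as a Lua script) so that only the holder can unlock. To keep the sketch runnable without a server, FakeRedis below is a hypothetical in-memory stand-in that models only those two operations; it is not a Redis client API.

```python
import time
import uuid

class FakeRedis:
    """In-memory stand-in modelling SET key value NX PX ttl and compare-and-delete."""
    def __init__(self):
        self._data = {}  # key -> (value, expires_at)

    def _live(self, key):
        item = self._data.get(key)
        if item is not None and item[1] <= time.monotonic():
            del self._data[key]  # expired keys disappear, like Redis TTLs
            return None
        return item

    def set_nx_px(self, key, value, ttl_ms):
        """Set only if the key does not exist; return True on success."""
        if self._live(key) is not None:
            return False
        self._data[key] = (value, time.monotonic() + ttl_ms / 1000)
        return True

    def delete_if_equals(self, key, value):
        """Delete only if the stored value matches (atomic here because it is single-threaded)."""
        item = self._live(key)
        if item is not None and item[0] == value:
            del self._data[key]
            return True
        return False

def acquire_lock(r, name, ttl_ms=10_000):
    token = str(uuid.uuid4())  # unique per holder so nobody else can release the lock
    return token if r.set_nx_px(name, token, ttl_ms) else None

def release_lock(r, name, token):
    return r.delete_if_equals(name, token)

r = FakeRedis()
owner = acquire_lock(r, "lock:order:42")
print(owner is not None)                             # True
print(acquire_lock(r, "lock:order:42"))              # None: already held
print(release_lock(r, "lock:order:42", "not-mine"))  # False: wrong token
print(release_lock(r, "lock:order:42", owner))       # True
```

The TTL keeps a crashed holder from blocking others forever; the token check keeps a slow holder whose lock already expired from deleting someone else's lock.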

What are the benefits of using Redis?

Speed: the data lives in memory in a HashMap-like structure, whose lookups and updates take O(1) time
Rich data types: string, list, set, sorted set, and hash
Transactions and atomic operations: atomicity means a change to the data is either executed completely or not at all (note, however, that Redis does not roll back a transaction if one of its commands fails at runtime)
Rich features: Redis can be used for caching and messaging, and keys can be given an expiration time after which they are deleted automatically

What are the advantages of Redis over Memcached?

All Memcached values are simple strings; Redis, as its replacement, supports richer data types.
Redis is much faster than Memcached.
Redis can persist its data.
Redis supports data backup in master-slave mode.
The underlying models differ: the underlying implementation and the protocol used to communicate with clients are not the same. Redis built its own VM mechanism directly, because ordinary system calls to system functions waste some time moving and requesting data (the VM feature was later deprecated, in Redis 2.4).
Value size: a Redis string value can be up to 512 MB, while a Memcached item is limited to 1 MB by default.

Redis master-slave replication

How it works:

When a slave establishes a master-slave relationship with the master, it sends the SYNC command to the master.
On receiving SYNC, the master starts saving a snapshot in the background (the RDB persistence process) and buffers the write commands it receives in the meantime.
When the snapshot is complete, the master sends the snapshot file and all buffered write commands to the slave.
The slave loads the snapshot file and then executes the buffered commands it received.
From then on, whenever the master receives a write command it forwards the command to the slave, keeping the data consistent.
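The steps above can be modelled in memory as a rough sketch. The Master and Slave classes are hypothetical: the snapshot is a deep copy instead of an RDB file, and nothing goes over the network. Newer Redis versions use PSYNC, which also supports partial resynchronization, but the full-sync flow is the same in outline.

```python
import copy

class Slave:
    def __init__(self):
        self.data = {}

    def load_snapshot(self, snapshot):
        self.data = copy.deepcopy(snapshot)

    def apply(self, key, value):
        self.data[key] = value

class Master:
    def __init__(self):
        self.data = {}
        self.slaves = []
        self.buffers = {}  # slave -> writes received while its snapshot is in flight

    def write(self, key, value):
        self.data[key] = value
        for slave in self.slaves:
            if slave in self.buffers:
                self.buffers[slave].append((key, value))  # buffer during the snapshot
            else:
                slave.apply(key, value)                   # afterwards: forward every write

    def on_sync(self, slave):
        """SYNC arrives: take a snapshot and start buffering writes for this slave."""
        self.slaves.append(slave)
        self.buffers[slave] = []
        return copy.deepcopy(self.data)

    def finish_sync(self, slave, snapshot):
        """Send the snapshot, then replay the writes buffered while it was produced."""
        slave.load_snapshot(snapshot)
        for key, value in self.buffers.pop(slave):
            slave.apply(key, value)

master, slave = Master(), Slave()
master.write("a", 1)
snapshot = master.on_sync(slave)
master.write("b", 2)  # arrives while the snapshot is being produced
master.finish_sync(slave, snapshot)
master.write("c", 3)  # forwarded immediately
print(slave.data == master.data, slave.data)  # True {'a': 1, 'b': 2, 'c': 3}
```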

Drawback: all replication and synchronization for the slaves is handled by the master, which can put it under too much pressure; a master-slave-slave (chained replication) structure can be used to relieve it.

Pros and cons of Redis's two persistence methods

RDB persistence generates point-in-time snapshots of the dataset at specified intervals.
AOF persistence logs every write command the server executes and restores the dataset by re-executing those commands when the server starts.
Redis can also use AOF and RDB persistence at the same time. In that case, when Redis restarts it prefers the AOF file for restoring the dataset, because the dataset saved in the AOF file is usually more complete than the one saved in the RDB file.
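The difference in completeness can be illustrated with a toy server that keeps both an append-only command log and a periodic snapshot. It is a sketch under a simplifying assumption: every logged command is treated as durable, whereas real AOF durability depends on the fsync policy. The ToyServer class and the SET/DEL handling are hypothetical.

```python
import copy

def apply(data, cmd, key, value=None):
    """Apply one write command to an in-memory dataset."""
    if cmd == "SET":
        data[key] = value
    elif cmd == "DEL":
        data.pop(key, None)

class ToyServer:
    def __init__(self):
        self.data = {}
        self.aof = []  # every write command, in execution order
        self.rdb = {}  # the most recent point-in-time snapshot

    def execute(self, cmd, key, value=None):
        apply(self.data, cmd, key, value)
        self.aof.append((cmd, key, value))

    def save_snapshot(self):
        self.rdb = copy.deepcopy(self.data)

def restore_from_aof(aof):
    """Rebuild the dataset by replaying the logged commands."""
    data = {}
    for cmd, key, value in aof:
        apply(data, cmd, key, value)
    return data

server = ToyServer()
server.execute("SET", "a", 1)
server.save_snapshot()  # RDB snapshot taken here
server.execute("SET", "b", 2)
server.execute("DEL", "a")
# Simulated crash: in-memory data is lost; only the snapshot and the log survive.
print(server.rdb)                    # {'a': 1}: writes after the snapshot are lost
print(restore_from_aof(server.aof))  # {'b': 2}: matches the state before the crash
```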

Advantages of RDB:

An RDB file is very compact and holds the Redis dataset at a single point in time, which makes it ideal for backups. For example, you can keep an RDB file for every hour of the last 24 hours and one for every day of the month; if something goes wrong, you can then restore the dataset to any of these versions.
RDB is well suited to disaster recovery: it is a single, compact file that can be transferred (after encryption) to another data center or to Amazon S3.
RDB maximizes Redis performance: to save an RDB file, the parent process only needs to fork a child process, which does all the saving work, so the parent process does not perform any disk I/O.
RDB restores large datasets faster than AOF.

What are the common Redis performance problems, and how can they be solved?

Memory snapshots on the master: the SAVE command calls the rdbSave function, which blocks the main thread. When the snapshot is large, the performance impact is severe and service pauses intermittently, so the master should not write memory snapshots.
AOF persistence on the master: if the AOF file is not rewritten, this form of persistence has little performance impact, but the AOF file keeps growing, and an oversized AOF file slows down recovery when the master restarts. The master should preferably do no persistence at all, neither memory snapshots nor AOF logs, and in particular should not enable snapshot persistence. If the data is critical, have one slave enable AOF to back up the data, with a policy of syncing once per second (appendfsync everysec).
AOF rewriting on the master: when the master calls BGREWRITEAOF, the rewrite consumes a lot of CPU and memory, which can overload the service and cause brief pauses.
Master-slave replication performance: for replication speed and connection stability, the slave and master should preferably be on the same LAN.

Redis's six eviction policies (Redis 4.0 later added two LFU-based ones, volatile-lfu and allkeys-lfu)

volatile-lru: evicts the least recently used keys among keys with an expiration time set (server.db[i].expires)
volatile-ttl: evicts the keys closest to expiring among keys with an expiration time set (server.db[i].expires)
volatile-random: evicts random keys among keys with an expiration time set (server.db[i].expires)
allkeys-lru: evicts the least recently used keys from the whole dataset (server.db[i].dict)
allkeys-random: evicts random keys from the whole dataset (server.db[i].dict)
noeviction: never evicts data; writes that need more memory return an error
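The two LRU policies can be sketched with an OrderedDict that keeps keys from least to most recently used. This is exact LRU over a hypothetical key-count limit; real Redis limits memory rather than key count and approximates LRU by sampling a few keys. When volatile-lru finds no key with a TTL, the sketch raises an error, which matches what happens to writes under noeviction.

```python
from collections import OrderedDict

class LRUStore:
    """Exact LRU eviction over a fixed number of keys (Redis approximates LRU by sampling)."""
    def __init__(self, max_keys, policy="allkeys-lru"):
        self.max_keys = max_keys
        self.policy = policy
        self.data = OrderedDict()  # least recently used first
        self.ttl_keys = set()      # keys that have an expiration time

    def get(self, key):
        if key not in self.data:
            return None
        self.data.move_to_end(key)  # mark as most recently used
        return self.data[key]

    def set(self, key, value, has_ttl=False):
        if key in self.data:
            self.data.move_to_end(key)
        elif len(self.data) >= self.max_keys:
            self._evict()
        self.data[key] = value
        if has_ttl:
            self.ttl_keys.add(key)
        else:
            self.ttl_keys.discard(key)

    def _evict(self):
        if self.policy == "allkeys-lru":
            victim = next(iter(self.data), None)
        else:  # volatile-lru: only keys with an expiration time are candidates
            victim = next((k for k in self.data if k in self.ttl_keys), None)
        if victim is None:
            raise MemoryError("no evictable key")  # Redis would reply with an OOM error
        del self.data[victim]
        self.ttl_keys.discard(victim)

store = LRUStore(max_keys=2, policy="allkeys-lru")
store.set("a", 1)
store.set("b", 2)
store.get("a")           # "a" becomes most recently used
store.set("c", 3)        # evicts "b"
print(list(store.data))  # ['a', 'c']

vol = LRUStore(max_keys=2, policy="volatile-lru")
vol.set("a", 1)
vol.set("b", 2, has_ttl=True)
vol.get("b")
vol.set("c", 3)          # only "b" has a TTL, so it is evicted even though "a" is older
print(list(vol.data))    # ['a', 'c']
```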

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
