Selection of primary keys and disk performance in MySQL

Source: Internet
Author: User

I happened upon the presentation "Fotolog: Scaling the World's Largest Photo Blogging Community" and realized that many database optimization principles are actually quite simple. What separates the experts is whether, when facing a problem, they really think it through for themselves rather than just following the inertia of experience.

Background: a photo site where each photo has many comments. Browsing a photo executes a query of the form: SELECT ... FROM ... WHERE photo_identifier = ... ORDER BY posted ...

In the "Old Schema" solution, everything looks reasonable: the usual auto-increment field identifier is the primary key, and (photo_identifier, posted) is a secondary index.



The data is stored in primary key order, and a query locates its rows through the index. The problem is that the comments for a single photo are scattered across many data pages on disk, so reading them forces the disk to reposition repeatedly. That is a significant I/O overhead.



In the "New Schema" solution, the auto-increment field is still present, but the primary key is the composite key (photo_identifier, posted, identifier), and identifier gets its own index. Note also that the table uses InnoDB, and that the auto-increment field was shortened so that the primary key is shorter. This helps InnoDB performance, since every InnoDB secondary index entry also stores the primary key.



The data is stored in composite primary key order. Because photo_identifier is the first column of that key, all comments for a given photo are saved in contiguous positions on disk. InnoDB can also optimize how this data is read, through what the presentation calls "pending reads": a read is not necessarily a "physical read" from the file system; it may be only a "logical read" from the buffer pool, and InnoDB's internal mechanisms can merge several logical reads into a single physical read, which reduces I/O and improves disk performance.
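
To make the locality argument concrete, here is a minimal simulation sketch. It is not InnoDB itself: it simply sorts rows by each candidate primary key and counts how many fixed-size "pages" hold one photo's comments. The page size, the number of photos and comments, and the helper names (pages_touched, make_comments) are all assumptions chosen for illustration.

```python
import random

ROWS_PER_PAGE = 100  # assumed number of comment rows that fit in one data page


def pages_touched(rows, sort_key, photo_id):
    """Count the distinct pages holding one photo's comments when the
    table is physically ordered by sort_key."""
    ordered = sorted(rows, key=sort_key)
    return len({
        position // ROWS_PER_PAGE
        for position, row in enumerate(ordered)
        if row["photo_identifier"] == photo_id
    })


def make_comments(num_photos=1000, num_comments=50_000, seed=42):
    """Generate comments that arrive in time order on randomly chosen photos."""
    rng = random.Random(seed)
    return [
        {"identifier": i, "photo_identifier": rng.randrange(num_photos), "posted": i}
        for i in range(num_comments)
    ]


comments = make_comments()
photo = 7  # arbitrary photo to look up

# Old schema: rows ordered by the auto-increment identifier.
old_pages = pages_touched(comments, lambda r: r["identifier"], photo)

# New schema: rows ordered by (photo_identifier, posted, identifier).
new_pages = pages_touched(
    comments,
    lambda r: (r["photo_identifier"], r["posted"], r["identifier"]),
    photo,
)

print(f"old schema: {old_pages} pages, new schema: {new_pages} pages")
```

With these assumed numbers each photo has roughly 50 comments, so under the key that starts with photo_identifier they fit in one or two adjacent pages, while under the auto-increment key they are spread over many pages. Real page counts depend on row size, page fill, and caching.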



There is one more issue to consider when using (photo_identifier, posted, identifier) as a composite primary key. When someone comments on an "old photo" (one with a smaller photo_identifier), the new row has to be stored in an earlier data page, because the physical order of the data on disk follows the primary key. Compared with an auto-increment primary key on identifier, this adds some I/O cost, since with an auto-increment primary key new rows are always appended at the end of the data file.

So in a real application the approach from the presentation is usable, but it should be judged against the actual workload. If most comments go to "new photos", the I/O problem is small, because the rows for new photos sit near the end of the data file. If comments are spread across photos more randomly, you need to consider whether the approach is suitable. It can also be adapted: in a master-slave setup, for example, the master can use the auto-increment identifier as its primary key while the slave uses the composite primary key (photo_identifier, posted, identifier), which keeps writes efficient on the master and reads efficient on the slave.
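
The write-side trade-off can be sketched the same way. The simulation below inserts keys in arrival order into a sorted list, used here as a stand-in for the primary-key-ordered data file, and measures how far from the end each insert lands. The workload model (a new photo every 50 comments, "recent" meaning the five newest photos) and all names are assumptions for illustration, not measurements of MySQL.

```python
import bisect
import random

ROWS_PER_PAGE = 100       # assumed rows per data page
COMMENTS_PER_PHOTO = 50   # assumed: a new photo appears every 50 comments
NUM_COMMENTS = 10_000


def mean_pages_from_end(keys):
    """Insert keys in arrival order into a sorted list and return the mean
    distance, in pages, between each insert position and the end."""
    table = []
    total_rows_from_end = 0
    for key in keys:
        pos = bisect.bisect_right(table, key)
        total_rows_from_end += len(table) - pos
        table.insert(pos, key)
    return total_rows_from_end / len(keys) / ROWS_PER_PAGE


def composite_keys(pick_photo, seed=1):
    """Build (photo_identifier, posted, identifier) keys in arrival order."""
    rng = random.Random(seed)
    keys = []
    for i in range(NUM_COMMENTS):
        newest = i // COMMENTS_PER_PHOTO  # id of the newest photo so far
        keys.append((pick_photo(rng, newest), i, i))
    return keys


# Auto-increment primary key: every new row sorts after all existing rows.
auto_increment = [(i,) for i in range(NUM_COMMENTS)]
# Composite key, comments spread uniformly over all existing photos.
random_photos = composite_keys(lambda rng, newest: rng.randint(0, newest))
# Composite key, comments only on the five newest photos.
recent_photos = composite_keys(
    lambda rng, newest: rng.randint(max(0, newest - 4), newest)
)

auto_cost = mean_pages_from_end(auto_increment)
random_cost = mean_pages_from_end(random_photos)
recent_cost = mean_pages_from_end(recent_photos)

print(f"auto-increment: {auto_cost:.2f} pages from the end")
print(f"composite, random photos: {random_cost:.2f} pages from the end")
print(f"composite, recent photos: {recent_cost:.2f} pages from the end")
```

In this model the auto-increment key always appends, the composite key with comments on recent photos stays within a couple of pages of the end, and the composite key with randomly distributed comments inserts far from the end on average, which matches the reasoning above.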

Reposted from: Http://hi.baidu.com/thinkinginla ... d21b01b3de0580.html
