Comparison between PostgreSQL and MySQL, Part 1: Table Organization

Source: Internet
Author: User
This article is licensed under a Creative Commons (CC) license. It may be freely reproduced, provided that the reproduction links to the original source and includes the author information and this copyright notice: http://www.penglixun.com/tech/database/mysql-vs-postgresql-part-1-table-organization.html

From: http://blogs.enterprisedb.com/2010/11/29/mysql-vs-postgresql-part-1-table-organization/
Please point out any errors in the translation.

I'm going to be starting an occasional series of blog postings comparing MySQL's architecture to PostgreSQL's architecture. Regular readers of this blog will already be aware that I know PostgreSQL far better than MySQL, having last used MySQL a very long time ago when both products were far less mature than they are today. So, my discussion of how PostgreSQL works will be based on first-hand knowledge, but discussion of how MySQL works will be based on research and, insofar as I can make it happen, discussion with people who know it better than I do. (Note: If you're a person who knows MySQL better than I do and would like to help me avoid making stupid mistakes, drop me an email.)
I am going to start an occasional series of blog posts comparing the architectures of MySQL and PostgreSQL. Regular readers of this blog know that I understand PostgreSQL far better than MySQL; I last used MySQL a long time ago, when both products were far less mature than they are today. My discussion of how PostgreSQL works is therefore based on first-hand knowledge, while my discussion of how MySQL works is based on research and, where possible, on conversations with people who know it better than I do. (If you know MySQL better than I do and would like to help me avoid stupid mistakes, please send me an email.)

In writing these posts, I'm going to try to avoid making value judgments about which system is "better", and instead focus on describing how the architecture differs, and maybe a bit about the advantages of each architecture. I can't promise that it will be entirely unbiased (after all, I am a PostgreSQL committer, not a MySQL committer!), but I'm going to try to make it as unbiased as I can. Also, bearing in mind what I've recently been told by Baron Schwartz and Rob Wultsch, I'm going to focus completely on InnoDB and ignore MyISAM and all other storage engines. Finally, I'm going to focus on architectural differences. People might choose to use PostgreSQL because they hate Oracle, or MySQL because it's easier to find hosting, or either product because they know it better, and that's totally legitimate and perhaps worth talking about, but (partly in the interests of harmony among communities that ought to be allies) it's not what I'm going to talk about here.
In writing these articles, I will try to avoid judging which system is "better". Instead, I will focus on how the two architectures differ, and perhaps on the advantages of each. I cannot promise to be completely unbiased (after all, I am a PostgreSQL committer, not a MySQL committer), but I will try my best. In addition, following what Baron Schwartz and Rob Wultsch recently told me, I will ignore MyISAM and all other storage engines and focus on InnoDB. Finally, I will focus on architectural differences. People sometimes choose PostgreSQL because they hate Oracle, or MySQL because hosting is easier to find, or either product because they know it better, and those are perfectly legitimate reasons, but they are not what I want to talk about here. (Translator's note: the last sentence was difficult to translate; only its general sense is given.)

So, all that having been said, what I'd like to talk about in this post is the way that MySQL and PostgreSQL store tables and indexes on disk. In PostgreSQL, table data and index data are stored in completely separate structures. When a new row is inserted, or when an existing row is updated, the new row is stored in any convenient place in the table. In the case of an update, we try to store the new row on the same page as the old row if there's room; if there isn't room or if it's an insert, we pick a page that has adequate free space and use that, or failing all else extend the table by one page and add the new row there. Once the table row is added, we cycle through all the indexes defined for the table and add an index entry to each one pointing at the physical position of the table row. One index may happen to be the primary key, but that's a fairly nominal distinction: all indexes are basically the same.
With that said, what I want to discuss here is how MySQL and PostgreSQL store tables and indexes on disk. In PostgreSQL, table data and index data are stored in completely separate structures. When a new row is inserted or an existing row is updated, the new row is stored in any convenient place in the table. For an update, we try to store the new row on the same page as the old row if the page has room. If it does not, or if the operation is an insert, we pick a page with enough free space, or, failing that, extend the table by one page and put the new row there. Once the row is stored, we go through every index defined on the table and add to each one an entry pointing at the physical location of the new row. One of these indexes may be the primary key, but that distinction is largely nominal: all indexes work in basically the same way.
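The placement rules described above can be sketched with a toy Python model. This is only an illustration under simplifying assumptions, not PostgreSQL's actual implementation: the class name HeapTable, the two-rows-per-page capacity, and the dictionary-based indexes are all made up for the example, and the old row version is simply left in place after an update (visibility rules and cleanup are not modeled).

```python
PAGE_CAPACITY = 2  # rows per page; an arbitrary small number for illustration


class HeapTable:
    """Toy heap table: rows live in pages, indexes point at (page, slot)."""

    def __init__(self, indexed_columns):
        self.pages = []  # each page is a list of rows (dicts)
        # One separate structure per index: column value -> list of positions.
        self.indexes = {col: {} for col in indexed_columns}

    def _find_page(self, preferred=None):
        # For an update, try the page that holds the old row first.
        if preferred is not None and len(self.pages[preferred]) < PAGE_CAPACITY:
            return preferred
        # Otherwise pick any page with free space.
        for page_no, page in enumerate(self.pages):
            if len(page) < PAGE_CAPACITY:
                return page_no
        # Failing all else, extend the table by one page.
        self.pages.append([])
        return len(self.pages) - 1

    def _store(self, row, preferred=None):
        page_no = self._find_page(preferred)
        self.pages[page_no].append(row)
        tid = (page_no, len(self.pages[page_no]) - 1)  # physical position
        # Every index gets an entry; the primary key is not special here.
        for col, index in self.indexes.items():
            index.setdefault(row[col], []).append(tid)
        return tid

    def insert(self, row):
        return self._store(row)

    def update(self, old_tid, new_row):
        return self._store(new_row, preferred=old_tid[0])

    def fetch(self, tid):
        page_no, slot = tid
        return self.pages[page_no][slot]


table = HeapTable(["id", "email"])
first = table.insert({"id": 1, "email": "a@example.com"})
table.insert({"id": 2, "email": "b@example.com"})
third = table.insert({"id": 3, "email": "c@example.com"})
moved = table.update(first, {"id": 1, "email": "new@example.com"})
```

In this run the first page is already full when row 1 is updated, so the new version lands on the second page, and both indexes receive a fresh entry pointing at its new physical position.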

Under MySQL's InnoDB, the table data and the primary key index are stored in the same data structure. As I understand it, this is what Oracle calls an index-organized table. Any additional ("secondary") indexes refer to the primary key value of the tuple to which they point, not the physical position, which can change as leaf pages in the primary key index are split. Since this architecture requires every table to have a primary key, an internal row ID field is used as the primary key if no explicit primary key is specified.
In InnoDB, the table data and the primary key index are stored in the same data structure. As I understand it, this is what Oracle calls an index-organized table. (Translator's note: there are still some differences: an index-organized table is fully sorted by its index, whereas InnoDB sorts only by primary key.) Any non-primary-key ("secondary") index refers to the primary key value of the row rather than to its physical location, because the physical location can change when leaf pages of the primary key index are split. Because this architecture requires every table to have a primary key, an internal row ID is used as the primary key if none is defined explicitly (the internal row ID is 6 bytes).
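To make the contrast concrete, here is a minimal Python sketch of an index-organized layout. It is a simplification under stated assumptions: plain dictionaries stand in for B-trees, and the names ClusteredTable and lookup_by_secondary are hypothetical, not InnoDB APIs.

```python
import itertools


class ClusteredTable:
    """Toy index-organized table: the primary key index holds the rows."""

    def __init__(self, primary_key=None, secondary=()):
        self.primary_key = primary_key
        # Stand-in for the internal row ID used when no primary key exists.
        self._row_ids = itertools.count(1)
        self.clustered = {}  # primary key value -> full row
        # Secondary indexes store primary key values, not physical positions.
        self.secondary = {col: {} for col in secondary}

    def insert(self, row):
        if self.primary_key is not None:
            pk = row[self.primary_key]
        else:
            pk = next(self._row_ids)
        self.clustered[pk] = row
        for col, index in self.secondary.items():
            index.setdefault(row[col], []).append(pk)
        return pk

    def lookup_by_secondary(self, col, value):
        # Two steps: secondary index -> primary key -> row.
        pks = self.secondary[col].get(value, [])
        return [self.clustered[pk] for pk in pks]


users = ClusteredTable(primary_key="id", secondary=["email"])
users.insert({"id": 7, "email": "a@example.com"})
found = users.lookup_by_secondary("email", "a@example.com")

no_pk = ClusteredTable()
hidden_id = no_pk.insert({"name": "row without an explicit key"})
```

Because the secondary index stores the key value 7 rather than a page position, nothing in it has to change if the row later moves to a different leaf page.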

Since Oracle supports both options, they are probably both useful. An index-organized table seems particularly likely to be useful when most lookups are by primary key, and most of the data in each row is part of the primary key anyway, either because the primary key columns are long compared with the remaining columns, or because the rows, overall, are short. Storing the whole row in the index avoids storing the same data twice (once in the index and once in the table), and the gain will be larger when the primary key is a substantial percentage of the total data. Furthermore, in this situation, the index page still holds as many, or almost as many, keys as it would if only a pointer were stored in lieu of the whole row, so one fewer random I/O will be needed to access a given row.
Since Oracle supports both options (index-organized tables and heap tables), both are probably useful. An index-organized table seems most useful when most queries look rows up by primary key and most of the data in each row is part of the primary key anyway, either because the primary key columns are long compared with the remaining columns or because the rows are short overall. Storing the whole row in the index avoids storing the same data twice (once in the index and once in the table), and the gain is larger when the primary key makes up a large share of the row (Translator's note: that is, relative to the table data plus the duplicated key data). In addition, in this situation an index page still holds as many keys, or almost as many, as it would if it stored only a pointer instead of the whole row, so accessing a given row needs one fewer random I/O (Translator's note: compare a covering index scan with an ordinary index scan).
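The space argument can be checked with some back-of-the-envelope arithmetic. The sketch below is a deliberate oversimplification: it ignores page headers, per-row overhead, and fill factors, and the 6-byte pointer size and the function names are assumptions chosen only for illustration.

```python
POINTER_BYTES = 6  # assumed size of a physical row pointer


def heap_bytes(key_bytes, other_bytes):
    """Per-row bytes when the table and primary key index are separate."""
    row = key_bytes + other_bytes            # the row in the table
    index_entry = key_bytes + POINTER_BYTES  # the key again, plus a pointer
    return row + index_entry


def index_organized_bytes(key_bytes, other_bytes):
    """Per-row bytes when the row is stored inside the primary key index."""
    return key_bytes + other_bytes


def saved_fraction(key_bytes, other_bytes):
    heap = heap_bytes(key_bytes, other_bytes)
    return (heap - index_organized_bytes(key_bytes, other_bytes)) / heap


long_key = saved_fraction(key_bytes=40, other_bytes=10)    # key dominates the row
short_key = saved_fraction(key_bytes=4, other_bytes=200)   # key is a small part
```

With a 40-byte key and 10 bytes of other data, the index-organized layout stores roughly half as much per row; with a 4-byte key and 200 bytes of other data, the saving drops to a few percent.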

When accessing an index-organized table via a secondary index, it may be necessary to traverse both the B-tree in the secondary index and the B-tree in the primary index. As a result, queries involving secondary indexes might be slower. However, since MySQL has index-only scans (PostgreSQL, as of this writing, does not), it can sometimes answer the query from the secondary index alone and avoid traversing the primary key index. So in MySQL, adding additional columns to an index might very well make a query run faster, if it causes the index to function as a covering index for the query being executed. But in PostgreSQL, we frequently find ourselves telling users to pare down the columns in the index to the minimum set that is absolutely necessary, often resulting in dramatic performance gains. This is an interesting example of how the tuning that is right for one database may be completely wrong for another database.
When you access an index-organized table through a secondary index, you may need to traverse both the B-tree of the secondary index and the B-tree of the primary key index, so queries that use secondary indexes may be slower. However, because MySQL has index-only scans (the data is read from the index itself) and PostgreSQL does not, MySQL can sometimes get the data from the secondary index alone. In MySQL, therefore, adding extra columns to an index may make a query run faster if the index then covers the query. (Translator's note: with too many indexes there is still a lot of physical I/O when index pages split. We recommend keeping indexes to what the requirements need, unless you are sure the covering index will be used frequently.) In PostgreSQL, by contrast, we often tell users to pare the columns in an index down to the minimum that is strictly necessary, and this often yields a dramatic performance improvement. It is an interesting example of how tuning that is right for one database can be completely wrong for another.
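The covering-index effect can be illustrated with a small Python sketch. Dictionaries again stand in for B-trees, and the data, the function name fetch, and the returned traversal count are all hypothetical; the point is only that a wider secondary index can answer the query without a second lookup.

```python
# Clustered (primary key) index: id -> full row.
clustered = {1: {"id": 1, "email": "a@example.com", "name": "Ann"}}

# Narrow secondary index on email: each entry carries only the primary key.
narrow = {"a@example.com": {"email": "a@example.com", "id": 1}}

# Wider secondary index on (email, name): entries also carry the name.
wide = {"a@example.com": {"email": "a@example.com", "name": "Ann", "id": 1}}


def fetch(clustered_index, secondary_index, email, wanted_columns):
    """Return (values, number_of_index_traversals) for one lookup by email."""
    entry = secondary_index[email]
    if all(col in entry for col in wanted_columns):
        # The index covers the query: no visit to the clustered index.
        return {col: entry[col] for col in wanted_columns}, 1
    # Otherwise follow the primary key into the clustered index.
    row = clustered_index[entry["id"]]
    return {col: row[col] for col in wanted_columns}, 2


via_narrow = fetch(clustered, narrow, "a@example.com", ["name"])
via_wide = fetch(clustered, wide, "a@example.com", ["name"])
```

Both calls return the same name, but the lookup through the wider index needs one traversal instead of two. The price is a larger index, which is the trade-off the paragraph above describes.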

I've recently learned that neither InnoDB nor PostgreSQL supports traversing an index in physical order, only in key order. For InnoDB, this means that ALL scans are performed in key order, since the table itself is, in essence, also an index. As I understand it, this can make a large sequential scan quite slow, by defeating the operating system's prefetch logic. In PostgreSQL, however, because tables are not index-organized, sequential scans are always performed in physical order, and don't require looking at the indexes at all; this also means we can skip any I/O or CPU cost associated with examining non-leaf index pages. Traversing in physical order is apparently difficult from a locking perspective, although it must be possible, because Oracle supports it. It would be very useful to see this support in MySQL, and once PostgreSQL has index-only scans, it would be a useful improvement for PostgreSQL, too.
I recently learned that neither InnoDB nor PostgreSQL can traverse an index in physical order; both can do so only in key order (Translator's note: InnoDB reads the whole table and returns data in primary key order). For InnoDB, this means that every full table scan runs in primary key order, because the table itself is essentially the primary key index. As far as I know, this can make a large sequential scan much slower, because it defeats the operating system's prefetch logic (Translator's note: with static data, PostgreSQL also has to reach the next block through a block pointer, just as InnoDB reaches the next page through a page pointer). In PostgreSQL, because tables are not index-organized, sequential scans are always performed in physical order and do not touch the indexes at all, which also means we skip any I/O or CPU cost of examining non-leaf index pages (Translator's note: the author seems to have forgotten what a B+ tree is). Traversing in physical order is apparently difficult from a locking perspective, but it must be possible, because Oracle supports it. It would be a very useful feature for MySQL, and once PostgreSQL has index-only scans, it would be a useful improvement for PostgreSQL as well.
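A rough way to picture why key order can hurt a large scan: after page splits, the leaf pages of an index are no longer laid out on disk in key order, so following the keys means jumping between non-adjacent pages. The Python sketch below counts such jumps for a made-up page layout; the page numbers and the function name count_jumps are assumptions, and real storage engines and operating systems are considerably more sophisticated.

```python
def count_jumps(page_order):
    """Count reads that do not land on the page right after the previous one."""
    return sum(
        1
        for previous, current in zip(page_order, page_order[1:])
        if current != previous + 1
    )


# Physical page numbers of five leaf pages, listed in key order.
# Pages 3 and 4 are assumed to have been created by splits of pages 0 and 1.
key_order = [0, 3, 1, 4, 2]

# A scan in physical order simply reads the pages as they sit on disk.
physical_order = sorted(key_order)

jumps_in_key_order = count_jumps(key_order)
jumps_in_physical_order = count_jumps(physical_order)
```

In this layout every step of the key-order scan is a jump, while the physical-order scan reads strictly adjacent pages, which is the access pattern that prefetching rewards.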

One final difficulty with an index-organized table is that you can't add, drop, or change the primary key definition without a full-table rewrite. In PostgreSQL, on the other hand, this can be done, even while allowing concurrent read and write activity. This is a fairly nominal advantage for most use cases since the primary key of a table rarely changes (I think it's happened to me only once or twice in the last ten years), but it is useful when it does come up.
The last problem with an index-organized table is that you cannot add, drop, or change the primary key definition without rewriting the whole table. In PostgreSQL, by contrast, this can be done, even while concurrent reads and writes continue. In most cases this is only a minor advantage, because the primary key of a table rarely changes once it has been defined (I have run into it only once or twice in the last ten years), but it is useful when it does happen.

I hope that the above is a fair and accurate summary of the topic, but I'm sure I've missed a few things and covered others incompletely or in less detail than might be helpful. Please feel free to respond with a comment below or a blog post of your own if I've missed something.
I hope the above is a fair and accurate summary of this topic, but I am sure I have missed a few things and covered others incompletely or in less detail than would be helpful. If I have missed something, please feel free to respond with a comment below or with a blog post of your own.
