Schema optimization and indexing

Last Update:2017-02-27 Source: Internet

Author: User


There are many ways to represent any given data, ranging from fully normalized to fully denormalized, with everything in between. In a normalized database, each fact is represented once and only once. Conversely, in a denormalized database, information is duplicated or stored in multiple places.

If you are unfamiliar with normalization, you should study it; many books and online resources cover it well. Here we introduce only what you need to know for this chapter. Let's start with the classic example of employees, departments, and department heads:

EMPLOYEE    DEPARTMENT     HEAD
Jones       Accounting     Jones
Smith       Engineering    Smith
Brown       Accounting     Jones
Green       Engineering    Smith

The problem with this design is that anomalies can occur when the data changes. If Brown takes over as head of the Accounting department, we must update multiple rows to reflect the change, and while those updates are in progress the data is in an inconsistent state. If the Jones row names a different department head than the Brown row, there is no way to know which one is right. As the old saying goes: a man with two watches never knows what time it is. Furthermore, we cannot represent a department that has no employees: if we delete all the employees in a department, the information about the department is deleted as well. To avoid these problems, we normalize the table by separating the employee and department entities. The result is two tables:

EMPLOYEE_NAME    DEPARTMENT
Jones            Accounting
Smith            Engineering
Brown            Accounting
Green            Engineering

DEPARTMENT       HEAD
Accounting       Jones
Engineering      Smith

These tables are now in second normal form, which is sufficient for most requirements. Second normal form is only one of many possible normal forms.
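As a rough sketch, the two tables above could be declared in MySQL as follows. The table names, column names, and column types are assumptions made for illustration; the source shows only the column headings.

```sql
-- Hypothetical DDL for the normalized design; names and types are assumed.
CREATE TABLE department (
    name VARCHAR(50) NOT NULL PRIMARY KEY,
    head VARCHAR(50) NOT NULL
);

CREATE TABLE employee (
    employee_name VARCHAR(50) NOT NULL PRIMARY KEY,
    department    VARCHAR(50) NOT NULL,
    FOREIGN KEY (department) REFERENCES department (name)
);

INSERT INTO department (name, head) VALUES
    ('Accounting', 'Jones'),
    ('Engineering', 'Smith');

INSERT INTO employee (employee_name, department) VALUES
    ('Jones', 'Accounting'),
    ('Smith', 'Engineering'),
    ('Brown', 'Accounting'),
    ('Green', 'Engineering');

-- Brown takes over Accounting: the fact is stored once, so one row changes.
UPDATE department SET head = 'Brown' WHERE name = 'Accounting';
```

With this layout, a department row can also exist before any employee is assigned to it, and deleting employees does not remove the department.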

Advantages and disadvantages of a normalized data model

A normalized data model is often recommended when performance requirements are high, especially for write-heavy workloads. The benefits of a normalized model are as follows:

Updates to normalized data are usually faster than updates to denormalized data.

When the data is well normalized, there is little or no duplicated data, so there is less data to update.

Normalized tables are usually smaller, so they fit better in memory and perform better.

The lack of redundant data means that you need fewer DISTINCT or GROUP BY queries when retrieving lists of values. Consider the previous example: it is impossible to get a unique list of departments from the denormalized table without DISTINCT or GROUP BY, but if department is a separate table, it is a trivial query.
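The difference can be sketched with two queries. The table names here are hypothetical: employee_department stands for the original single table (employee, department, head), and department stands for the separate table with one row per department.

```sql
-- Denormalized (hypothetical table employee_department):
-- every employee row repeats the department name, so duplicates
-- must be removed at query time.
SELECT DISTINCT department
FROM employee_department;

-- Normalized (hypothetical table department with a name column):
-- each department is stored once, so a plain fetch is enough.
SELECT name
FROM department;
```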

The disadvantage of a normalized data model is usually retrieval. Any nontrivial query on a well-normalized schema will probably require at least one join, and perhaps several. This is not only expensive, but it can also make some indexing strategies impossible. For example, normalization may place columns in separate tables that would benefit from belonging to the same index.
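For instance, listing each employee together with the department head, which was a single-table read in the original design, needs a join once the data is split. This sketch assumes hypothetical tables employee(employee_name, department) and department(name, head).

```sql
-- Hypothetical table and column names, mirroring the example tables.
SELECT e.employee_name, e.department, d.head
FROM employee AS e
INNER JOIN department AS d ON d.name = e.department;
```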

Advantages and disadvantages of a denormalized data model

A denormalized model can work well because all the data is in the same table, which avoids joins.

If you don't need to join tables, the worst case for most queries, even those that don't use an index, is a full table scan. When the data does not fit in memory, such a scan can be much faster than a join because it avoids random I/O.

A single table also allows more efficient indexing strategies. Suppose you have a website where users post messages, and some users are paid (premium) users. Now you want to see the latest 10 messages from paid users. If you have normalized the data model and indexed the message publication time, the query might look like this:

mysql> SELECT message_text, user_name
    -> FROM message
    -> INNER JOIN user ON message.user_id=user.id
    -> WHERE user.account_type='premium'
    -> ORDER BY message.published DESC LIMIT 10;

To execute this query efficiently, MySQL needs to scan the index on published. For each row it finds, it must also look up the user table and check whether the user is a paid user. If only a small fraction of accounts are paid, this is inefficient. The other possible plan is to start with the user table, select all the paid users, fetch all of their messages, and do a filesort. That would probably be even worse.
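For reference, a normalized schema that the query above could run against might look like the following. The column names come from the query itself; the column types, the index name, and the foreign key are assumptions for illustration.

```sql
-- Hypothetical DDL; only the table and column names appear in the source.
CREATE TABLE user (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_name    VARCHAR(50)  NOT NULL,
    account_type VARCHAR(20)  NOT NULL
);

CREATE TABLE message (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_id      INT UNSIGNED NOT NULL,
    message_text TEXT         NOT NULL,
    published    DATETIME     NOT NULL,
    -- The index on publication time mentioned in the text.
    KEY idx_published (published),
    FOREIGN KEY (user_id) REFERENCES user (id)
);
```

Because account_type and published live in different tables, no single index can cover both the filter and the sort.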

The problem is the join, which prevents you from sorting and filtering with a single index. If you denormalize the data by combining the two tables and add an index on (account_type, published), you can write the query without a join. This query is very efficient:

mysql> SELECT message_text,user_name
    -> FROM user_messages
    -> WHERE account_type='premium'
    -> ORDER BY published DESC
    -> LIMIT 10;
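A minimal sketch of the combined table and its index follows. The table name user_messages and the column names come from the query above; the column types, the surrogate id column, and the index name are assumptions.

```sql
-- Hypothetical DDL for the denormalized table; types are assumed.
CREATE TABLE user_messages (
    id           INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    user_name    VARCHAR(50)  NOT NULL,
    account_type VARCHAR(20)  NOT NULL,
    message_text TEXT         NOT NULL,
    published    DATETIME     NOT NULL,
    -- Equality filter column first, sort column second.
    KEY idx_account_published (account_type, published)
);
```

With account_type as the leading column, the entries for 'premium' are stored in published order within the index, so the filter and the ORDER BY can be served by the same index and the scan can stop after 10 rows. The cost is that user_name and account_type are now repeated on every message row, so changing a user's account type means updating all of that user's messages.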

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
