Schema optimization and indexing

Source: Internet
Author: User

There are many ways to represent any given data, ranging from fully normalized to fully denormalized, with anything in between. In a normalized database, each fact is represented once and only once. Conversely, in a denormalized database, information is duplicated or stored in multiple places.

If you are unfamiliar with normalization, you should study it; many books and online resources cover the topic. Here, we introduce only what you need to understand for this chapter. Let's start with the classic example of employees, departments, and department heads.

EMPLOYEE            DEPARTMENT                  HEAD
Jones               Accounting                  Jones
Smith               Engineering                 Smith
Brown               Accounting                  Jones
Green               Engineering                 Smith

The problem with this design is that anomalies can occur when the data changes. If Brown takes over the Accounting department, we must update multiple rows to reflect the change, and while those updates are in progress the data can be in an inconsistent state. If the row for Jones lists a different head than the row for Brown, there is no way to know which one is right. As the old saying goes: a man with two watches never knows what time it is. Furthermore, a department cannot be represented when it has no employees: if we delete all the employees in a department, the department information is deleted as well. To avoid these problems, we split the table into two entities, employee and department. The result is two tables:

EMPLOYEE_NAME       DEPARTMENT
Jones               Accounting
Smith               Engineering
Brown               Accounting
Green               Engineering

DEPARTMENT           HEAD
Accounting           Jones
Engineering          Smith

These tables are now in second normal form, which is sufficient for most requirements. Second normal form is only one of many normal forms.
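To make the difference concrete, here is a minimal sketch that builds both designs and applies the same two changes to each: Brown taking over Accounting, and then all Accounting staff being removed. It uses Python's built-in sqlite3 module with an in-memory database purely as a self-contained stand-in for MySQL. The table names employee_flat, employee, and department and both helper functions are assumptions made for this illustration.

```python
import sqlite3

def build_database():
    """Create the single-table design and the two-table design side by side."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        -- Single-table design: the head is repeated on every employee row.
        CREATE TABLE employee_flat (employee TEXT PRIMARY KEY, department TEXT, head TEXT);
        INSERT INTO employee_flat VALUES
            ('Jones', 'Accounting', 'Jones'),
            ('Smith', 'Engineering', 'Smith'),
            ('Brown', 'Accounting', 'Jones'),
            ('Green', 'Engineering', 'Smith');

        -- Two-table design: each department's head is stored exactly once.
        CREATE TABLE department (department TEXT PRIMARY KEY, head TEXT);
        CREATE TABLE employee (employee_name TEXT PRIMARY KEY, department TEXT);
        INSERT INTO department VALUES ('Accounting', 'Jones'), ('Engineering', 'Smith');
        INSERT INTO employee VALUES
            ('Jones', 'Accounting'), ('Smith', 'Engineering'),
            ('Brown', 'Accounting'), ('Green', 'Engineering');
    """)
    return conn

def change_head(conn, department, new_head):
    """Apply the same change to both designs; return rows touched in each."""
    flat = conn.execute(
        "UPDATE employee_flat SET head = ? WHERE department = ?", (new_head, department))
    split = conn.execute(
        "UPDATE department SET head = ? WHERE department = ?", (new_head, department))
    return flat.rowcount, split.rowcount

def departments_after_removing_staff(conn, department):
    """Delete every employee of one department; list the departments still known."""
    conn.execute("DELETE FROM employee_flat WHERE department = ?", (department,))
    conn.execute("DELETE FROM employee WHERE department = ?", (department,))
    flat = [r[0] for r in conn.execute(
        "SELECT DISTINCT department FROM employee_flat ORDER BY department")]
    split = [r[0] for r in conn.execute(
        "SELECT department FROM department ORDER BY department")]
    return flat, split

conn = build_database()
flat_rows, split_rows = change_head(conn, "Accounting", "Brown")
flat_depts, split_depts = departments_after_removing_staff(conn, "Accounting")
print(flat_rows, split_rows)    # rows updated in each design
print(flat_depts, split_depts)  # departments each design still knows about
```

In this sketch the single-table design has to touch two rows to change one fact and forgets Accounting once its staff are gone, while the two-table design updates a single row and keeps the department on record.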

Advantages and disadvantages of a normalized data model

A normalized data model is often recommended for workloads with demanding performance requirements, especially write-heavy ones. The benefits of a normalized model are as follows:

Normalized updates are usually faster than denormalized updates.

When the data is well normalized, there is little or no duplicated data, so there is less data to change.

Normalized tables are usually smaller, so they fit more easily in memory and perform better.

The lack of redundant data means you need fewer DISTINCT or GROUP BY queries when retrieving lists of values. Consider the preceding example: it is impossible to get a unique list of departments from the denormalized table without DISTINCT or GROUP BY, but if department is a separate table, it is a trivial query.

The drawbacks of a normalized data model usually concern retrieval. Any nontrivial query on a well-normalized model will probably require at least one join, and perhaps several. This is not only expensive, but it can also make some indexing strategies impossible. For example, normalization may place columns in different tables that would benefit from belonging to the same index.
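The department-list point can be sketched as follows. This is an illustrative Python script using the built-in sqlite3 module as a stand-in for MySQL; the table name employee_flat and the helper column() are assumptions made for this sketch, not names from the original text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Denormalized table: department names repeat once per employee.
    CREATE TABLE employee_flat (employee TEXT, department TEXT, head TEXT);
    INSERT INTO employee_flat VALUES
        ('Jones', 'Accounting', 'Jones'), ('Smith', 'Engineering', 'Smith'),
        ('Brown', 'Accounting', 'Jones'), ('Green', 'Engineering', 'Smith');

    -- Normalized table: one row per department.
    CREATE TABLE department (department TEXT PRIMARY KEY, head TEXT);
    INSERT INTO department VALUES ('Accounting', 'Jones'), ('Engineering', 'Smith');
""")

def column(sql):
    """Run a query and return the first column of every row as a list."""
    return [row[0] for row in conn.execute(sql)]

# A plain SELECT on the denormalized table returns duplicates.
raw = column("SELECT department FROM employee_flat ORDER BY department")

# DISTINCT (or GROUP BY) is required to collapse them.
deduplicated = column(
    "SELECT DISTINCT department FROM employee_flat ORDER BY department")

# The normalized table needs no deduplication at all.
normalized = column("SELECT department FROM department ORDER BY department")

print(raw)
print(deduplicated)
print(normalized)
```

Only the query against the separate department table yields the unique list without any deduplication step.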

Advantages and disadvantages of a denormalized data model

A denormalized model can work well because all the data is in the same table, which avoids joins.

If you don't need to join tables, the worst case for most queries, even ones that don't use an index, is a full table scan. When the data does not fit in memory, such a scan can be much faster than a join, because it avoids random I/O.

A single table can also allow more efficient indexing strategies. Suppose you have a website where users post messages, and some users are paid (premium) users. Now you want to see the latest 10 messages from paid users. If you have normalized the data model and indexed the message publication time, the query might look like this:

mysql> SELECT message_text, user_name
    -> FROM message
    -> INNER JOIN user ON message.user_id=user.id
    -> WHERE user.account_type='premium'
    -> ORDER BY message.published DESC LIMIT 10;

To execute this query efficiently, MySQL needs to scan the published index on the message table. For each row it finds, it must also look up the user table and check whether the user is a paid subscriber. If only a small fraction of the accounts are paid, this is inefficient. The other possible query plan is to start with the user table, select all the paid users, fetch all of their messages, and then do a filesort. That would probably be even worse.

The problem is the join, which prevents you from sorting and filtering with a single index. If you denormalize the data by combining the two tables and add an index on (account_type, published), you can write the query without a join. This query is very efficient:
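For readers who want to run the normalized version, the following is a minimal sketch of a schema that fits the query above. It uses Python's built-in sqlite3 module as a stand-in for MySQL, so it reproduces the schema and the result but says nothing about how MySQL's optimizer would execute the join. The column types, the index name idx_published, the sample data, and the function latest_premium_messages are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE user (
        id           INTEGER PRIMARY KEY,
        user_name    TEXT NOT NULL,
        account_type TEXT NOT NULL
    );
    CREATE TABLE message (
        id           INTEGER PRIMARY KEY,
        user_id      INTEGER NOT NULL REFERENCES user(id),
        message_text TEXT NOT NULL,
        published    INTEGER NOT NULL  -- stand-in for a timestamp
    );
    -- The index on publication time described in the text.
    CREATE INDEX idx_published ON message (published);
""")

# Hypothetical sample data: five users, only user0 has a premium account.
conn.executemany(
    "INSERT INTO user VALUES (?, ?, ?)",
    [(i, f"user{i}", "premium" if i == 0 else "free") for i in range(5)])
conn.executemany(
    "INSERT INTO message (user_id, message_text, published) VALUES (?, ?, ?)",
    [(i % 5, f"message {i}", i) for i in range(100)])

def latest_premium_messages(limit=10):
    """Run the join: filter on user.account_type, sort on message.published."""
    rows = conn.execute("""
        SELECT message_text, user_name
        FROM message
        INNER JOIN user ON message.user_id = user.id
        WHERE user.account_type = 'premium'
        ORDER BY message.published DESC
        LIMIT ?
    """, (limit,))
    return rows.fetchall()

for message_text, user_name in latest_premium_messages():
    print(user_name, message_text)
```

Note that the filter column and the sort column live in different tables, so no single index can cover both.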

mysql> SELECT message_text,user_name
    -> FROM user_messages
    -> WHERE account_type='premium'
    -> ORDER BY published DESC
    -> LIMIT 10;
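As a rough sketch of the denormalized design, the script below creates a combined user_messages table with a composite index on (account_type, published) and runs the query above. It uses Python's built-in sqlite3 module as a stand-in for MySQL; the column types, the index name idx_type_published, and the sample data are assumptions made for illustration. The printed plan is SQLite's, shown only to let you check whether the composite index is chosen; MySQL's EXPLAIN output has a different format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table holds both the user attributes and the message.
    CREATE TABLE user_messages (
        id           INTEGER PRIMARY KEY,
        user_name    TEXT NOT NULL,
        account_type TEXT NOT NULL,
        message_text TEXT NOT NULL,
        published    INTEGER NOT NULL  -- stand-in for a timestamp
    );
    -- Filter column first, sort column second.
    CREATE INDEX idx_type_published ON user_messages (account_type, published);
""")

# Hypothetical sample data: every fifth message comes from a premium user.
conn.executemany(
    "INSERT INTO user_messages (user_name, account_type, message_text, published) "
    "VALUES (?, ?, ?, ?)",
    [(f"user{i % 5}", "premium" if i % 5 == 0 else "free", f"message {i}", i)
     for i in range(100)])

QUERY = """
    SELECT message_text, user_name
    FROM user_messages
    WHERE account_type = 'premium'
    ORDER BY published DESC
    LIMIT 10
"""

latest = conn.execute(QUERY).fetchall()

# The last column of each EXPLAIN QUERY PLAN row is a human-readable description.
plan = [row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + QUERY)]

for message_text, user_name in latest:
    print(user_name, message_text)
print(plan)
```

The trade-off is the one described earlier: user_name and account_type are now repeated on every message row, so changing a user's account type means updating all of that user's messages.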
