Design ideas and implementation of table splitting (sub-tables)

Source: Internet
Author: User

Original: http://blog.sina.com.cn/s/blog_4d8a2c970100fe4l.html

I. Overview

Table splitting (sub-tables) is a relatively popular concept at present. Especially under heavy load, splitting a table is a good way to spread the pressure on the database.

First, let's understand why we split tables and what the benefits are. Consider roughly how a database executes SQL:

Receive SQL --> put it into the SQL execution queue --> decompose the SQL with the parser --> extract or modify data according to the analysis result --> return the processing result

Of course, this flow is not necessarily correct; it is just my own subjective understanding. So where is a problem most likely to occur in this process? If the previous SQL statement has not finished, the following ones cannot be executed, because the data table file must be locked to guarantee the integrity of the data. There are two kinds of locks: shared locks and exclusive locks. While a shared lock is held, other threads can still read the data file, but no modification is allowed. With an exclusive lock, the entire file is owned by one thread and other threads cannot access it at all.

MyISAM, generally considered the fastest MySQL storage engine, is based on table-level locking. While a table is locked for writing, the entire data file cannot be accessed by anyone else, and the next operation is only accepted after the previous one has completed. The situation in which later operations wait in the queue because an earlier one has not finished is called blocking; colloquially we call it "locking the table".

What are the immediate consequences of a locked table? A large number of SQL statements cannot be executed immediately and must wait until the statements ahead of them in the queue have completed. These waiting statements may produce no result at all, or cause severe delays that hurt the user experience.

This is especially true for frequently used tables, such as the user information table in an SNS system or the posts table in a forum system. These tables receive a very large number of accesses, and to make sure data can be extracted quickly and returned to the user, some technique is needed to solve the problem. That technique is table splitting, which is what I want to talk about today.

As the name implies, table splitting stores data of the same type across several tables (sub-tables). When data is extracted, different users access different tables, so they do not conflict with each other and the probability of locking a table is reduced. For example, suppose the user data is currently split into two tables, user_1 and user_2, which hold different users: user_1 holds the first 100,000 users and user_2 holds the next 100,000. If we now query the two users Heiyeluren1 and Heiyeluren2, and they are stored in different tables, the data is extracted from different tables, which reduces the chance of a table lock.

I have not tested the two table-splitting methods described below, so I cannot guarantee that they work exactly as described; they are only meant to provide design ideas. The examples assume a post-bar (Tieba-style forum) system. (If you have never used a post bar, a quick Google search will help.)

II. Table splitting based on a base table

The main idea of the base-table approach is this: there is one primary table that stores all the basic information. If an item needs to find the table in which its data is stored, it first looks up the corresponding table name (and other details) in the base table, and can then access that table directly. If you feel the base table is not fast enough, you can keep the entire base table in a cache or in memory for efficient lookups.

For the post-bar scenario, we construct the following three tables:

1. Board table: stores information about each board (section) of the post bar
2. Topic table: stores the topics in each board, used for browsing
3. Reply table: stores the original content of each topic and the content of its replies

The board table contains the following fields:

    Board ID        board_id      int(10)
    Board name      board_name    char(50)
    Sub-table ID    table_id      smallint(5)
    Creation time   created       datetime

The topic table contains the following fields:

    Topic ID        topic_id      int(10)
    Topic name      topic_name    char(255)
    Board ID        board_id      int(10)
    Creation time   created       datetime

The reply table contains the following fields:

    Reply ID        reply_id      int(10)
    Reply content   reply_text    text
    Topic ID        topic_id      int(10)
    Board ID        board_id      int(10)
    Creation time   created       datetime

These three tables hold the structure of the entire post bar. The relationship between them is:

    board --> multiple topics --> multiple replies

So the relationship between the table file sizes is:

    board table file < topic table file < reply table file

It is therefore almost certain that the topic table and the reply table need to be split, which improves the speed and performance of data retrieval, queries, and changes.

Looking at the table structures above, you will notice that the board table has a "table_id" field. This field records which sub-tables hold the topics and replies that belong to a board.

For example, suppose we have a board called "PHP" whose board_id is 1 and whose sub-table ID is also 1. Its record is:

    board_id | board_name | table_id | created
    1        | PHP        | 1        | 2007-01-19 00:30:12

Accordingly, to extract all the topics in the "PHP" board, we combine the table_id stored in the board table with a prefix to get the name of the table that stores the topics. If the topic table prefix is "topic_", the topic table for the "PHP" board is "topic_1", and we run:

    SELECT * FROM topic_1 WHERE board_id = 1 ORDER BY topic_id DESC LIMIT 10

This returns the topic list for the board so that it can be browsed. To see the replies under a topic, we again use the "table_id" stored in the board table. If the reply table prefix is "reply_", the replies to the topic with ID 1 in the "PHP" board can be fetched with:

    SELECT * FROM reply_1 WHERE topic_id = 1 ORDER BY reply_id DESC LIMIT 10

Here we can see clearly that we are using a base table, and that the base table is our board table. Someone will certainly ask: once the base table holds a lot of data, how do we ensure its speed and efficiency?

Of course, we have to give this base table the best possible speed and performance. For example, it can be stored in a MySQL memory table, or in a memory cache such as Memcache, adjusted to the actual situation.

In general, base-table splitting is a relatively good solution for SNS, friend-making, forum, and other Web 2.0 sites. On these sites, a single table can hold the basic identifiers and their mapping to the target tables. The advantage of keeping the mapping in a table is that it is easy to expand later: only one record needs to be added.

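The base-table lookup described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the author's implementation: the dictionary `BOARD_TABLE_IDS` stands in for a cached copy of the board table, the helper names are hypothetical, and the `%s` placeholders assume a MySQL driver that uses that parameter style. Table names cannot be bound as query parameters, so the sketch builds them only from an integer `table_id`.

```python
# Hypothetical in-memory copy of the board ("base") table: board_id -> table_id.
# In a real system this would be loaded from the database or a cache.
BOARD_TABLE_IDS = {1: 1}  # the "PHP" board from the example record

TOPIC_PREFIX = "topic_"
REPLY_PREFIX = "reply_"


def table_names_for_board(board_id):
    """Return (topic_table, reply_table) for a board via the base table."""
    # int() keeps anything but a plain number out of the table name.
    table_id = int(BOARD_TABLE_IDS[board_id])  # KeyError if the board is unknown
    return f"{TOPIC_PREFIX}{table_id}", f"{REPLY_PREFIX}{table_id}"


def topic_list_query(board_id, limit=10):
    """Build the topic-list SQL for a board plus its bound parameters."""
    topic_table, _ = table_names_for_board(board_id)
    sql = (f"SELECT * FROM {topic_table} WHERE board_id = %s "
           "ORDER BY topic_id DESC LIMIT %s")
    return sql, (board_id, limit)
```

Calling `topic_list_query(1)` yields a query against `topic_1`, matching the hand-written SQL above. Moving a board to a different sub-table only requires changing its `table_id` entry in the base table (after moving its rows).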
[Advantages] Adding and removing nodes is very convenient, which makes later upgrades and maintenance much easier.

[Disadvantages] Adding a table or operating on a table still has to go through the database, which can create a bottleneck.

III. Table splitting based on a hash algorithm

We know that a hash table relies on a value computed by a particular hash algorithm. That value should be unique, and the computed value can be used to find the desired item.

Hash-based table splitting follows a similar idea: take the original target ID or name, run it through some hash algorithm to compute the name of the table that stores the data, and then access that table.

Continuing with the post bar above, each board has a board name and a board ID. Both values are fixed and unique, so we can consider performing some operation on them to obtain the name of a target table.

Suppose the post-bar system allows at most 100 million records and each table holds 1 million records; the whole system then needs no more than 100 tables. Following this standard, we hash the board ID to obtain a key, use the key as our table name, and then access the corresponding table.

We construct a simple hash algorithm:

    function get_hash($id) {
        // Hex-encode the ID's string form, e.g. 1 -> "31"
        $str = bin2hex((string) $id);
        // Keep the first four hex characters
        $hash = substr($str, 0, 4);
        // Right-pad with "0" if the result is shorter than four characters
        if (strlen($hash) < 4) {
            $hash = str_pad($hash, 4, "0");
        }
        return $hash;
    }

The algorithm takes a board ID and returns a four-character string; if the string is not long enough, it is padded with 0.

For example, get_hash(1) returns "3100" and get_hash(23819) returns "3233". Combining this value with a table prefix gives the name of the table to access.

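To cross-check the two example values, here is a small Python sketch that mirrors the behavior of the article's PHP get_hash(). It is a port for illustration only; the helper `table_names` and its prefix parameters are assumptions based on the "topic_" and "reply_" prefixes used earlier in the article.

```python
def get_hash(board_id):
    """Mirror the article's get_hash(): hex-encode the ID's text, keep 4 chars."""
    # Like PHP's bin2hex("1") -> "31": each character becomes two hex digits.
    hex_str = str(board_id).encode("ascii").hex()
    # Keep the first four hex digits and right-pad with "0" if shorter.
    return hex_str[:4].ljust(4, "0")


def table_names(board_id, topic_prefix="topic_", reply_prefix="reply_"):
    """Return the (topic, reply) table names for a board ID."""
    key = get_hash(board_id)
    return topic_prefix + key, reply_prefix + key


print(get_hash(1))      # 3100
print(get_hash(23819))  # 3233
print(table_names(1))   # ('topic_3100', 'reply_3100')
```

Because four hex digits cover only the first two characters of the ID, boards 23, 238, and 23819 all map to the same key "3233". For positive integer IDs this scheme therefore produces at most 99 distinct table names, which fits the article's budget of no more than 100 tables, but how evenly the data spreads depends on the leading digits of the IDs.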
For example, to access the content of the board with ID 1, the combined table names are topic_3100 and reply_3100, and we can access those target tables directly.

Of course, with the hash algorithm some of the data will probably end up in the same table. This differs from a hash table, which tries to resolve collisions as far as possible; we do not need to do that here. We do, however, need to predict and analyze which table names the data may be stored under.

If you need to store more data, you can hash the board name in the same way, for example with the same binary-to-hexadecimal conversion as above. Because there are far more Chinese characters than digits and letters, the probability of repetition is smaller, but the result may be a larger number of tables, and some other problems must be considered accordingly.

In the final analysis, the hash method requires choosing a good hash algorithm, so that the data is spread over more tables and queries run faster.

[Advantages] The hash algorithm yields the target table name directly, so it is very efficient.

[Disadvantages] Poor extensibility. The choice of hash algorithm fixes how much data the system can hold; from then on it can only run within that data volume and cannot exceed it.

IV. Other issues

1. Search

Once the data is split across tables, we cannot search the tables directly, because it is not practical to scan the dozens or hundreds of tables that already exist in the system. Search therefore has to be done with a third-party component such as Lucene, which is a good choice for an on-site search engine.

2. Table files

We know that MySQL's MyISAM engine generates three files per table: *.frm, *.MYD, and *.MYI, which store the table structure, the table data, and the table indexes respectively.

On Linux, the number of files in a directory is best kept below 1,000, otherwise looking up data becomes slower. Since each table generates three files, retrieval becomes very slow once there are more than about 300 tables, so at that point the tables must be divided again, for example across several databases.

With the base-table approach, we can add a new field that records which database each table is stored in. With the hash approach, we take the first few characters of the hash value as the database name. This resolves the problem neatly.

V. Summary

In heavily loaded applications the database has always been a very important bottleneck that must be broken through. This article explained two ways of splitting tables, and I hope it gives many people some inspiration.

Of course, the code and ideas in this article have not been tested with any real code, so the design is not guaranteed to be completely accurate or practical; readers need to analyze and implement it carefully when using it. The article was written in a hurry, so its quality may not be guaranteed. If you find errors, please do not take offense; criticism is welcome. Thank you!
