Cloud computing design patterns (12)-Index Table pattern

Source: Internet
Author: User

Create indexes over the fields in a data store that are frequently referenced by query criteria. This pattern improves query performance by allowing applications to locate the data to retrieve from a data store more quickly.

Background and problems


Many data stores organize the data for a collection of entities by using the primary key. An application can use this key to locate and retrieve data. Figure 1 shows an example of a data store holding customer information. The primary key is the customer ID.

Figure 1-customer information organized by primary key (customer ID)


The primary key is valuable for queries that fetch data based on the value of this key, but an application may not be able to use the primary key if it needs to retrieve data based on some other field. For example, an application cannot use the customer ID primary key to retrieve customers if its query criteria refer only to the value of some other attribute, such as the town in which the customer is located. To perform such a query, the application may have to fetch and examine every customer record, which can be a slow process.
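The difference between the two kinds of query can be sketched in a few lines of Python. This is only an illustration: the `customers` dictionary stands in for a data store keyed by customer ID, and the sample records and function names are assumptions, not part of the original article.

```python
# Hypothetical customer store keyed by customer ID (the primary key).
customers = {
    1: {"last_name": "Smith", "town": "Redmond"},
    2: {"last_name": "Jones", "town": "Seattle"},
    3: {"last_name": "Clarke", "town": "Redmond"},
}


def get_by_id(customer_id):
    """Primary-key lookup: a single direct access."""
    return customers.get(customer_id)


def find_by_town(town):
    """No index on town: every record has to be fetched and examined."""
    return [cid for cid, rec in customers.items() if rec["town"] == town]


print(get_by_id(2))
print(find_by_town("Redmond"))
```

The cost of `find_by_town` grows with the total number of customers, not with the number of matches, which is the problem an index table is meant to address.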

Many relational database management systems support secondary indexes. A secondary index is a separate data structure organized by one or more non-primary (secondary) key fields, and it indicates where the data for each indexed value is stored. The items in a secondary index are usually sorted by the value of the secondary keys to enable fast lookup of data. These indexes are usually maintained automatically by the database management system.

You can create as many secondary indexes as you need to support the different queries that your application performs. For example, in a relational database table whose primary key is the customer ID, it may be beneficial to add a secondary index over the town field if the application frequently looks up customers by the town in which they live.

However, although secondary indexes are a common feature of relational systems, most NoSQL data stores used by cloud applications do not provide an equivalent feature.

Solution


If the data store does not support secondary indexes, you can emulate them manually by creating your own index tables. An index table organizes the data by a specified key. Three strategies are commonly used to structure an index table, depending on the number of secondary indexes required and the nature of the queries that the application performs:
• Duplicate the data in each index table but organize it by different keys (complete denormalization). Figure 2 shows index tables that organize the same customer information by town and by last name:

Figure 2-index tables implementing secondary indexes for customer data. The data is duplicated in each index table.


This strategy may be appropriate if the data is relatively static compared to the number of times it is queried using each key. If the data is more dynamic, the processing overhead of maintaining each index table may become too great for this approach to be useful. In addition, if the volume of data is very large, the amount of space required to store the duplicate data will be significant.

• Create normalized index tables organized by different keys and reference the original data by using the primary key rather than duplicating it, as shown in Figure 3. The original data is referred to as a fact table:

Figure 3-index tables implementing secondary indexes for customer data. The data is referenced by each index table.


This technique saves space and reduces the overhead of maintaining duplicate data. The disadvantage is that an application has to perform two lookup operations to find data by using a secondary key: it first finds the primary key for the data in the index table, and then uses the primary key to look up the data in the fact table.

• Create partially normalized index tables organized by different keys that duplicate frequently retrieved fields. Reference the fact table to access less frequently accessed fields. Figure 4 shows this structure.

Figure 4-index tables implementing secondary indexes for customer data. Commonly accessed data is duplicated in each index table.


This technique strikes a balance between the first two approaches. The data for common queries can be retrieved quickly with a single lookup, while the space and maintenance overhead is lower than that of duplicating the entire dataset.
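The three strategies can be compared side by side in a small in-memory sketch. Everything here is an assumption made for illustration: the `fact_table` dictionary, the sample records, and the `build_*` helper names do not come from the original article, and a real data store would persist each index table separately.

```python
from collections import defaultdict

# Fact table: hypothetical customer records keyed by customer ID.
fact_table = {
    1: {"last_name": "Smith", "town": "Redmond", "phone": "555-0100"},
    2: {"last_name": "Jones", "town": "Seattle", "phone": "555-0101"},
    3: {"last_name": "Smith", "town": "Seattle", "phone": "555-0102"},
}


def build_duplicating_index(field):
    """Strategy 1: each index entry holds a full copy of the record."""
    index = defaultdict(list)
    for cid, rec in fact_table.items():
        index[rec[field]].append({"id": cid, **rec})
    return index


def build_reference_index(field):
    """Strategy 2: each index entry holds only the primary key."""
    index = defaultdict(list)
    for cid, rec in fact_table.items():
        index[rec[field]].append(cid)
    return index


def build_partial_index(field, copied_fields):
    """Strategy 3: copy frequently read fields and keep the key for the rest."""
    index = defaultdict(list)
    for cid, rec in fact_table.items():
        entry = {"id": cid, **{f: rec[f] for f in copied_fields}}
        index[rec[field]].append(entry)
    return index


by_town_full = build_duplicating_index("town")
by_town_ref = build_reference_index("town")
by_town_partial = build_partial_index("town", ["last_name"])

# Strategy 1: one lookup returns complete records.
seattle_full = by_town_full["Seattle"]
# Strategy 2: two lookups, the index table first and then the fact table.
seattle_via_ref = [fact_table[cid] for cid in by_town_ref["Seattle"]]
# Strategy 3: one lookup for common fields, a second only when needed.
seattle_names = [e["last_name"] for e in by_town_partial["Seattle"]]
seattle_phones = [fact_table[e["id"]]["phone"] for e in by_town_partial["Seattle"]]
```

In this sketch the partial index answers the "names by town" query without touching the fact table, while the phone number still costs a second lookup.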


If an application frequently queries data by specifying a combination of values (for example, "find all customers who live in Redmond and whose last name is Smith"), you can implement the keys of the items in the index table as a concatenation of the town attribute and the last name attribute, as shown in Figure 5. The keys are sorted by town, and then by last name for records that have the same town value.

Figure 5-index table based on composite keys
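A composite key of this kind can be sketched with a sorted list and binary search from Python's standard `bisect` module. The entries, the tuple layout, and the function names below are illustrative assumptions, not taken from the article.

```python
import bisect

# Hypothetical index table: composite key (town, last_name) -> customer ID.
entries = [
    (("Seattle", "Jones"), 2),
    (("Redmond", "Smith"), 1),
    (("Redmond", "Clarke"), 3),
    (("Redmond", "Smith"), 4),
    (("Portland", "Smith"), 5),
]
entries.sort()  # sorted by town, then last name, then customer ID
keys = [key for key, _ in entries]


def find_by_town_and_name(town, last_name):
    """Exact match on the composite key by binary search."""
    lo = bisect.bisect_left(keys, (town, last_name))
    hi = bisect.bisect_right(keys, (town, last_name))
    return [cid for _, cid in entries[lo:hi]]


def find_by_town(town):
    """Prefix match: all entries for one town form a contiguous run."""
    start = bisect.bisect_left(keys, (town,))
    result = []
    for i in range(start, len(entries)):
        key, cid = entries[i]
        if key[0] != town:
            break
        result.append(cid)
    return result


print(find_by_town_and_name("Redmond", "Smith"))
print(find_by_town("Redmond"))
```

Because the town is the leading part of the key, the same index serves both the combined query and the town-only query. A query on last name alone would gain nothing from this ordering.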


Index tables can speed up query operations over sharded data, and are especially useful where the shard key is hashed. Figure 6 shows an example where the shard key is a hash of the customer ID. The index table can organize data by the non-hashed values (town and last name) and provide the hashed shard key as the lookup data. This saves the application from repeatedly calculating hash keys (which can be an expensive operation) when it needs to retrieve data that falls within a range, or to fetch data in order of the non-hashed key. For example, a query such as "find all customers who live in Redmond" can be resolved quickly by locating the matching items in the index table (where they are all stored in a contiguous block), and then following the references to the customer data by using the shard keys stored in the index table.

Figure 6-the index table provides quick search for sharded data
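The following sketch shows one way an index table might sit in front of hash-sharded data. The shard count, the SHA-256-based `shard_key` function, and the in-memory `shards` list are all assumptions chosen for the example. A real system would use whatever hashing scheme and storage its data store defines.

```python
import bisect
import hashlib

SHARD_COUNT = 4  # assumed number of shards


def shard_key(customer_id):
    """Hypothetical shard key: a stable hash of the customer ID."""
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).hexdigest()
    return int(digest, 16) % SHARD_COUNT


shards = [{} for _ in range(SHARD_COUNT)]  # each shard: customer ID -> record
index_table = []  # sorted list of (town, last_name, customer_id, shard_key)


def insert_customer(customer_id, last_name, town):
    key = shard_key(customer_id)
    shards[key][customer_id] = {"id": customer_id, "last_name": last_name, "town": town}
    # Store the computed shard key in the index so reads need not recompute it.
    bisect.insort(index_table, (town, last_name, customer_id, key))


def find_by_town(town):
    """Scan the contiguous run for the town, then go straight to each shard."""
    start = bisect.bisect_left(index_table, (town,))
    results = []
    for i in range(start, len(index_table)):
        t, _, customer_id, key = index_table[i]
        if t != town:
            break
        results.append(shards[key][customer_id])
    return results


insert_customer(1, "Smith", "Redmond")
insert_customer(2, "Jones", "Seattle")
insert_customer(3, "Clarke", "Redmond")
redmond_customers = find_by_town("Redmond")
```

Without the index table, the town query would have to visit every shard, because hashing the customer ID scatters customers from the same town across all of them.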

Problems and precautions


Consider the following points when deciding how to implement this pattern:
• The overhead of maintaining secondary indexes can be significant. You must analyze and understand the queries that your application uses. Create index tables only when they are likely to be used regularly. Do not create speculative index tables to support queries that the application does not perform, or performs only occasionally.
• Duplicating data in an index table adds significant overhead, both in storage cost and in the effort required to maintain multiple copies of the data.
• Implementing an index table as a normalized structure that references the original data may require an application to perform two lookup operations to find data. The first operation searches the index table to retrieve the primary key, and the second uses the primary key to fetch the data.
• If a system incorporates a number of index tables over very large datasets, it can be difficult to maintain consistency between the index tables and the original data. It may be possible to design the application around the eventual consistency model. For example, to insert, update, or delete data, an application could post a message to a queue and let a separate task perform the operation and maintain the index tables that reference this data asynchronously. For more information about implementing eventual consistency, see the Data Consistency Primer.


Note:

Microsoft Azure storage tables support transactional updates for changes made to data held in the same partition (referred to as entity group transactions). If you can store the data for a fact table and one or more index tables in the same partition, you may be able to use this feature to help ensure consistency.


• Index tables may themselves be partitioned or sharded.
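The queue-based approach to eventual consistency mentioned in the list above can be sketched as follows. This is a single-process illustration under stated assumptions: the `deque` stands in for a durable message queue, `process_pending` stands in for the separate background task, and none of the names come from the original article. The sketch also writes the fact table directly and queues only the index maintenance, which is one possible variation.

```python
from collections import deque

fact_table = {}    # customer ID -> record
town_index = {}    # town -> set of customer IDs (the index table)
pending = deque()  # stands in for a durable message queue


def upsert_customer(customer_id, record):
    """Write the fact table now and queue the index update for later."""
    old = fact_table.get(customer_id)
    fact_table[customer_id] = record
    pending.append((customer_id, old, record))


def delete_customer(customer_id):
    old = fact_table.pop(customer_id, None)
    if old is not None:
        pending.append((customer_id, old, None))


def process_pending():
    """Separate task: apply queued changes to the index table."""
    while pending:
        customer_id, old, new = pending.popleft()
        if old is not None:
            ids = town_index.get(old["town"], set())
            ids.discard(customer_id)
            if not ids:
                town_index.pop(old["town"], None)
        if new is not None:
            town_index.setdefault(new["town"], set()).add(customer_id)


upsert_customer(1, {"last_name": "Smith", "town": "Redmond"})
stale_view = town_index.get("Redmond")  # index not yet updated: None
process_pending()
fresh_view = set(town_index["Redmond"])  # index has caught up: {1}
```

Between the write and the call to `process_pending`, a query through the index misses the new customer. Readers of the index have to tolerate that window, which is the trade-off the eventual consistency model accepts.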

When to use this mode


Use this pattern to improve query performance when an application frequently needs to retrieve data by using a key other than the primary (or shard) key.

This pattern may not be suitable when:
• Data is volatile. An index table may become out of date very quickly, rendering it ineffective or making the overhead of maintaining it greater than any savings made by using it.
• A field selected as the secondary key for an index table is very non-discriminating and can have only a small set of values (for example, gender).
• The distribution of data values for a field selected as the secondary key for an index table is highly skewed. For example, if 90% of the records contain the same value in a field, then creating and maintaining an index table to look up data based on this field may impose more overhead than scanning sequentially through the data. However, if queries very frequently target values that lie in the remaining 10%, this index may be useful. You must understand the queries that your application performs, and how frequently they are performed.
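Before building an index table, it can help to measure how discriminating and how skewed a candidate field actually is. The helper below is a rough sketch of such a check. The function name, the sample records, and the two measurements are assumptions for illustration, and the article does not prescribe any particular threshold.

```python
from collections import Counter


def field_profile(records, field):
    """Return (number of distinct values, share held by the most common value)."""
    counts = Counter(rec[field] for rec in records)
    if not counts:
        return 0, 0.0
    total = sum(counts.values())
    top_share = counts.most_common(1)[0][1] / total
    return len(counts), top_share


# Hypothetical sample: 9 of 10 records share the same status value.
records = (
    [{"status": "active", "town": f"Town{i}"} for i in range(9)]
    + [{"status": "closed", "town": "Town9"}]
)

status_profile = field_profile(records, "status")
town_profile = field_profile(records, "town")
print(status_profile, town_profile)
```

A field such as `status` here, with two distinct values and 90% of records on one of them, is a weak candidate unless queries mostly target the rare value. The `town` field spreads records evenly and is a better fit.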

Example


Azure storage tables provide a highly scalable key/value data store for applications running in the cloud. Applications store and retrieve data values by specifying a key. The data values can contain multiple fields, but the structure of a data item is opaque to table storage, which simply handles a data item as an array of bytes.

Azure storage tables also support sharding. The sharding key comprises two elements: a partition key and a row key. Items that have the same partition key are stored in the same partition (shard), and within a shard the items are stored in row key order. Table storage is optimized for queries that fetch data falling within a contiguous range of row key values within a partition. If you are building cloud applications that store information in Azure tables, you should structure your data with this feature in mind.

For example, consider an application that stores information about movies. The application frequently queries movies by genre (action, documentary, historical, comedy, drama, and so on). You could create an Azure table with a partition for each genre by using the genre as the partition key and specifying the movie name as the row key, as shown in Figure 7.

 

Figure 7-movie data stored in an Azure table, partitioned by genre and sorted by movie name


This approach is less effective if the application also needs to query movies by starring actor. In this case, you can create a separate Azure table that acts as an index table. The partition key is the actor and the row key is the movie name. The data for each actor is stored in a separate partition. If a movie stars more than one actor, the same movie will appear in multiple partitions.

You can duplicate the movie data in the values held by each partition by adopting the first approach described in the Solution section above. However, each movie is likely to be replicated several times (once for each actor), so it may be more efficient to partially denormalize the data: duplicate only the fields needed by the most common queries (such as the names of the other actors), and enable the application to retrieve any remaining details by including the partition key needed to find the complete information in the genre partitions. This is the third approach described in the Solution section. Figure 8 depicts this approach.

Figure 8-actor partitions acting as index tables for movie data
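The movie example can be modeled in memory to show how the two tables relate. This sketch does not use the Azure storage SDK: plain dictionaries keyed by (partition key, row key) stand in for the tables, and the movie titles, field names, and helper functions are placeholders invented for illustration.

```python
# In-memory stand-in for the genre table: (partition key, row key) -> entity.
movies_by_genre = {
    ("Action", "Movie A"): {"director": "Director 1", "actors": ["Actor X", "Actor Y"]},
    ("Comedy", "Movie B"): {"director": "Director 2", "actors": ["Actor X"]},
    ("Drama", "Movie C"): {"director": "Director 3", "actors": ["Actor Z"]},
}


def build_actor_index(genre_table):
    """Index table: partition key = actor, row key = movie name."""
    index = {}
    for (genre, title), movie in genre_table.items():
        for actor in movie["actors"]:
            co_stars = [a for a in movie["actors"] if a != actor]
            # Copy only the commonly queried fields and keep the genre so the
            # full entity can still be fetched from the genre table.
            index[(actor, title)] = {"genre": genre, "co_stars": co_stars}
    return index


def movies_for_actor(index, actor):
    """All rows in one actor partition, returned in row key order."""
    return sorted(title for (pk, title) in index if pk == actor)


def full_details(index, genre_table, actor, title):
    """Second lookup: follow the stored genre back to the genre table."""
    genre = index[(actor, title)]["genre"]
    return genre_table[(genre, title)]


actor_index = build_actor_index(movies_by_genre)
print(movies_for_actor(actor_index, "Actor X"))
print(full_details(actor_index, movies_by_genre, "Actor X", "Movie A"))
```

A movie with two actors produces two index rows, one in each actor partition. Each row carries the co-star names directly and the genre needed to reach the remaining details.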

MSDN: http://msdn.microsoft.com/en-us/library/dn589791.aspx
