SSAs series-dimension optimization, "Bush attribute relationships", and concepts of natural and non-natural hierarchies

Source: Internet
Author: User

Dimension optimization In the process of SSAs development and design, dimension optimization is very important because it plays a very important role in the process of SSAs analysis service performance optimization. Generally, the first step to optimize the performance of a cube is to view the dimension and observe the design of the attribute relationship in the dimension. There are two most important Optimization Methods for reference:
    • Confirm the property relationship
    • Effectively use user-defined hierarchies
Define Property Relationships

The property relationship defines the dependency between properties. For example, if a has an associated property B, it is a-> B. For example, if the current city is Seattle, the State must be Washington. generally, the relationship between attributes may be obvious or not obvious in the original dimension table, but it can still be used for performance optimization. For example, by default, all attributes are associated with the key. In this case, the attribute relationship is called "Bush attribute relationship"-which can be understood as dense, this association is similar to that of a crude bush. In this bush attribute relationship, all trunks are expanded from key attributes, but all of them have ended with their own attributes. This is the "Bush attribute relationship ".

 

 

Then, we can redefine the relationship between attributes based on the logical association between data to optimize performance. In this case, the model name, as shown in, defines the product line and subcategory (that is, knowing which model is, you must know which product line and subcategory it belongs ), the subcategory definition determines the category (that is, to know which kind of subcategory it belongs to) or, this subcategory will only appear in one category and will not appear in both category ). Attributes like this are generally defined as many-to-one or one-to-one associations-

The relationship between attributes can be designed to improve performance in two aspects:

    • Index construction and cross-attribute query do not depend on key attribute.
    • Aggregation on attributes can be reused in the query and related attributes.

You can compare the two images above. For example, you need to obtain the attributes subcategory and category, including performing aggregation at last. In the first property relationship, there is no explicit relationship definition between subcategory and category. The query engine must first query which product belongs to which subcategory, then determine which category the product belongs to before determining the association between subcategory and category. Biwork Here is a joke. For example, Zhang San's record contains a father named Li Si and a grandfather named Wang Wu (of course, Zhang San, the key behind Li Si and Wang Wu must be unique ), to figure out the relationship between Li Si and Wang Wu, or how to find his father through Li Si, he must first find his son Zhang San through Li Si, and then find his grandfather Wang Wu through Zhang San, in this way, we know that Wang Wu is the father of Li Si.

In this way, the relationship between attributes is determined. For example, in Office EXCEL, data is usually aggregated and viewed through a hierarchy. If the dimension is large, this process of searching for associations and determining the associations between the lower and lower layers will consume a lot of time. On the contrary, if the association between attributes can be redefined, internal indexes are built during the process phase, and the relationship between these indexes is maintained in the index information, therefore, the analysis service can easily know which category is associated with which subcategory, which greatly improves the efficiency.

At the same time, when defining an attribute relationship, you also need to consider whether the relationship between attributes is flexvisible (variable) or rigid (fixed). The default value is flexible.

Flexible means that when dimensions are updated, the relationship between attributes is variable. For example, if I used to work in Shanghai and now in Beijing, the relationship between customer and city is variable. In addition, when dimensions are incrementally updated, the previous aggregation is deleted and recalculated. However, if only the new members are added, the existing aggregation will not be deleted.

Rigid means that when dimensions are updated, the relationship between attributes does not change, or the commitment does not change. If a change occurs, an error occurs during the incremental processing process. For example, the month-> year relationship is fixed by the rigid.

However, it is only a method to define a property relationship, whether it is flexible or rigid, that has no impact on query performance. This design ensures that the relationship between two attributes is immutable for example, and should be consistent with our pre-design. If an error occurs, it means that some data changes have broken our pre-Design and should attract our attention.

Effective use of hierarchies

If the attributes are only kept in the default Attribute Hierarchy, they will not be actively used in the aggregation design phase and will not be truly aggregated. In the query phase, if these attributes are referenced, data is also summarized through the primary key attribute. In this way, the benefits of aggregation are not utilized, and the performance of these attribute hierarchies in queries will be very slow.

To improve query performance, the candidate for the aggregation attribute is usually configured through the aggregation usage attribute (this will be introduced later ). Before modifying the aggregation usage attribute, you must also consider using a custom attribute hierarchy.

Analytics services have two types of custom attribute hierarchy: natural and unnatural hierarchy. They have different design and performance features:

Natural hierarchy natural hierarchy-all attributes at all levels in the natural hierarchy, they are directly or indirectly associated with all other attributes from the top layer of the hierarchy to the bottom layer.

Unnatural hierarchy-attributes of at least two consecutive hierarchy. This type of hierarchy is usually used to create a drill-down navigation path for common attributes, but it does not have the "natural level" feature. For example, users may need to browse data through a hierarchical structure consisting of gender and education, but gender and education do not have such natural associations.

From the perspective of performance, the performance of natural hierarchies is very different from that of non-natural hierarchies. In the natural hierarchy, the hierarchy tree will be materialized or disk-based materialized during Hierarchical Storage. In addition, all attributes involved in the natural hierarchy will automatically become candidate attributes for aggregation. (For more information about materialized and other concepts, see index view. This concept is also available in index view. You can understand how to store data like a table and save the aggregation results and hierarchies to the disk ).

Non-natural hierarchies are not disk-based, and attributes in the structure are not automatically selected as candidate attributes for aggregation. Although there is no such natural association relationship, from the user's point of view, it is very convenient to have a hierarchical path for drilling, in addition, the navigation function in mdx can also facilitate navigation and computing, so it can still play a positive role.

 

For more bi articles, see the Bi series Essays (SSIS, SSRS, SSAs, MDX, SQL Server)

If you think this article is helpful to you, please help with recommendations so that others can quickly see these articles in the recommendation bar of the biwork blog.

 

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.