Database parsing: Data discretization and concept hierarchies

Source: Internet
Author: User

Data discretization reduces the number of values of a continuous attribute by dividing the attribute's range into intervals. Interval labels can then replace the actual data values. Replacing a large number of continuous values with a small number of interval labels reduces and simplifies the original data, and leads to a concise, easy-to-use, knowledge-level representation of the mining results.

Discretization techniques can be categorized by how the discretization is performed, such as whether it uses class information or which direction it proceeds in (top-down or bottom-up). If the discretization process uses class information, it is called supervised discretization; otherwise it is unsupervised. If the process first finds one or a few points (called split points or cut points) to divide the entire attribute range, and then repeats this recursively on the resulting intervals, it is called top-down discretization or splitting. Bottom-up discretization, or merging, works the other way: it starts by treating every continuous value as a potential split point, removes some of them by merging neighboring values into intervals, and then applies this process recursively to the resulting intervals. Discretization can be applied recursively to an attribute, producing a hierarchical or multiresolution partitioning of the attribute values known as a concept hierarchy. Concept hierarchies are useful for mining at multiple levels of abstraction.
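The top-down splitting described above can be sketched in Python. This is a minimal, unsupervised illustration and not a method taken from the source: the midpoint cut-point rule, the fixed recursion depth, and the names Interval, split, and path are all assumptions made for the example. Each recursion level adds one layer to the resulting hierarchy of intervals.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Interval:
    """A right-closed interval (low, high] with optional sub-intervals."""
    low: float
    high: float
    children: List["Interval"] = field(default_factory=list)

    def label(self) -> str:
        return f"({self.low:g}, {self.high:g}]"


def split(low: float, high: float, depth: int) -> Interval:
    """Top-down splitting: choose a cut point, then recurse on both halves."""
    node = Interval(low, high)
    if depth > 0:
        cut = (low + high) / 2  # assumed rule: bisect at the midpoint
        node.children = [
            split(low, cut, depth - 1),
            split(cut, high, depth - 1),
        ]
    return node


def path(node: Interval, value: float) -> List[str]:
    """Interval labels containing `value`, from coarsest to finest level."""
    labels = [node.label()]
    while node.children:
        left, right = node.children
        node = left if value <= left.high else right
        labels.append(node.label())
    return labels


# Hypothetical attribute range 0 to 100, split two levels deep.
root = split(0, 100, 2)
print(path(root, 37))  # ['(0, 100]', '(0, 50]', '(25, 50]']
```

A supervised variant would choose each cut point using class information instead of the midpoint; the recursive structure stays the same.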

For a given numeric attribute, a concept hierarchy defines a discretization of that attribute. Concept hierarchies can be used to reduce the data by collecting low-level concepts (such as numeric age values) and replacing them with higher-level concepts (such as youth, midlife, or old age). Although detail is lost in this generalization, the generalized data are more meaningful and easier to interpret.
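As a small illustration of replacing low-level age values with higher-level concepts, consider the sketch below. The cut points (30 and 60), the sample ages, and the function name generalize_age are hypothetical; the source does not say where the boundaries between youth, midlife, and old age lie.

```python
from bisect import bisect_right
from collections import Counter

# Assumed cut points: age < 30 is "youth", 30 to 59 is "midlife", 60+ is "old age".
AGE_CUTS = [30, 60]
AGE_LABELS = ["youth", "midlife", "old age"]


def generalize_age(age: int) -> str:
    """Map a raw age value to its higher-level concept."""
    return AGE_LABELS[bisect_right(AGE_CUTS, age)]


# Made-up sample data, for illustration only.
ages = [19, 23, 34, 45, 45, 61, 72, 28]
generalized = [generalize_age(a) for a in ages]

print(len(set(ages)), "distinct raw values")         # 7 distinct raw values
print(len(set(generalized)), "distinct concepts")    # 3 distinct concepts
print(Counter(generalized))
```

The individual ages are no longer recoverable from the generalized column, which is the loss of detail the paragraph above refers to.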

This generalization also helps to represent data mining results consistently across multiple mining tasks, which is a common requirement. In addition, mining on the reduced data requires fewer I/O operations and is more efficient than mining on a larger, ungeneralized data set. Because of these benefits, discretization techniques and concept hierarchies are typically applied as a preprocessing step before data mining, rather than during mining. Figure 2-22 gives an example of a concept hierarchy for the attribute price. More than one concept hierarchy can be defined for the same attribute to suit the needs of different users.


Figure 2-22: A concept hierarchy for the attribute price, where an interval ($X, $Y] denotes the range from $X (exclusive) to $Y (inclusive).

Defining concept hierarchies manually can be tedious and time-consuming for users or domain experts. Fortunately, several discretization methods can be used to automatically generate or dynamically refine concept hierarchies for numeric attributes. In addition, the hierarchies of many categorical attributes are implicit in the database schema and can be defined automatically at the schema definition level.
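The ($X, $Y] interval notation used for the price hierarchy, together with one simple way of generating intervals automatically, can be sketched as follows. Equal-width binning is only one possible unsupervised method and is chosen here for brevity. The price range of $0 to $1000, the bin count of 5, and the function names are assumptions for this example, not values taken from the figure.

```python
from bisect import bisect_left
from typing import List


def equal_width_edges(low: float, high: float, bins: int) -> List[float]:
    """Generate bin edges automatically by dividing [low, high] evenly."""
    width = (high - low) / bins
    return [low + i * width for i in range(bins + 1)]


def price_label(price: float, edges: List[float]) -> str:
    """Return the ($X, $Y] interval containing price: $X excluded, $Y included."""
    i = bisect_left(edges, price)
    if i == 0 or i == len(edges):
        raise ValueError(f"price {price} lies outside ({edges[0]}, {edges[-1]}]")
    return f"(${edges[i - 1]:g}, ${edges[i]:g}]"


# Hypothetical price range and bin count.
edges = equal_width_edges(0, 1000, 5)

print(price_label(200, edges))     # ($0, $200]   upper bound is included
print(price_label(200.01, edges))  # ($200, $400] lower bound is excluded
print(price_label(1000, edges))    # ($800, $1000]
```

Using bisect_left rather than bisect_right is what makes each interval closed on the right: a price equal to an edge falls into the interval that ends at that edge.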
