Data discretization reduces the number of values of a continuous attribute by dividing the attribute's range into intervals. Interval labels can then replace the actual data values. Replacing many continuous values with a small number of interval labels reduces and simplifies the original data, which leads to a concise, easy-to-use, knowledge-level representation of the mining results. Discretization techniques can be categorized by how the discretization is performed, for example by whether it uses class information or by the direction in which it proceeds (top-down or bottom-up). If the process uses class information, it is called supervised discretization; otherwise it is unsupervised. If the process first finds one or a few points (called split points or cut points) to divide the entire attribute range and then repeats this recursively on the resulting intervals, it is called top-down discretization, or splitting. Bottom-up discretization, or merging, works the other way: it starts by treating all of the continuous values as potential split points, removes some of them by merging neighboring values into intervals, and then applies this process recursively to the resulting intervals. An attribute can be discretized recursively to produce a hierarchical, multiresolution partitioning of its values, known as a concept hierarchy. Concept hierarchies are useful for mining at multiple levels of abstraction.
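The supervised, top-down case described above can be sketched as follows. This is a minimal illustration, not a specific published algorithm: the text does not say how a split point is chosen, so the sketch assumes the cut that minimizes the weighted class entropy of the two resulting intervals, and the names `entropy`, `best_cut`, and `split_top_down` are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_cut(pairs):
    """Given (value, class) pairs sorted by value, return (cut, score) for the
    cut with the lowest weighted class entropy, or None if no cut is possible.
    Assumption: entropy is the selection criterion; the text does not fix one."""
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two identical values
        left = [c for _, c in pairs[:i]]
        right = [c for _, c in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if best is None or score < best[1]:
            best = ((pairs[i - 1][0] + pairs[i][0]) / 2, score)
    return best


def split_top_down(pairs, depth):
    """Recursively split the value range; return the cut points in ascending order.
    Recursion stops at `depth` levels or when an interval holds a single class."""
    pairs = sorted(pairs)
    if depth == 0 or len({c for _, c in pairs}) <= 1:
        return []
    found = best_cut(pairs)
    if found is None:
        return []
    cut = found[0]
    left = [p for p in pairs if p[0] <= cut]
    right = [p for p in pairs if p[0] > cut]
    return split_top_down(left, depth - 1) + [cut] + split_top_down(right, depth - 1)


# Toy data: small values belong to class "a", large values to class "b".
data = [(1, "a"), (2, "a"), (3, "a"), (10, "b"), (11, "b"), (12, "b")]
cuts = split_top_down(data, depth=2)
```

Because the function uses the class labels to choose each cut, it is supervised; because it starts from the full range and subdivides, it is top-down. A bottom-up variant would instead start with every distinct value as its own interval and merge neighbors.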
A concept hierarchy for a given numeric attribute defines a discretization of that attribute. Concept hierarchies can be used to generalize data by collecting lower-level concepts (such as numeric age values) and replacing them with higher-level concepts (such as youth, middle-aged, or senior). Although detail is lost in this generalization, the generalized data can be more meaningful and easier to interpret.
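As a small illustration of replacing low-level values with higher-level concepts, the sketch below maps numeric ages to the three concepts named above. The cut points (30 and 60) are assumptions chosen only for the example, since the text does not specify where one concept ends and the next begins; `generalize_age` is likewise a hypothetical helper.

```python
from bisect import bisect_right
from collections import Counter

# Hypothetical boundaries: ages below 30 are "youth", 30 to 59 are
# "middle-aged", and 60 or above are "senior".
AGE_CUTS = [30, 60]
AGE_CONCEPTS = ["youth", "middle-aged", "senior"]


def generalize_age(age):
    """Replace a numeric age with its higher-level concept."""
    return AGE_CONCEPTS[bisect_right(AGE_CUTS, age)]


ages = [23, 35, 41, 67, 19, 58, 72]
generalized = [generalize_age(a) for a in ages]

# Seven distinct values collapse into three concepts: detail is lost,
# but the summary is easier to read.
summary = Counter(generalized)
```

The individual ages are no longer recoverable from `generalized`, which is the loss of detail the paragraph refers to.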
This contributes to a consistent representation of data mining results across the multiple mining tasks that are commonly required. In addition, mining the reduced data requires fewer I/O operations and is more efficient than mining a larger, ungeneralized dataset. For this reason, discretization techniques and concept hierarchies are typically applied as a preprocessing step before mining rather than during mining. Figure 2-22 gives an example of a concept hierarchy for the attribute price. More than one concept hierarchy can be defined for the same attribute to suit the needs of different users.
A concept hierarchy for the attribute price, where an interval ($X, $Y] denotes the range from $X (exclusive) to $Y (inclusive).
Defining concept hierarchies manually can be tedious and time-consuming for a user or domain expert. Fortunately, several discretization methods can be used to automatically generate or dynamically refine concept hierarchies for numeric attributes. In addition, many hierarchies for categorical attributes are implicit in the database schema and can be defined automatically at the schema definition level.
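One simple way to generate such a hierarchy automatically is to split the price range into equal-width intervals and then split each interval again. The sketch below assumes this equal-width strategy, a price range of $0 to $1000, and a fan-out of 5; none of these come from the text, and `build_hierarchy` and `path_for` are hypothetical names. Intervals follow the (low, high] convention used for the price hierarchy.

```python
def build_hierarchy(low, high, fanout, levels):
    """Build a nested equal-width hierarchy over the interval (low, high].
    Each node is a dict with its interval and a list of child nodes."""
    node = {"interval": (low, high), "children": []}
    if levels > 0:
        width = (high - low) / fanout
        for i in range(fanout):
            child_low = low + i * width
            # Reuse the parent's upper bound for the last child so that
            # rounding cannot leave a gap at the top of the range.
            child_high = high if i == fanout - 1 else low + (i + 1) * width
            node["children"].append(
                build_hierarchy(child_low, child_high, fanout, levels - 1)
            )
    return node


def path_for(node, value):
    """Return the intervals from the root down to the leaf that contain
    `value`, treating each interval as (low, high]. Empty if out of range."""
    low, high = node["interval"]
    if not (low < value <= high):
        return []
    path = [node["interval"]]
    for child in node["children"]:
        sub = path_for(child, value)
        if sub:
            path.extend(sub)
            break
    return path


# Two levels below the root: 5 intervals of $200, each split into 5 of $40.
price_hierarchy = build_hierarchy(0, 1000, fanout=5, levels=2)
levels_for_250 = path_for(price_hierarchy, 250)
```

Reading `levels_for_250` from left to right moves from the most general interval to the most specific one, so a price can be reported at whichever level of abstraction a user needs.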