Rough Set Theory

Source: Internet
Author: User

As databases keep growing, how can people find useful knowledge in such vast amounts of data? How can we refine what we have already learned? What does it mean to describe something coarsely, and what does it mean to describe it finely?

Rough set theory answers these questions. To understand its central idea, we first need to understand what knowledge is. Suppose eight building blocks form a set A = {x1, x2, x3, x4, x5, x6, x7, x8}, and each block has a color attribute. By color we can divide the blocks into three categories, R1 = {red, yellow, blue}: the red blocks form the set X1 = {x1, x2, x6}, the yellow blocks form X2 = {x3, x4}, and the blue blocks form X3 = {x5, x7, x8}. The color attribute thus gives a partition of A (a partition means that every element of A belongs to exactly one category), and in this sense the color attribute is a piece of knowledge. The example shows that each partition of A corresponds to one piece of knowledge about the elements of A. If the blocks have further attributes, such as shape R2 = {triangle, square, circle} and size R3 = {large, medium, small}, then together with R1 we obtain three partitions of A:

A/R1 = {X1, X2, X3} = {{x1, x2, x6}, {x3, x4}, {x5, x7, x8}} (classification by color)
A/R2 = {Y1, Y2, Y3} = {{x1, x2}, {x5, x8}, {x3, x4, x6, x7}} (classification by shape)
A/R3 = {Z1, Z2, Z3} = {{x1, x2, x5}, {x6, x8}, {x3, x4, x7}} (classification by size)
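As a minimal sketch of how an attribute induces a partition, the Python snippet below groups the blocks by attribute value. The `BLOCKS` dictionary and the `partition` helper are illustrative names that do not come from the article; the attribute values are read off the three classifications above.

```python
from collections import defaultdict

# Attribute values read off the three classifications above
# (color, shape, size); the dictionary layout is an illustrative choice.
BLOCKS = {
    "x1": ("red", "triangle", "large"),
    "x2": ("red", "triangle", "large"),
    "x3": ("yellow", "circle", "small"),
    "x4": ("yellow", "circle", "small"),
    "x5": ("blue", "square", "large"),
    "x6": ("red", "circle", "medium"),
    "x7": ("blue", "circle", "small"),
    "x8": ("blue", "square", "medium"),
}
ATTRS = {"color": 0, "shape": 1, "size": 2}


def partition(attr):
    """Group the blocks by their value of one attribute (hypothetical helper)."""
    classes = defaultdict(set)
    for name, values in BLOCKS.items():
        classes[values[ATTRS[attr]]].add(name)
    return dict(classes)


for attr in ATTRS:
    print(attr, partition(attr))
```

Every block lands in exactly one class per attribute, which is what makes each result a partition of A.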

Taken together, these classifications form a basic knowledge base. What concepts can this knowledge base express? Besides basic concepts such as red {x1, x2, x6}, large {x1, x2, x5}, and triangle {x1, x2}, it can express compound ones: large triangle, {x1, x2, x5} ∩ {x1, x2} = {x1, x2}; small blue circle, {x5, x7, x8} ∩ {x3, x4, x7} ∩ {x3, x4, x6, x7} = {x7}; blue or medium-sized, {x5, x7, x8} ∪ {x6, x8} = {x5, x6, x7, x8}. Concepts of this kind are obtained by intersection and union; for example, the intersection of X1 and Y1 represents the red triangles. All concepts that can be expressed through intersections and unions, together with the three basic pieces of knowledge (A/R1, A/R2, A/R3), make up a knowledge system, written R = R1 ∩ R2 ∩ R3. The knowledge it determines consists of A/R = {{x1, x2}, {x3, x4}, {x5}, {x6}, {x7}, {x8}} and all unions of sets in A/R. Blocks x3 and x4 stay in one class because they agree on color, shape, and size.
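The set operations behind these concepts can be sketched in a few lines of Python. The constant names and the `combine` helper are assumptions made for illustration. Because x3 and x4 share all three attribute values, this construction keeps them together in one elementary class.

```python
# Basic categories taken from A/R1, A/R2 and A/R3 above.
RED, YELLOW, BLUE = {"x1", "x2", "x6"}, {"x3", "x4"}, {"x5", "x7", "x8"}
TRIANGLE, SQUARE, CIRCLE = {"x1", "x2"}, {"x5", "x8"}, {"x3", "x4", "x6", "x7"}
LARGE, MEDIUM, SMALL = {"x1", "x2", "x5"}, {"x6", "x8"}, {"x3", "x4", "x7"}

# Compound concepts are plain set operations on the basic categories.
large_triangle = LARGE & TRIANGLE            # intersection
small_blue_circle = BLUE & SMALL & CIRCLE    # intersection of three categories
blue_or_medium = BLUE | MEDIUM               # union


def combine(*partitions):
    """Elementary classes of the combined system: every non-empty
    intersection of one class from each partition (hypothetical helper)."""
    classes = [set().union(*partitions[0])]  # start from the whole universe
    for part in partitions:
        classes = [c & p for c in classes for p in part if c & p]
    return classes


A_R = combine([RED, YELLOW, BLUE], [TRIANGLE, SQUARE, CIRCLE], [LARGE, MEDIUM, SMALL])
print(sorted(sorted(c) for c in A_R))
# [['x1', 'x2'], ['x3', 'x4'], ['x5'], ['x6'], ['x7'], ['x8']]
```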

Now consider the concept of approximation. Suppose we are given a subset X = {x2, x5, x7} of A. How should we describe it with the knowledge in our knowledge base? Red triangles? Large blue circles? Neither. No single attribute, and no intersection or union of several, yields exactly this set X, so we have to approximate it with the knowledge we have. That is, among all the sets the knowledge base can express, we pick the two closest to X: one as its lower approximation and one as its upper approximation. The lower approximation is "large blue squares or small blue circles", {x5, x7}. The upper approximation is "large red triangles, large blue squares, or small blue circles", {x1, x2, x5, x7}. Looser descriptions such as "triangle or blue", {x1, x2, x5, x7, x8}, also cover X, but they are not the smallest expressible set that does so, because x8 can be told apart from every element of X. Note that the lower approximation is the union of all sets in the knowledge base that are contained in X, while the upper approximation is the intersection of all sets in the knowledge base that contain X. The following figure is the usual way to picture the two approximations.

In the figure, the region enclosed by the curve is X. The blue squares lying entirely inside the curve form the lower approximation, and the green squares along the boundary together with the blue ones form the upper approximation. Each small square is one class of the partition that the knowledge system induces on the universe.
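The two approximations of X = {x2, x5, x7} can be computed directly from the elementary classes. The sketch below uses the standard formulation, in which the upper approximation is the union of all classes that meet X; this gives the same set as intersecting all expressible sets that contain X. The function names are illustrative.

```python
# Elementary classes of A/R: blocks that agree on color, shape and size
# (x3 and x4 agree on all three, so they share a class).
A_R = [{"x1", "x2"}, {"x3", "x4"}, {"x5"}, {"x6"}, {"x7"}, {"x8"}]


def lower_approximation(classes, target):
    """Union of the classes that lie entirely inside the target set."""
    return set().union(*(c for c in classes if c <= target))


def upper_approximation(classes, target):
    """Union of the classes that share at least one element with the target."""
    return set().union(*(c for c in classes if c & target))


X = {"x2", "x5", "x7"}
lower = lower_approximation(A_R, X)
upper = upper_approximation(A_R, X)
boundary = upper - lower  # elements that cannot be classified with certainty

print(sorted(lower))     # ['x5', 'x7']
print(sorted(upper))     # ['x1', 'x2', 'x5', 'x7']
print(sorted(boundary))  # ['x1', 'x2']
```

The computed upper approximation does not contain x8, because its class {x8} has no element in common with X. The boundary {x1, x2} is where the knowledge base cannot decide: x2 belongs to X, but it cannot be distinguished from x1.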

The core of rough set theory consists of the notions introduced above: knowledge, partitions of a set, and approximation sets. Next we discuss how rough sets are applied to data mining in databases. Consider the following database table:
Element  Color   Shape     Size    Stability
x1       red     triangle  large   stable
x2       red     triangle  large   stable
x3       yellow  circle    small   unstable
x4       yellow  circle    small   unstable
x5       blue    square    large   stable
x6       red     circle    medium  unstable
x7       blue    circle    small   unstable
x8       blue    square    medium  unstable
This is a two-dimensional table for the example above, and its last column is the decision attribute, namely the judgment of whether a block is stable. Each row carries information such as: large red triangles are stable, small yellow circles are unstable. We can regard the set of all records as the universe A = {x1, x2, x3, x4, x5, x6, x7, x8}. Every column is an attribute that partitions the elements of the universe, and the elements in each class of the partition share the same attribute value. The attributes fall into two groups: the condition attributes (color, shape, and size) and the decision attribute (the last column, stability).

Are all condition attributes useful for the decision? Take the set of blocks whose decision value is "stable", {x1, x2, x5}: its upper and lower approximations in the knowledge system A/R are both {x1, x2, x5} itself. For the "unstable" set {x3, x4, x6, x7, x8}, the upper and lower approximations in A/R are likewise {x3, x4, x6, x7, x8}. The knowledge base therefore describes both concepts exactly.

Is each piece of basic knowledge (color, shape, size) necessary? If we remove color, the knowledge system becomes A/(R-R1) = {{x1, x2}, {x3, x4, x7}, {x5}, {x6}, {x8}} together with the unions of these sets. In this new system the upper and lower approximations of "stable" are still {x1, x2, x5}, and those of "unstable" are still {x3, x4, x6, x7, x8}. Removing color does not change how stability is expressed, so the color attribute is redundant and can be deleted.

Can size be removed as well? Then the knowledge system becomes A/(R-R1-R3) = A/R2 = {{x1, x2}, {x5, x8}, {x3, x4, x6, x7}}. The lower and upper approximations of "stable" in A/R2 are {x1, x2} and {x1, x2, x5, x8} respectively, which differ from those in the original system, and the approximations of "unstable" change too. Deleting size therefore affects the knowledge representation, so size cannot be removed.

Applying the same test to shape gives a different result on this table. Once color is removed, dropping shape leaves A/R3 = {{x1, x2, x5}, {x6, x8}, {x3, x4, x7}}, in which "stable" is exactly the class of large blocks, so its approximations are unchanged. Strictly speaking, shape is therefore dispensable here as well, and size alone preserves the classification. The rules below are nevertheless stated over both shape and size.

Keeping R2 and R3 as the simplified knowledge base, we read the following decision rules from the table: large triangle -> stable, large square -> stable, small circle -> unstable, medium circle -> unstable, medium square -> unstable. Rough set theory lets us simplify these further to: large -> stable, circle -> unstable, medium square -> unstable. This is the useful knowledge contained in the data table, learned automatically from the database by the rough set method. Rough sets are therefore an effective tool for data mining in databases.
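The attribute tests above can be reproduced with a short Python sketch. It compares the approximations of the "stable" set under the full attribute set, under shape and size only, and under shape only, and then checks the three simplified rules against the table. All helper names (`classes`, `approximations`, `rule_holds`) are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict

# Decision table from the text: color, shape, size, stability.
TABLE = {
    "x1": ("red", "triangle", "large", "stable"),
    "x2": ("red", "triangle", "large", "stable"),
    "x3": ("yellow", "circle", "small", "unstable"),
    "x4": ("yellow", "circle", "small", "unstable"),
    "x5": ("blue", "square", "large", "stable"),
    "x6": ("red", "circle", "medium", "unstable"),
    "x7": ("blue", "circle", "small", "unstable"),
    "x8": ("blue", "square", "medium", "unstable"),
}
COLUMN = {"color": 0, "shape": 1, "size": 2}
STABLE = {name for name, row in TABLE.items() if row[3] == "stable"}


def classes(attrs):
    """Partition the blocks by their values on the given condition attributes."""
    groups = defaultdict(set)
    for name, row in TABLE.items():
        groups[tuple(row[COLUMN[a]] for a in attrs)].add(name)
    return list(groups.values())


def approximations(attrs, target):
    """Return (lower, upper) approximation of target under the given attributes."""
    parts = classes(attrs)
    lower = set().union(*(c for c in parts if c <= target))
    upper = set().union(*(c for c in parts if c & target))
    return lower, upper


def rule_holds(condition, decision):
    """True if some row matches the condition and every match has the decision."""
    matches = [row for row in TABLE.values()
               if all(row[COLUMN[a]] == v for a, v in condition.items())]
    return bool(matches) and all(row[3] == decision for row in matches)


# Full system, color removed, then color and size removed.
for attrs in [("color", "shape", "size"), ("shape", "size"), ("shape",)]:
    lower, upper = approximations(attrs, STABLE)
    print(attrs, sorted(lower), sorted(upper))

# The three simplified rules, checked against every row of the table.
print(rule_holds({"size": "large"}, "stable"))
print(rule_holds({"shape": "circle"}, "unstable"))
print(rule_holds({"shape": "square", "size": "medium"}, "unstable"))
```

With color removed, the lower and upper approximations of "stable" coincide; with only shape left they split into {x1, x2} and {x1, x2, x5, x8}, which is the change described in the text.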

As the example shows, we only need to feed the database into a rough set system; no prior knowledge has to be supplied, and the rough set algorithm learns the knowledge automatically. This is the main reason for its wide application. By contrast, set theories such as fuzzy sets and extension sets require a membership function to be specified in advance.
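To illustrate learning from the table alone, here is a naive brute-force sketch that takes only the rows as input and searches, for each row, for the shortest attribute conditions that never contradict the decision column. It is an assumption-laden toy and not the algorithm of any particular rough set system; `shortest_rules` and `holds` are hypothetical names.

```python
from itertools import combinations

CONDITIONS = ("color", "shape", "size")
ROWS = [
    {"color": "red", "shape": "triangle", "size": "large", "stability": "stable"},     # x1
    {"color": "red", "shape": "triangle", "size": "large", "stability": "stable"},     # x2
    {"color": "yellow", "shape": "circle", "size": "small", "stability": "unstable"},  # x3
    {"color": "yellow", "shape": "circle", "size": "small", "stability": "unstable"},  # x4
    {"color": "blue", "shape": "square", "size": "large", "stability": "stable"},      # x5
    {"color": "red", "shape": "circle", "size": "medium", "stability": "unstable"},    # x6
    {"color": "blue", "shape": "circle", "size": "small", "stability": "unstable"},    # x7
    {"color": "blue", "shape": "square", "size": "medium", "stability": "unstable"},   # x8
]


def holds(condition, decision):
    """A rule holds if every row matching the condition has that decision."""
    matches = [r for r in ROWS if all(r[a] == v for a, v in condition)]
    return all(r["stability"] == decision for r in matches)


def shortest_rules():
    """For each row, collect the shortest conditions that give a valid rule."""
    rules = set()
    for row in ROWS:
        for length in range(1, len(CONDITIONS) + 1):
            found = []
            for subset in combinations(CONDITIONS, length):
                condition = tuple((a, row[a]) for a in subset)
                if holds(condition, row["stability"]):
                    found.append(condition)
            if found:
                rules.update((cond, row["stability"]) for cond in found)
                break  # longer conditions for this row would be redundant
    return rules


for condition, decision in sorted(shortest_rules()):
    print(" and ".join(value for _, value in condition), "->", decision)
```

On this table the search returns only single-attribute rules, among them large -> stable and circle -> unstable. It also finds medium -> unstable, which is shorter than a rule that names both medium and square.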

At present, rough set theory has been widely used in many fields, such as knowledge discovery, data mining, intelligent decision-making, and electronic control.
