Introduction:
Rough set theory was proposed by Professor Z. Pawlak to study the representation, learning, and induction of incomplete data and imprecise knowledge. From a mathematical perspective, rough set theory studies sets; from a programming point of view, it studies matrices, albeit rather special ones; from the perspective of artificial intelligence, it studies decision tables.
Concept:
- Universe U: simply a set in the mathematical sense.
- Knowledge: the ability to classify objects. The objects can be any entities; together they form the universe. A concept is any subset of U.
- Attribute R = knowledge R = equivalence relation R = classification: "attribute" is a column of a table, "knowledge" is the artificial intelligence term, "equivalence relation" is the mathematical term, and "classification" is the data mining concept. These four names denote the same thing.
- Knowledge base: a family of classifications on U is called a knowledge base.
- Knowledge equivalence: IND(P) = IND(Q) means that knowledge P is equivalent to knowledge Q.
Rough set theory is built on a classification mechanism. It treats a classification as an equivalence relation on a given space, and the equivalence relation induces a partition of that space.
The main idea of rough set theory is to describe imprecise or uncertain knowledge (approximately) by means of the knowledge in a known knowledge base.
The most significant difference between this theory and other theories that deal with uncertainty and imprecision is that it needs no prior information beyond the data set being processed. The description and handling of uncertainty can therefore be regarded as objective.
Basic concepts:
Let U be a non-empty finite universe and R a binary equivalence relation on U. R is called an indiscernibility relation, and A = (U, R) is called an approximation space.
For any subset X of the universe U, X may not be exactly expressible with the knowledge in the knowledge base, that is, X may be an undefinable set. In that case X is described approximately by a pair of sets, its lower approximation and its upper approximation.
The lower approximation is the largest definable set in A that is contained in X, and the upper approximation is the smallest definable set in A that contains X.
Therefore, when the lower approximation equals the upper approximation, X is definable. Otherwise X is undefinable, and X is then called a rough set.
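The formulas for the two approximations are not present in this text. As a sketch, the standard definitions can be written as follows, where [x]_R denotes the equivalence class of x under R (the bracket notation and the underbar/overbar on apr are assumptions about the original notation):

```latex
\underline{apr}(X) = \{\, x \in U : [x]_R \subseteq X \,\}, \qquad
\overline{apr}(X) = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}
```

With these definitions the lower approximation is always contained in X, which in turn is contained in the upper approximation.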
The approximation quality of X in A reflects the proportion of X that, given the existing knowledge, certainly lies in the knowledge base.
The roughness measure of X in A reflects the degree of incompleteness of the knowledge.
The approximation accuracy of X in A reflects how well X is understood on the basis of the existing knowledge.
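The defining formulas for these three measures are missing from this text. A hedged reconstruction, for non-empty X: the accuracy and roughness below follow Pawlak's usual definitions, while the normalization of the approximation quality by |X| is an assumption read off the verbal description (some texts divide by |U| instead). The symbols α, ρ and γ are chosen here for illustration.

```latex
\alpha_A(X) = \frac{|\underline{apr}(X)|}{|\overline{apr}(X)|}, \qquad
\rho_A(X) = 1 - \alpha_A(X), \qquad
\gamma_A(X) = \frac{|\underline{apr}(X)|}{|X|}
```

Under these definitions X is definable exactly when the accuracy equals 1, or equivalently when the roughness equals 0.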
Knowledge representation in rough set theory:
Knowledge in rough set theory is generally represented as an information table, also called an information system. It can be written as an ordered quadruple K = (U, A, V, p), where U is the set of all objects, that is, the universe; A is the set of all attributes; V is the set of attribute values; and p is the information function, which gives the complete information about each object x in K.
An information system resembles the relational database model. Data analysis without decisions and data analysis with decisions are the two main applications of rough set theory in data analysis. Rough set theory provides the notions of knowledge reduction and core for analyzing redundant attributes in an information system.
If an attribute set B ⊆ A keeps the same classification as A and contains no redundant attributes, B is called a reduct of A (or AT); the set of reducts is written RED(AT). The intersection of all reducts of AT is called the core of AT, written CORE(AT). In general the reduct is not unique, but the core is.
Knowledge representation without decisions:
S = (U, A, V, p), where U = {x1, x2, ..., x8}, V1 = V2 = V3 = {1, 2, 3}, and V4 = {1, 2}. The information function p is given by the following table:
U  | C1 | C2 | C3 | C4
x1 | 1  | 1  | 1  | 1
x2 | 1  | 2  | 2  | 1
x3 | 1  | 1  | 1  | 1
x4 | 1  | 2  | 2  | 1
x5 | 2  | 2  | 1  | 1
x6 | 2  | 2  | 1  | 1
x7 | 3  | 3  | 3  | 2
x8 | 3  | 3  | 3  | 2
From the table we obtain:
U/C1 = {{x1, x2, x3, x4}, {x5, x6}, {x7, x8}}
U/C2 = {{x1, x3}, {x2, x4, x5, x6}, {x7, x8}}
U/C3 = {{x1, x3, x5, x6}, {x2, x4}, {x7, x8}}
U/C4 = {{x1, x2, x3, x4, x5, x6}, {x7, x8}}
U/C = {{x1, x3}, {x2, x4}, {x5, x6}, {x7, x8}}
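The partitions listed above can be recomputed mechanically. The following is a minimal Python sketch, not part of the original notes: the names TABLE, ATTRS and partition are hypothetical, and the table values are copied from the information table.

```python
from collections import defaultdict

# Information table from the text: object -> (C1, C2, C3, C4).
TABLE = {
    "x1": (1, 1, 1, 1),
    "x2": (1, 2, 2, 1),
    "x3": (1, 1, 1, 1),
    "x4": (1, 2, 2, 1),
    "x5": (2, 2, 1, 1),
    "x6": (2, 2, 1, 1),
    "x7": (3, 3, 3, 2),
    "x8": (3, 3, 3, 2),
}
ATTRS = ("C1", "C2", "C3", "C4")


def partition(table, attrs, subset):
    """Return U/subset: objects grouped by their values on `subset`."""
    idx = [attrs.index(a) for a in subset]
    blocks = defaultdict(list)
    for obj, row in table.items():
        # Objects with the same value tuple are indiscernible on `subset`.
        blocks[tuple(row[i] for i in idx)].append(obj)
    return sorted(blocks.values())


for a in ATTRS:
    print(f"U/{a} =", partition(TABLE, ATTRS, [a]))
print("U/C  =", partition(TABLE, ATTRS, ATTRS))
```

Grouping by the tuple of attribute values is just the equivalence relation "same value on every attribute in the subset", so each printed list is one partition of U.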
Compressed information table:
U/C      | C1 | C2 | C3 | C4
{x1, x3} | 1  | 1  | 1  | 1
{x2, x4} | 1  | 2  | 2  | 1
{x5, x6} | 2  | 2  | 1  | 1
{x7, x8} | 3  | 3  | 3  | 2
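To illustrate reducts and the core on the compressed information table, here is a brute-force Python sketch. It is an assumption-laden illustration rather than an algorithm from the original notes: the names ROWS, ATTRS, partition, reducts and core are hypothetical, and exhaustive search is only practical for a handful of attributes.

```python
from itertools import combinations

# Compressed information table: one row per equivalence class of U/C.
ROWS = {
    "{x1,x3}": (1, 1, 1, 1),
    "{x2,x4}": (1, 2, 2, 1),
    "{x5,x6}": (2, 2, 1, 1),
    "{x7,x8}": (3, 3, 3, 2),
}
ATTRS = ("C1", "C2", "C3", "C4")


def partition(rows, attrs, subset):
    """Group rows by their values on the attributes in `subset`."""
    idx = [attrs.index(a) for a in subset]
    blocks = {}
    for name, row in rows.items():
        blocks.setdefault(tuple(row[i] for i in idx), []).append(name)
    return sorted(blocks.values())


def reducts(rows, attrs):
    """Minimal attribute subsets that induce the same partition as `attrs`."""
    full = partition(rows, attrs, attrs)
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if any(set(r) <= set(subset) for r in found):
                continue  # contains a smaller reduct, so it is not minimal
            if partition(rows, attrs, subset) == full:
                found.append(subset)
    return found


def core(rows, attrs):
    """Intersection of all reducts."""
    return set(attrs).intersection(*(set(r) for r in reducts(rows, attrs)))


print("reducts:", reducts(ROWS, ATTRS))
print("core:", core(ROWS, ATTRS))
```

For this particular table the sketch reports three reducts, {C1, C2}, {C1, C3} and {C2, C3}, and an empty core, which matches the remark that the reduct is generally not unique.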
Rules can be extracted from a decision table as follows:
(1) Merge objects that carry identical information, keeping a single representative of each (as in the compressed information table), that is, delete redundant cases.
(2) Delete redundant attributes.
(3) Delete unnecessary attribute values from each object's information.
(4) Find the minimal reduct.
(5) Derive logical rules from the minimal reduct.
Theoretical research on rough sets:
At present, research on rough set theory focuses mainly on the following areas:
(1) Generalization of rough set models
There are currently two main approaches: the constructive method and the algebraic method.
- Constructive method: the main idea is to study rough sets and approximation operators starting from a given approximation space. The problems studied this way usually come from practice, so the resulting models have strong application value. The main drawback is that the algebraic structure of the approximation operators is not easy to understand.
- Algebraic method: also called the operator method. Its obvious advantage is that it gives deep insight into the algebraic structure of the approximation operators. Its disadvantage is that its applicability is limited.
(2) Theoretical study of uncertainty
Uncertainty in rough set theory has two main sources. The first is the binary relation on the universe and the knowledge modules it generates, that is, the approximation space itself. The second is the boundary of the rough approximation of a given set in the universe. When the boundary is empty, the knowledge is completely certain; the larger the boundary, the rougher or fuzzier the knowledge.
(3) Relationships with other theories for dealing with uncertainty
There are two types of knowledge base: in one, the descriptions of all objects are completely known; in the other, the descriptions of the objects are only partially known, that is, the knowledge in the knowledge base is uncertain.
Fuzzy set theory and rough set theory both generalize classical set theory to handle uncertainty and imprecision, but a fuzzy set is described by the degree of membership in the set, while a rough set is described by a pair of upper and lower approximations with respect to an available knowledge base.
In terms of the relationship between the objects of a set, fuzzy sets emphasize that the boundary of the set is ill-defined, while rough sets emphasize the indiscernibility between objects. In terms of the object of study, fuzzy sets study the membership relationships between different objects of the same class, while rough sets study the set relationships between objects of different classes, with a focus on classification.
Most membership functions of fuzzy sets are supplied by experts on the basis of experience, so they are strongly subjective. The rough membership functions of rough sets are derived directly from the data being analyzed, so they are objective.
(4) Algorithm research
Effective algorithms in rough set theory are concentrated mainly on incremental algorithms for rule derivation, heuristic algorithms for reduction, basic parallel algorithms for rough sets, and neural networks and genetic algorithms combined with rough sets.
(5) Connections with other mathematical theories
From the operator point of view, rough set theory is closely related to topological spaces, mathematical logic, modal logic, lattices and Boolean algebras, and operator algebras.
From the point of view of construction and sets, it is closely related to probability theory, fuzzy mathematics, evidence theory, graph theory, and information theory.
Uncertainty is inherent in the objective world:
(1) Randomness: the uncertainty of random phenomena
(2) Fuzziness: the uncertainty of fuzzy concepts
(3) Roughness: the uncertainty of knowledge and concepts in information systems
Why use rough sets?
(1) The roughness of knowledge is caused by the insufficient classification ability of humans or of intelligent agents in a system.
(2) We cannot reproduce the objects of the real world at a one-to-one scale without any difference; we can only do so to some degree of approximation. This gives the knowledge or concepts that express the real world their granularity, that is, their roughness.
(3) It matches the way people customarily deal with unclear problems, namely handling unclear phenomena with incomplete information or knowledge.
Comparison between fuzzy sets and rough sets:
(1) Fuzzy set theory uses a membership function to deal with fuzziness. The basic membership degrees are supplied from experience or by domain experts, so they are rather subjective.
(2) Rough set theory assigns the individuals that cannot be told apart to a boundary region, defined as the difference between the upper approximation and the lower approximation. Rough sets have definite mathematical formulas that are completely determined by the data, so they are more objective.
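As a sketch of the "definite mathematical formulas" mentioned in item (2), the boundary region and the rough membership function are commonly written as below. The notation ([x]_R for the equivalence class of x, BND for the boundary, μ for rough membership) is assumed here and does not appear in the original text.

```latex
BND_R(X) = \overline{apr}(X) \setminus \underline{apr}(X), \qquad
\mu_X^R(x) = \frac{|[x]_R \cap X|}{|[x]_R|}
```

Both quantities are computed purely by counting objects in the data, which is why no expert-supplied membership degrees are needed.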
Relationships between the mathematical theories:
(1) Rough set theory does not compete with fuzzy set theory; it complements it.
(2) Rough sets and Dempster-Shafer theory: Dempster-Shafer theory uses the belief function as its main tool, while rough set theory uses the lower and upper approximations.
Representative works:
(1) Pawlak, Z., 1982. Rough Sets. International Journal of Computer and Information Sciences,-356
(2) Pawlak, Z., 1991. Rough Sets: Theoretical Aspects of Reasoning about Data. Kluwer Academic Publishers.
Features of rough set theory:
(1) Rough set theory assumes that knowledge is the ability to classify objects.
(2) One of its main advantages is that it requires no preliminary or additional information about the data.
(3) It can be applied to data reduction, feature selection, feature extraction, decision rule generation, and pattern recognition.
Basic concepts of rough sets:
(1) Information systems and decision systems
(2) Indiscernibility
(3) Set approximation
(4) Reducts and core
(5) Rough membership
(6) Dependency of attributes
Information system:
(1) Formally, the quadruple S = (U, A, V, f) is an information system, where
U: the universe, a non-empty finite set of objects
A: a non-empty finite set of attributes
V: the value range of the attributes
f: the information function, which assigns each object its value on each attribute
The following is an example:
U  | Age   | LEMS
x1 | 16-30 | 50
x2 | 16-30 | 0
x3 | 31-45 | 1-25
x4 | 31-45 | 1-25
x5 | 46-60 | 26-49
x6 | 16-30 | 26-49
x7 | 46-60 | 26-49
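One way to make S = (U, A, V, f) concrete is to store the example table as a dictionary and read f, V and indiscernibility off it. This Python sketch is illustrative only: the names ROWS, f, V and indiscernible are hypothetical, "LEMS" is the attribute label used in the table, and the values are kept as strings because they are intervals or labels.

```python
# Universe, attributes, and the table rows copied from the example.
U = ["x1", "x2", "x3", "x4", "x5", "x6", "x7"]
A = ["Age", "LEMS"]
ROWS = {
    "x1": {"Age": "16-30", "LEMS": "50"},
    "x2": {"Age": "16-30", "LEMS": "0"},
    "x3": {"Age": "31-45", "LEMS": "1-25"},
    "x4": {"Age": "31-45", "LEMS": "1-25"},
    "x5": {"Age": "46-60", "LEMS": "26-49"},
    "x6": {"Age": "16-30", "LEMS": "26-49"},
    "x7": {"Age": "46-60", "LEMS": "26-49"},
}


def f(x, a):
    """Information function: the value of object x on attribute a."""
    return ROWS[x][a]


# Value range of each attribute, collected from the table itself.
V = {a: sorted({f(x, a) for x in U}) for a in A}


def indiscernible(x, y, attrs):
    """True if x and y have the same value on every attribute in attrs."""
    return all(f(x, a) == f(y, a) for a in attrs)


print(V)
print(indiscernible("x3", "x4", A))        # same Age and same LEMS
print(indiscernible("x1", "x2", A))        # same Age, different LEMS
print(indiscernible("x1", "x2", ["Age"]))  # indiscernible on Age alone
```

The last two calls show that indiscernibility depends on which attributes are considered: x1 and x2 can be told apart with both attributes but not with Age alone.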
Decision table:
(1) A decision table is a special and important kind of information system.
(2) Let S = (U, A, V, f) be an information system. If A = C ∪ D and C ∩ D = ∅, then C is called the condition attribute set and D the decision attribute set.
(3) An information system S that has a condition attribute set and a decision attribute set is called a decision table.
Example:
U  | Age   | LEMS  | Walk
x1 | 16-30 | 50    | Yes
x2 | 16-30 | 0     | No
x3 | 31-45 | 1-25  | No
x4 | 31-45 | 1-25  | Yes
x5 | 46-60 | 26-49 | No
x6 | 16-30 | 26-49 | Yes
x7 | 46-60 | 26-49 | No
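The decision table can be used to illustrate set approximation. The Python sketch below, which is an illustration and not part of the original notes, approximates the concept Walk = Yes using the condition attributes Age and LEMS. The names DECISION_TABLE, equivalence_classes and approximations are hypothetical.

```python
# Decision table from the example: object -> (Age, LEMS, Walk).
DECISION_TABLE = {
    "x1": ("16-30", "50", "Yes"),
    "x2": ("16-30", "0", "No"),
    "x3": ("31-45", "1-25", "No"),
    "x4": ("31-45", "1-25", "Yes"),
    "x5": ("46-60", "26-49", "No"),
    "x6": ("16-30", "26-49", "Yes"),
    "x7": ("46-60", "26-49", "No"),
}


def equivalence_classes(table):
    """Group objects that agree on all condition attributes."""
    classes = {}
    for obj, (*cond, _decision) in table.items():
        classes.setdefault(tuple(cond), set()).add(obj)
    return list(classes.values())


def approximations(table, target):
    """Return the (lower, upper) approximation of the object set `target`."""
    lower, upper = set(), set()
    for block in equivalence_classes(table):
        if block <= target:   # block lies entirely inside the target
            lower |= block
        if block & target:    # block overlaps the target
            upper |= block
    return lower, upper


walk_yes = {obj for obj, row in DECISION_TABLE.items() if row[-1] == "Yes"}
lower, upper = approximations(DECISION_TABLE, walk_yes)
boundary = upper - lower
accuracy = len(lower) / len(upper)

print("lower:", sorted(lower))
print("upper:", sorted(upper))
print("boundary:", sorted(boundary))
print("accuracy:", accuracy)
```

Here x3 and x4 have identical condition values but different decisions, so they end up in the boundary region: the lower approximation is {x1, x6}, the upper approximation is {x1, x3, x4, x6}, and Walk = Yes is a rough set with respect to Age and LEMS.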
Reference books:
- Rough Set Theory and Its Application, edited by Zeng Huanglin, Chongqing University Press
- Rough Set Theory and Methods, Zhang Wenxiu et al., Science Press