A summary of the main methods of spatial data mining

Spatial data mining refers to the theories, methods, and techniques for extracting implicit knowledge and spatial relationships that are not explicitly represented in a spatial database, and for discovering useful features and patterns in it. The process of spatial data mining and knowledge discovery can be divided into several steps: data preparation, data selection, data preprocessing, data reduction or transformation, choosing the mining goal, choosing the knowledge discovery algorithm, data mining, pattern interpretation, and knowledge evaluation. Data mining is only one of these steps, although a key one. For simplicity, however, the term "spatial data mining" is often used in place of "spatial data mining and knowledge discovery."


The most common methods of spatial data mining are:
1. Methods based on probability theory. These methods mine spatial knowledge by computing the probability of uncertain attributes. The discovered knowledge is usually expressed as the conditional probability that a hypothesis is true under given conditions. When an error matrix is used to describe the uncertainty of a remote sensing classification result, this conditional probability can serve as background knowledge that expresses the degree of confidence in that result.
2. Spatial analysis methods. These methods combine attribute data analysis and topological analysis with analysis models and techniques such as buffer analysis, density analysis, distance analysis, overlay analysis, network analysis, terrain analysis, trend surface analysis, and prediction analysis. They are used to discover association rules among objects, such as spatial adjacency and co-occurrence (symbiosis), or to find knowledge such as the shortest or optimal path between targets. Commonly used spatial analysis methods currently include exploratory data analysis, spatial neighbor relation mining algorithms, exploratory spatial analysis, exploratory inductive learning, and image analysis.
3. Statistical analysis methods. These methods use limited and/or uncertain information about spatial objects to analyze and estimate the objects' characteristics and statistical regularities. They rely mainly on the spatial covariance structure, the variogram, or related measures such as autocorrelation variables and the similarity of local variable values, in order to mine spatial data that contains uncertainty.
4. Inductive learning methods. Given certain background knowledge, these methods summarize and generalize the data in order to search for and mine general rules and patterns in a spatial database (or data warehouse). Many inductive learning algorithms exist, such as the well-known C5.0 decision tree algorithm proposed by Quinlan, the attribute-oriented induction method proposed by Han Jiawei, and the spatial-attribute-based induction method proposed by Pei Jian et al.
5. Spatial association rule mining methods. These are algorithms that search for and mine association relationships between spatial objects (and their attributes) in a spatial database (or data warehouse). The best-known association rule mining algorithm is the Apriori algorithm proposed by Agrawal. Others include the multi-level association rule mining algorithm proposed by Cheng Jihua et al. and the work of Xu Longfei et al.
6. Cluster analysis methods. These methods cluster or classify entities according to their characteristics, and from the result discover the overall spatial distribution and typical patterns of the data set. Commonly used clustering methods include K-means and K-medoids, the R-tree-based data focusing method and the algorithm for discovering aggregate proximity and common features proposed by Ester et al., and the information-entropy-based spatio-temporal data partitioning clustering model proposed by Zhou Chenghu et al.
7. Neural network methods. A neural network is an adaptive nonlinear dynamic system formed by a network of a large number of neurons. It offers distributed storage, associative memory, massively parallel processing, self-learning, self-organization, and self-adaptation, and can be used in spatial data mining to mine classification and clustering knowledge and features.
8. Decision tree methods. These methods use a tree structure to represent classifications or decision sets according to different features, and from it derive rules and discover regularities. The basic steps of spatial data mining with a decision tree are as follows: first, generate a test function from the training set of spatial entities; next, build the branches of the decision tree according to the different values of the test function, and recursively build lower-level nodes and branches within each resulting subset; then prune the decision tree; finally, convert the decision tree into rules, which are used to classify new entities.
9. Rough set theory. Rough set theory is an intelligent decision analysis tool for data, based on upper and lower approximation sets. It is used to handle imprecise, uncertain, and incomplete information, and is well suited to spatial data mining with attribute uncertainty.
10. Methods based on fuzzy set theory. This is a family of methods that use fuzzy set theory to describe uncertain objects of study and to analyze and solve practical problems. Fuzzy-set-based methods have been widely used in fuzzy classification of remote sensing images, fuzzy queries in GIS, and the representation and processing of uncertainty in spatial data.
11. Spatial characterization and trend detection methods. These are spatial data mining algorithms based on the concepts of neighborhood graphs and neighborhood paths. They extract spatial rules from differences in the relative frequencies of different attribute or object types.
12. Methods based on cloud theory. Cloud theory is a newer theory for analyzing uncertain information, consisting of the cloud model, uncertainty reasoning, and the cloud transformation. Spatial data mining methods based on cloud theory combine qualitative analysis with quantitative computation to handle uncertain attributes of spatial objects in which randomness and fuzziness are integrated. They can be used for spatial association rule mining and for uncertainty queries on spatial databases.
13. Methods based on evidence theory. Evidence theory handles uncertain information by means of a belief function (which measures the minimum degree to which the existing evidence supports a hypothesis) and a plausibility function (which measures the maximum degree to which the existing evidence fails to refute the hypothesis). It can be used for spatial data mining with uncertain attributes.
14. Genetic algorithms. A genetic algorithm simulates the process of biological evolution. It performs an efficient, parallel, global search for solutions to a problem, automatically acquires and accumulates knowledge about the search space during the search, and uses an adaptive mechanism to steer the search toward an optimal solution. Many problems in spatial data mining, such as acquiring classification, clustering, and prediction knowledge, can be solved with genetic algorithms. This method has been applied to feature discovery in remote sensing image data.
15. Data visualization methods. These methods display spatial data through visualization techniques so that people can use visual analysis to find spatial knowledge such as structures, features, patterns, trends, anomalies, or correlations in the data. For this approach to work, powerful visualization tools and supporting analysis tools must be built.
16. Computational geometry methods. These methods use a computer program to compute the Voronoi diagram of a planar point set and then discover spatial knowledge from it. The Voronoi diagram can be used to solve problems such as spatial topological relations, multi-scale representation of data, automatic generalization, spatial clustering, the sphere of influence of spatial objects, the siting of public facilities, and the determination of shortest paths.
17. Spatial online data mining. This is a web-based, verification-driven tool for spatial data mining and analysis. It is built on multidimensional views, emphasizes execution efficiency and timely response to user commands, and generally uses a spatial data warehouse as its direct data source. It extracts information and knowledge through data analysis and reporting modules and through query and analysis tools (such as OLAP, decision analysis, and data mining) in order to support decision making.
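
As an illustration of item 1, the conditional probability derived from an error matrix can be sketched in a few lines of Python. This is a minimal sketch, not a reference implementation: it assumes the rows of the matrix are the mapped (classified) classes and the columns are the reference classes, and the class names and counts are hypothetical.

```python
def conditional_probabilities(error_matrix, labels):
    """Return P(reference class | mapped class) for every mapped class.

    Assumption: each row holds the counts for one mapped class, and each
    column holds the counts for one reference (ground-truth) class.
    """
    result = {}
    for mapped, row in zip(labels, error_matrix):
        total = sum(row)
        result[mapped] = {
            ref: (count / total if total else 0.0)
            for ref, count in zip(labels, row)
        }
    return result


# Hypothetical error matrix for a three-class land cover map.
labels = ["water", "forest", "urban"]
error_matrix = [
    [45, 4, 1],   # pixels mapped as water
    [3, 40, 7],   # pixels mapped as forest
    [2, 6, 42],   # pixels mapped as urban
]

probs = conditional_probabilities(error_matrix, labels)
# Probability that a pixel mapped as forest really is forest (40 / 50).
print(probs["forest"]["forest"])
```

Each row of `probs` can then be read as the confidence attached to one mapped class.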
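
The Apriori algorithm mentioned in item 5 finds frequent itemsets level by level, using the fact that every subset of a frequent itemset must itself be frequent. The following is a simplified sketch of that idea, under the assumption that each "transaction" is the set of feature types found in one spatial neighborhood. The data and the function name `apriori` are illustrative only, and rule generation from the frequent itemsets is omitted.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain the whole itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}

    frequent = {}
    k = 1
    while current:
        for s in current:
            frequent[s] = support(s)
        k += 1
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: keep candidates whose (k-1)-subsets are all frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
    return frequent


# Hypothetical feature types observed in five neighborhoods.
neighborhoods = [
    {"school", "park", "bus_stop"},
    {"school", "park"},
    {"school", "bus_stop"},
    {"park", "bus_stop"},
    {"school", "park", "bus_stop"},
]

frequent = apriori(neighborhoods, min_support=0.6)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

With this data, the three single feature types and the three pairs are frequent, while the triple appears in only two of the five neighborhoods and is dropped.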
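
The belief and plausibility functions of item 13 can be computed directly from a mass assignment over sets of hypotheses. The sketch below assumes the usual definitions (belief sums the mass of every focal set contained in the hypothesis; plausibility sums the mass of every focal set that intersects it). The land cover classes and mass values are hypothetical.

```python
def belief(mass, hypothesis):
    """Total mass of focal sets wholly contained in the hypothesis."""
    h = frozenset(hypothesis)
    return sum(m for focal, m in mass.items() if focal <= h)


def plausibility(mass, hypothesis):
    """Total mass of focal sets that overlap the hypothesis."""
    h = frozenset(hypothesis)
    return sum(m for focal, m in mass.items() if focal & h)


# Hypothetical evidence about the land cover class of one map cell.
mass = {
    frozenset({"forest"}): 0.5,
    frozenset({"forest", "shrub"}): 0.3,
    frozenset({"forest", "shrub", "water"}): 0.2,  # total ignorance
}

bel_forest = belief(mass, {"forest"})
pl_forest = plausibility(mass, {"forest"})
bel_shrub = belief(mass, {"shrub"})
pl_shrub = plausibility(mass, {"shrub"})
print(bel_forest, pl_forest)
print(bel_shrub, pl_shrub)
```

The interval between belief and plausibility expresses how much of the evidence is still uncommitted: here "forest" is supported directly, whereas "shrub" has no direct support but cannot be ruled out.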
