Summary of main spatial data mining methods

Source: Internet
Author: User

Spatial data mining refers to the process of extracting hidden knowledge and spatial relationships from spatial databases and discovering useful features and patterns, together with the theories, methods, and technologies for doing so. The process of spatial data mining and knowledge discovery can be roughly divided into the following steps: data preparation, data selection, data preprocessing, data reduction or data transformation, determination of the data mining target, determination of the knowledge discovery algorithm, data mining, pattern interpretation, and knowledge evaluation. Data mining is only one key step among these. For simplicity, however, the term "spatial data mining" is often used to stand for the whole of "spatial data mining and knowledge discovery".

Common spatial data mining methods include:
1. Methods based on probability theory. These mine spatial knowledge by calculating the probability of uncertain attributes. The knowledge found is usually expressed as the conditional probability that a hypothesis is true under given conditions. When an error matrix is used to describe the uncertainty of remote sensing classification results, this conditional probability can serve as background knowledge that represents the confidence level of the uncertainty.
2. Spatial analysis methods. These use analysis models such as comprehensive attribute data analysis, topological analysis, buffer analysis, density analysis, distance analysis, overlay analysis, network analysis, terrain analysis, trend surface analysis, and predictive analysis to discover association rules among spatial targets, such as connection, adjacency, and co-occurrence, or to mine knowledge such as the shortest path and the optimal path between targets. Commonly used spatial analysis methods currently include exploratory data analysis, spatial neighborhood relationship mining algorithms, exploratory spatial analysis, exploratory inductive learning, and image analysis.
3. Statistical analysis methods. These apply statistical analysis to the limited and/or uncertain information available about spatial objects in order to evaluate and predict the characteristics of spatial object attributes, statistical regularities, and other knowledge. They mainly use the spatial autocovariance structure, the variogram, or the degree of similarity between the related autocovariates or local variable values to carry out spatial data mining that involves uncertainty.
4. Inductive learning methods. Given a body of background knowledge, data are summarized and synthesized to search for and mine general rules and patterns in spatial databases (or data warehouses). There are many inductive learning algorithms, for example the well-known C5.0 decision tree algorithm proposed by Quinlan, the attribute-oriented induction method proposed by Professor Han Jiawei, and the spatial attribute-based induction method proposed by Xiao Jian and others.
5. Spatial association rule mining methods. These are algorithms that search a spatial database (or data warehouse) for associations between spatial objects (and their attributes). The best-known association rule mining algorithm is the Apriori algorithm proposed by Agrawal; there are also the multi-level association rule mining algorithm proposed by Cheng Jihua and Xu Longfei, and other methods for mining generalized association rule models.
6. Cluster analysis methods. Objects are clustered or classified according to their characteristics in order to discover the spatial distribution and typical patterns of a dataset. Common clustering methods include k-means, k-medoids, the Ester method, the R-tree-based data focusing method, algorithms for discovering aggregate proximity and common features, the Zhou Chenghu algorithm, and other spatio-temporal data partitioning and clustering models based on information entropy.
7. Neural network methods. An adaptive nonlinear dynamic system is realized through a network composed of a large number of neurons, with capabilities such as distributed storage, associative memory, massively parallel processing, self-learning, self-organization, and self-adaptation. In spatial data mining it can be used to mine classification knowledge, clustering knowledge, and features.
8. Decision tree methods. A tree structure represents classifications or decision sets according to different features, and rules are generated and discovered from it. The basic steps for spatial data mining with a decision tree are as follows: first, use a training set of spatial entities to generate a test function; next, create the branches of the decision tree according to the different values of that function, and create lower-level nodes and branches within each branch subset until the tree is formed; finally, prune the tree and convert it into rules for classifying new entities.
9. Rough set theory. A rough set is described by an upper approximation set and a lower approximation set. Methods built on this representation are suited to spatial data mining that involves attribute uncertainty.
10. Methods based on fuzzy set theory. These use fuzzy set theory to describe research objects that carry uncertainty and to analyze and handle practical problems. Methods based on fuzzy set theory have been widely used in the fuzzy classification of remote sensing images, fuzzy queries in GIS, and the representation and processing of spatial data uncertainty.
11. Spatial characterization and trend detection methods. These are spatial data mining algorithms based on the concepts of the neighborhood graph and the neighborhood path. They extract spatial rules by comparing the relative frequencies of different types of attributes or objects.
12. Methods based on cloud theory. Cloud theory is a new theory for analyzing uncertain information. It consists of three parts: the cloud model, uncertainty reasoning, and the cloud transform. Spatial data mining based on cloud theory combines qualitative analysis with quantitative computation to handle attributes of spatial objects whose uncertainty combines randomness and fuzziness. It can be used for mining spatial association rules and for uncertainty queries on spatial databases.
13. Methods based on evidence theory. Evidence theory handles uncertain information through the belief function (which measures the minimum degree to which existing evidence supports a hypothesis) and the plausibility function (which measures the maximum degree to which, given existing evidence, the hypothesis cannot be denied). It can be used for spatial data mining with uncertain attributes.
14. Genetic algorithms. These simulate the process of biological evolution. They can perform an efficient, parallel, global search of a problem's solution space, automatically acquire and accumulate knowledge about the search space during the search, and control the search process through an adaptive mechanism in order to seek an optimal solution. Many problems in spatial data mining, such as obtaining classification, clustering, and prediction knowledge, can be solved with genetic algorithms. This method has been applied to feature discovery in remote sensing image data.
15. Data visualization methods. These use visualization technology to display spatial data and help people find, through visual analysis, spatial knowledge such as structures, features, patterns, trends, anomalies, and relationships in the data. For the approach to be effective, powerful visualization tools and auxiliary analysis tools must be built.
16. Computational geometry methods. These use a computer program to compute the Voronoi diagram of a planar point set and then discover spatial knowledge from it. The Voronoi diagram can be used to address problems such as spatial topological relationships, multi-scale representation of data, automatic generalization, spatial clustering, the sphere of influence of spatial targets, site selection for public facilities, and determination of shortest paths.
17. Spatial online data mining. This is a network-based, verification-driven form of spatial data mining and analysis. It is built on multi-dimensional views and emphasizes execution efficiency and fast response to user commands. Its direct data source is generally a spatial data warehouse. It extracts information and knowledge through the query and analysis tools of the data analysis and reporting module (such as OLAP, decision analysis, and data mining) to meet decision-making needs.
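
To make item 5 more concrete, here is a minimal sketch of Apriori-style frequent itemset mining in Python. It covers only the level-wise "join and prune" search for frequent itemsets, not rule generation, and it is not the multi-level or spatial variants cited in the list. The function name `apriori`, the sample neighborhoods (sets of feature types found near a site), and the `min_support` threshold are all made up for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item that appears anywhere.
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical data: feature types observed in the neighborhood of five sites.
neighborhoods = [
    {"school", "park", "road"},
    {"school", "road"},
    {"park", "road", "river"},
    {"school", "park", "road"},
    {"river", "park"},
]
frequent_itemsets = apriori(neighborhoods, min_support=3)
```

In a spatial setting the "transactions" would first have to be derived from the data, for example by collecting the feature types that fall within a buffer around each reference object; that preprocessing step is assumed here rather than shown.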
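
For the k-means method mentioned in item 6, the following is a minimal sketch of the standard assign-then-update iteration on 2D coordinates, using only the Python standard library. Seeding the centroids with the first `k` points is a simplifying assumption made for determinism, and the sample coordinates are invented.

```python
import math


def kmeans(points, k, iterations=20):
    """Basic k-means on 2D points; returns (centroids, labels)."""
    # Assumption: the first k points seed the centroids (simple, deterministic).
    centroids = [points[i] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    (sum(x for x, _ in members) / len(members),
                     sum(y for _, y in members) / len(members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return centroids, labels


# Hypothetical point locations forming two well-separated groups.
sites = [(1.0, 1.0), (9.0, 9.0), (1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0)]
centroids, labels = kmeans(sites, k=2)
```

Plain Euclidean distance is used here; for geographic coordinates a projected coordinate system or a great-circle distance would usually be more appropriate.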
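
The upper and lower approximation sets of item 9 can be sketched in a few lines. Objects that share the same values on the chosen attributes are indiscernible and form one equivalence class; the lower approximation collects the classes that lie entirely inside the target set, and the upper approximation collects the classes that overlap it. The function name `approximations`, the land parcels, their attributes, and the target set `flooded` are hypothetical.

```python
from collections import defaultdict


def approximations(objects, attributes, target):
    """Return (lower, upper) approximations of `target` under `attributes`."""
    # Objects with identical values on `attributes` are indiscernible.
    classes = defaultdict(set)
    for name, values in objects.items():
        classes[tuple(values[a] for a in attributes)].add(name)
    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:
            lower |= block  # certainly in the target
        if block & target:
            upper |= block  # possibly in the target
    return lower, upper


# Hypothetical land parcels described by two condition attributes.
parcels = {
    "p1": {"slope": "flat", "soil": "clay"},
    "p2": {"slope": "flat", "soil": "clay"},
    "p3": {"slope": "steep", "soil": "sand"},
    "p4": {"slope": "flat", "soil": "sand"},
    "p5": {"slope": "steep", "soil": "sand"},
}
flooded = {"p1", "p4"}  # the concept to approximate (made up)
lower, upper = approximations(parcels, ["slope", "soil"], flooded)
boundary = upper - lower  # parcels whose membership cannot be decided
```

Here `p1` and `p2` look identical on both attributes, yet only `p1` is in the target, so both end up in the boundary region; that boundary is where the attribute uncertainty shows up.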
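
The two measures described in item 13 can be illustrated as follows. Assuming evidence is given as a mass assignment over subsets of the possible classes, belief sums the mass of every subset contained in the hypothesis (the minimum support), and plausibility sums the mass of every subset that intersects it (the maximum support that cannot be ruled out). The land-cover classes and mass values below are invented for the example.

```python
def belief(mass, hypothesis):
    """Total mass committed to subsets of the hypothesis (minimum support)."""
    return sum(m for focal, m in mass.items() if focal <= hypothesis)


def plausibility(mass, hypothesis):
    """Total mass not contradicting the hypothesis (maximum support)."""
    return sum(m for focal, m in mass.items() if focal & hypothesis)


# Hypothetical evidence about the land-cover class of one pixel.
# Keys are sets of candidate classes; the values sum to 1.
mass = {
    frozenset({"forest"}): 0.5,
    frozenset({"forest", "shrub"}): 0.3,
    frozenset({"forest", "shrub", "water"}): 0.2,  # mass left on "don't know"
}

forest = frozenset({"forest"})
water = frozenset({"water"})
bel_forest = belief(mass, forest)        # only the {"forest"} mass counts
pl_forest = plausibility(mass, forest)   # every focal set contains "forest"
bel_water = belief(mass, water)          # no focal set lies inside {"water"}
pl_water = plausibility(mass, water)     # only the "don't know" mass overlaps
```

The interval between belief and plausibility for a hypothesis expresses how much of the evidence is still uncommitted.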
