Chapter 3: Issues
Data mining is not easy: the algorithms involved can become very complex, and the data is not always available in one place, so it has to be integrated from a variety of heterogeneous data sources. These factors give rise to a number of issues. This tutorial discusses the main ones:
- Mining Methods and user interaction
- Performance problems
- Various data types
Each of these issues is described below.
Mining Methods and user interaction problems
This category covers the following issues:
Mining different kinds of knowledge in databases - different users may be interested in different kinds of knowledge. Data mining therefore has to cover a wide range of knowledge discovery tasks.
Interactive mining of knowledge at multiple levels of abstraction - the data mining process needs to be interactive, because interaction lets users focus the search for patterns, and then submit and refine mining requests based on the results returned.
Incorporation of background knowledge - background knowledge can be used to guide the discovery process and to express the discovered patterns. It allows patterns to be expressed not only in concise terms but also at multiple levels of abstraction.
Data mining query languages and ad hoc data mining - a data mining query language should let users describe ad hoc mining tasks. It should be integrated with a data warehouse query language and optimized for efficient and flexible data mining.
Handling noisy or incomplete data - when mining data patterns, data cleaning methods are required to handle noise and incomplete objects. Without such methods, the accuracy of the discovered patterns will be poor.
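Pattern evaluation - not every discovered pattern is interesting: a pattern may be uninteresting because it merely represents common knowledge or lacks novelty. Discovered patterns therefore need to be evaluated for interestingness.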
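One common way to make pattern evaluation concrete is to score association rules with interestingness measures such as support, confidence, and lift. The following is a minimal sketch under stated assumptions, not a definitive method: the toy transactions and the function names `support`, `confidence`, and `lift` are hypothetical and chosen only for illustration.

```python
# Hypothetical toy transactions, invented for illustration only.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Estimated P(consequent | antecedent); 0.0 if the antecedent never occurs."""
    base = support(antecedent, transactions)
    if base == 0:
        return 0.0
    both = support(set(antecedent) | set(consequent), transactions)
    return both / base

def lift(antecedent, consequent, transactions):
    """Confidence divided by the consequent's support; 1.0 means independence."""
    base = support(consequent, transactions)
    if base == 0:
        return 0.0
    return confidence(antecedent, consequent, transactions) / base

rule_lift = lift({"bread"}, {"milk"}, transactions)
print(f"lift(bread -> milk) = {rule_lift:.3f}")
```

In this toy data the rule "bread implies milk" has a lift below 1, so buying bread does not make milk more likely than it already is. A rule like this would typically be filtered out as uninteresting, even though its support and confidence may look acceptable on their own.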
Performance problems
Performance issues include the following:
Efficiency and scalability of data mining algorithms - to extract information effectively from the huge amounts of data held in databases, data mining algorithms must be efficient and scalable.
Parallel, distributed, and incremental mining algorithms - factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, process the partitions in parallel, and then merge the partial results. Incremental algorithms incorporate database updates into the existing results, so the data does not have to be mined again from scratch.
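The partition, process, and merge steps, followed by an incremental update, can be sketched as below. This is a simplified illustration under assumptions: it only counts how many transactions contain each item, a thread pool stands in for genuinely parallel or distributed workers, and the names `count_items`, `split`, `mine_parallel`, and `update_incrementally` are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def count_items(partition):
    """Count, for each item, the number of transactions in this partition containing it."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))  # set() so an item counts once per transaction
    return counts

def split(transactions, n_parts):
    """Divide the transactions into n_parts roughly equal partitions."""
    return [transactions[i::n_parts] for i in range(n_parts)]

def mine_parallel(transactions, n_parts=2):
    """Process each partition concurrently, then merge the partial counts."""
    partitions = split(transactions, n_parts)
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        partial_counts = list(pool.map(count_items, partitions))
    total = Counter()
    for counts in partial_counts:
        total.update(counts)  # Counter.update adds counts together
    return total

def update_incrementally(total, new_transactions):
    """Fold newly arrived transactions into existing counts without re-mining old data."""
    total.update(count_items(new_transactions))
    return total

# Hypothetical toy data, invented for illustration only.
transactions = [["bread", "milk"], ["bread", "butter"], ["milk"], ["bread", "milk", "butter"]]
counts = mine_parallel(transactions)
counts = update_incrementally(counts, [["milk", "eggs"]])
print(counts)
```

Item counts are additive, so merging partitions by summation gives exactly the same result as a single pass over all the data. Many other kinds of patterns do not merge this simply and need a more careful combination step.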
Various data types
Handling relational and complex types of data - databases may contain complex data objects, multimedia data objects, spatial data, and temporal data. It is unrealistic to expect a single system to mine all of these kinds of data.
Mining information from heterogeneous databases and global information systems - the data may reside in different sources on a LAN or WAN, and these sources can be structured, semi-structured, or unstructured. Mining knowledge from them therefore adds further challenges to data mining.
Data Mining tutorial -- Concise translation part 03