Data Mining tutorial -- Concise translation part 03

Source: Internet
Author: User
Chapter 3: Issues

 

Data mining is not easy: the algorithms involved can be very complex, and the data is not always available in one place, so it has to be integrated from a variety of heterogeneous data sources. These factors give rise to a number of issues. This tutorial discusses the main ones:

  • Mining methods and user interaction
  • Performance problems
  • Various data types

[Figure: diagram of the main data mining issues]

 

 

Mining methods and user interaction problems

 

This category includes the following issues:

Mining different kinds of knowledge in databases: different users may be interested in different kinds of knowledge, so data mining must cover a wide range of knowledge discovery tasks.

Interactive mining of knowledge at multiple levels of abstraction: the data mining process needs to be interactive, so that users can focus the search for patterns and can submit and refine data mining requests based on the results returned.

Incorporation of background knowledge: background knowledge can be used to guide the discovery process and to express the discovered patterns. It allows those patterns to be expressed not only in concise terms but also at multiple levels of abstraction.
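One common form of background knowledge is a concept hierarchy that maps low-level values to more general concepts. The sketch below is only an illustration of that idea: the city-to-country mapping, the function name roll_up, and the sample data are all made up for this example and do not come from the tutorial.

```python
from collections import Counter

# Hypothetical background knowledge: a concept hierarchy from city to country.
CITY_TO_COUNTRY = {"Lyon": "France", "Paris": "France", "Osaka": "Japan"}


def roll_up(values, hierarchy):
    """Replace each value with its parent concept in the hierarchy."""
    return [hierarchy.get(value, "unknown") for value in values]


# Made-up sales records, identified only by the city where the sale happened.
sales_cities = ["Lyon", "Paris", "Osaka", "Paris"]

# The same data summarized at two levels of abstraction.
by_city = Counter(sales_cities)
by_country = Counter(roll_up(sales_cities, CITY_TO_COUNTRY))
```

Here by_city describes the data at the lowest level, while by_country expresses the same pattern one level up, which is the kind of multi-level view that background knowledge makes possible.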

Data mining query languages and ad hoc data mining: a data mining query language should let users describe ad hoc mining tasks, and it should be integrated with a data warehouse query language and optimized for efficient, flexible data mining.

Handling noisy or incomplete data: data cleaning methods are needed to deal with noise and incomplete objects when mining data patterns. Without such methods, the accuracy of the discovered patterns will be poor.
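As a rough illustration of data cleaning, the sketch below fills missing values with the median of the valid values and drops values outside a plausible range. This is just one simple strategy among many; the function name clean_records, the field names, and the range limits are assumptions made for this example.

```python
from statistics import median


def clean_records(records, field, low, high):
    """Impute missing values with the median and drop out-of-range ones.

    Assumes at least one record has a valid value for the field;
    otherwise statistics.median raises StatisticsError.
    """
    # Values that are present and inside the plausible range [low, high].
    valid = [r[field] for r in records
             if r.get(field) is not None and low <= r[field] <= high]
    fill = median(valid)

    cleaned = []
    for r in records:
        value = r.get(field)
        if value is None:
            # Incomplete record: fill the gap with the median.
            cleaned.append({**r, field: fill})
        elif low <= value <= high:
            cleaned.append(dict(r))
        # Otherwise the value is treated as noise and the record is dropped.
    return cleaned


# Made-up records: one missing age and one implausible age.
records = [
    {"id": 1, "age": 34},
    {"id": 2, "age": None},
    {"id": 3, "age": 29},
    {"id": 4, "age": 412},
    {"id": 5, "age": 41},
]
cleaned = clean_records(records, "age", 0, 120)
```

Whether to drop, correct, or flag a noisy record depends on the application; dropping is used here only to keep the example short.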

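Pattern evaluation: not every discovered pattern is interesting. A pattern may be uninteresting because it merely represents common knowledge or lacks novelty, so discovered patterns need to be evaluated for interestingness.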
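The tutorial does not name specific interestingness measures. As an assumption for illustration, the sketch below computes three widely used measures for an association rule "antecedent implies consequent": support, confidence, and lift. The function name rule_measures and the transactions are made up for this example.

```python
def rule_measures(transactions, antecedent, consequent):
    """Return (support, confidence, lift) for the rule antecedent -> consequent."""
    n = len(transactions)
    a = set(antecedent)
    c = set(consequent)

    # Count transactions containing the antecedent, the consequent, and both.
    count_a = sum(1 for t in transactions if a <= t)
    count_c = sum(1 for t in transactions if c <= t)
    count_ac = sum(1 for t in transactions if (a | c) <= t)

    support = count_ac / n
    confidence = count_ac / count_a if count_a else 0.0
    # Lift compares the rule's confidence with the consequent's base rate.
    lift = confidence / (count_c / n) if count_c else 0.0
    return support, confidence, lift


# Made-up market-basket transactions, each a set of items.
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
support, confidence, lift = rule_measures(transactions, {"bread"}, {"milk"})
```

A lift close to 1 suggests that the rule says little beyond what the base rates already imply, which is one way to flag a pattern as unsurprising.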

 

 

Performance problems

 

Performance problems include the following:

Efficiency and scalability of data mining algorithms: to extract information effectively from the huge amounts of data in databases, data mining algorithms must be efficient and scalable.

Parallel, distributed, and incremental mining algorithms: factors such as the huge size of databases, the wide distribution of data, and the complexity of data mining methods motivate the development of parallel and distributed data mining algorithms. These algorithms divide the data into partitions, process the partitions in parallel, and then merge the partial results. Incremental algorithms incorporate database updates without mining all the data again from scratch.
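The partition, process, and merge pattern described above can be sketched with item counting as a stand-in mining task. This is a minimal illustration, not a real distributed miner: a thread pool stands in for separate machines or processes, and the function names and transactions are assumptions made for this example.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_items(partition):
    """Count in how many transactions of this partition each item occurs."""
    counts = Counter()
    for transaction in partition:
        counts.update(set(transaction))
    return counts


def mine_parallel(transactions, n_partitions=2):
    """Split the data, count each partition in parallel, then merge."""
    partitions = [transactions[i::n_partitions] for i in range(n_partitions)]
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_items, partitions))

    # Merge step: add up the partial counts from every partition.
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return total


def update_incremental(total, new_transactions):
    """Fold newly arrived transactions into existing counts, without a rescan."""
    total.update(count_items(new_transactions))
    return total


# Made-up transactions.
transactions = [
    ["bread", "milk"],
    ["bread", "butter"],
    ["milk"],
    ["bread", "milk", "butter"],
]
counts = mine_parallel(transactions)
counts = update_incremental(counts, [["milk", "eggs"]])
```

Counting works well here because partial counts can simply be added together; many mining tasks need a more careful merge step than this.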

 

 

Various data types

 

Handling relational and complex types of data: databases may contain complex data objects, multimedia data objects, spatial data, and temporal data. It is not realistic for one system to mine all of these kinds of data.

Mining information from heterogeneous databases and global information systems: the data is available from different sources on a LAN or WAN, and these sources may be structured, semi-structured, or unstructured. Mining knowledge from them therefore adds further challenges to data mining.
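To make the integration problem concrete, the sketch below maps one structured source (CSV) and one semi-structured source (JSON) onto a common record layout before any mining takes place. The two sources, their field names, and the target schema are all made up for this example.

```python
import csv
import io
import json

# Hypothetical structured source: fixed columns, one row per purchase.
CSV_SOURCE = "customer_id,amount\n1,20.5\n2,13.0\n"

# Hypothetical semi-structured source: nested objects, fields may be absent.
JSON_SOURCE = '[{"customer": {"id": 1}, "total": "7.5"}, {"customer": {"id": 3}}]'


def from_csv(text):
    """Yield records in the common schema from the CSV source."""
    for row in csv.DictReader(io.StringIO(text)):
        yield {"customer_id": int(row["customer_id"]),
               "amount": float(row["amount"])}


def from_json(text):
    """Yield records in the common schema from the JSON source."""
    for obj in json.loads(text):
        total = obj.get("total")
        yield {"customer_id": int(obj["customer"]["id"]),
               # A missing amount is kept as None rather than guessed.
               "amount": float(total) if total is not None else None}


# One list of uniform records, ready for cleaning and mining.
integrated = list(from_csv(CSV_SOURCE)) + list(from_json(JSON_SOURCE))
```

Unstructured sources such as free text would need an extraction step before they could be mapped to a schema like this, which is part of why heterogeneous sources make mining harder.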
