Data Mining (i)

Source: Internet
Author: User

I. Issues related to Data
    1. The quality of the data
    2. Data preprocessing to make data more suitable for analysis
    3. Analyze data in terms of its relationships: find the relationships between data objects, and use those relationships for the rest of the analysis
II. Terminology
    • Datasets: Collections of Data Objects
    • Attributes: Properties or characteristics of an object
    • Measurement scale: A rule that associates a numeric or symbolic value with an object's attribute

Characteristics of the data set

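    • Dimensionality: The number of attributes each object in the data set has
    • Sparsity: Only a very small proportion of the entries are non-zero, so storing and processing only the non-zero entries saves time and space.
    • Resolution: The level of detail at which the data is recorded; it affects the properties the data appears to have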
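
The sparsity point above can be illustrated with a minimal sketch. It assumes a vector is stored as a Python dict that maps index to value for the non-zero entries only; the helper names `to_sparse` and `sparse_dot` are hypothetical and chosen for illustration.

```python
def to_sparse(row):
    """Return {index: value} for the non-zero entries of a dense list."""
    return {i: v for i, v in enumerate(row) if v != 0}


def sparse_dot(a, b):
    """Dot product of two sparse vectors.

    Only indices that are non-zero in both vectors contribute,
    so the zero entries are never visited.
    """
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter dict
    return sum(v * b[i] for i, v in a.items() if i in b)


dense_a = [0, 0, 3, 0, 0, 0, 1, 0]
dense_b = [0, 2, 4, 0, 0, 0, 0, 0]

sparse_a = to_sparse(dense_a)  # {2: 3, 6: 1}
sparse_b = to_sparse(dense_b)  # {1: 2, 2: 4}

print(sparse_dot(sparse_a, sparse_b))  # 12
```

Each vector here stores two entries instead of eight, and the dot product touches only those entries.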

Data cleansing: Remove unrealistic or duplicate objects (for example, a person recorded as 2 m tall but weighing 2 kg)
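
A minimal sketch of this kind of cleansing follows. The field names, the sample records, and the plausibility thresholds are all assumptions made for illustration (they presume adult subjects); real thresholds depend on the data.

```python
def clean(records, min_weight_kg=20.0, max_height_m=2.5):
    """Drop exact duplicates and records with implausible values.

    The thresholds are hypothetical and only serve the example.
    """
    seen = set()
    cleaned = []
    for rec in records:
        key = (rec["name"], rec["height_m"], rec["weight_kg"])
        if key in seen:
            continue  # duplicate object
        seen.add(key)
        if rec["weight_kg"] < min_weight_kg or rec["height_m"] > max_height_m:
            continue  # unrealistic object
        cleaned.append(rec)
    return cleaned


people = [
    {"name": "A", "height_m": 1.70, "weight_kg": 65.0},
    {"name": "A", "height_m": 1.70, "weight_kg": 65.0},  # duplicate
    {"name": "B", "height_m": 2.00, "weight_kg": 2.0},   # 2 m tall, 2 kg
    {"name": "C", "height_m": 1.82, "weight_kg": 80.0},
]

print([p["name"] for p in clean(people)])  # ['A', 'C']
```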

Issues related to measurement error:
Noise, artifacts, bias, precision, accuracy
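
Two of these terms can be made concrete with a short sketch. It assumes the common convention that bias is the mean of repeated measurements minus the true value, and that precision is reported as the standard deviation of those measurements; the function name and the sample readings are hypothetical.

```python
import statistics


def bias_and_precision(measurements, true_value):
    """Return (bias, precision) for repeated measurements of one quantity.

    bias      = mean of the measurements minus the true value
    precision = sample standard deviation of the measurements
    """
    bias = statistics.mean(measurements) - true_value
    precision = statistics.stdev(measurements)
    return bias, precision


# Five hypothetical readings of a 1 g reference weight.
readings = [1.015, 0.990, 1.013, 1.001, 0.986]
bias, precision = bias_and_precision(readings, true_value=1.0)

print(round(bias, 3), round(precision, 3))  # 0.001 0.013
```

A small bias with a larger standard deviation means the readings are centered near the true value but scatter around it.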
Issues related to data quality:
Outliers, missing values, inconsistent values, duplicate data

Data collection errors: Omitting data objects, including data objects that should not be included, or mixing in data that looks similar but does not belong

Outliers: Objects that differ markedly from most other objects in the data set
Missing values: An object lacks one or more attribute values (for example, someone is unwilling to disclose their name or age)
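
The two definitions above can be sketched in code. The notes do not name a detection method, so the rule used here (flag values more than k median absolute deviations from the median) is an assumption, as are the function names and the sample ages. Missing values are represented as `None`.

```python
import statistics


def split_missing(values):
    """Separate present values from the positions of missing ones (None)."""
    present = [v for v in values if v is not None]
    missing_idx = [i for i, v in enumerate(values) if v is None]
    return present, missing_idx


def mad_outliers(values, k=5.0):
    """Flag values more than k median absolute deviations from the median.

    This rule and the default k are illustrative choices, not the only option.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    return [v for v in values if abs(v - med) > k * mad]


# Hypothetical ages; two people declined to answer.
ages = [23, None, 25, 24, 26, None, 22, 95]

present, missing_idx = split_missing(ages)
print(missing_idx)            # [1, 5]
print(mad_outliers(present))  # [95]
```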

Aggregation: Merging two or more objects into a single object (for example, table 1 holds student ID and name, table 2 holds student ID and score; after aggregation they become one table: student ID, name, score)
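
The table example can be sketched as follows. It assumes each table is a dict keyed by student ID and keeps only the IDs present in both tables, which database terminology would call an inner join; the IDs, names, and scores are made up for illustration.

```python
def merge_by_id(names, scores):
    """Combine two tables keyed by student ID into one list of rows.

    Only IDs that appear in both tables are kept.
    """
    shared_ids = sorted(names.keys() & scores.keys())
    return [
        {"student_id": sid, "name": names[sid], "score": scores[sid]}
        for sid in shared_ids
    ]


# Table 1: student ID -> name (hypothetical data)
names = {"S001": "Li", "S002": "Wang", "S003": "Zhao"}
# Table 2: student ID -> score (hypothetical data)
scores = {"S001": 88, "S002": 92}

for row in merge_by_id(names, scores):
    print(row)
# {'student_id': 'S001', 'name': 'Li', 'score': 88}
# {'student_id': 'S002', 'name': 'Wang', 'score': 92}
```

Student S003 has no score, so this sketch drops that row; keeping it with an empty score would be an equally reasonable choice.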

Sampling: Selecting a subset of the data objects for analysis. In data mining, sampling is used to reduce the time and cost of processing the data.

The principle of effective sampling: The more representative the sample, the closer the results obtained from it are to those obtained from the entire data set

Sampling methods: Simple random sampling (with replacement or without replacement) and stratified sampling (for a population made up of different types of objects whose counts differ greatly)
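
The sampling methods above can be sketched with Python's standard `random` module. The stratified variant here draws the same number of objects from every group, which is one common choice and an assumption of this example; the population and the helper names are hypothetical.

```python
import random
from collections import defaultdict


def sample_without_replacement(data, n, rng):
    """Simple random sampling; each object can be picked at most once."""
    return rng.sample(data, n)


def sample_with_replacement(data, n, rng):
    """Simple random sampling; the same object may be picked repeatedly."""
    return rng.choices(data, k=n)


def stratified_sample(data, key, per_group, rng):
    """Draw up to per_group objects from each group defined by key(item)."""
    groups = defaultdict(list)
    for item in data:
        groups[key(item)].append(item)
    result = []
    for label in sorted(groups):
        members = groups[label]
        result.extend(rng.sample(members, min(per_group, len(members))))
    return result


# Hypothetical population: 95 common objects and 5 rare ones.
population = [("common", i) for i in range(95)] + [("rare", i) for i in range(5)]
rng = random.Random(0)  # fixed seed so runs are repeatable

simple = sample_without_replacement(population, 6, rng)
repeated = sample_with_replacement(population, 6, rng)
stratified = stratified_sample(population, key=lambda x: x[0], per_group=3, rng=rng)

print(sum(1 for kind, _ in stratified if kind == "rare"))  # 3
```

With only 5 rare objects out of 100, a simple random sample of 6 will often contain none of them, while the stratified sample always contains 3.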
