Why data preprocessing is needed.
In practice, data may be incomplete (missing attribute values, lacking attributes of interest, or containing only aggregate data), noisy (containing errors or outlier values that deviate from what is expected), and inconsistent.
Data cleaning: Fill in missing values, smooth noisy data, identify or remove outliers, and resolve inconsistencies
Data integration: When data comes from multiple sources, the same attribute may be named or represented differently in each source, so combining them can introduce redundancy
Data reduction: Obtain a simplified representation of the dataset
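The cleaning step listed above can be sketched in a few lines. This is only an illustration under stated assumptions: missing values are represented as None, they are filled with the median of the observed values, and outliers are flagged with the common 1.5 x IQR rule. The function names fill_missing and flag_outliers and the sample values are hypothetical, not from the original notes.

```python
from statistics import median, quantiles


def fill_missing(values):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def flag_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical sample: two missing readings and one suspicious value.
raw = [12, None, 15, 14, 13, 95, None, 16]
cleaned = fill_missing(raw)
outliers = flag_outliers(cleaned)
print(cleaned)
print(outliers)
```

Other fill strategies (the mean, a constant, or a value predicted from other attributes) are equally plausible; the median is used here only because it is less sensitive to the extreme value in the sample.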
Descriptive data summarization
1 Measuring the central tendency of data
Mean, median, midrange (the average of the maximum and minimum values)
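As a minimal sketch, these three measures can be computed with Python's standard library. The helper midrange and the sample values are assumptions made for illustration only.

```python
from statistics import mean, median


def midrange(values):
    """Average of the largest and smallest value."""
    return (max(values) + min(values)) / 2


# Hypothetical sample with one large value (25).
data = [2, 4, 4, 5, 7, 9, 25]

data_mean = mean(data)          # sum of values divided by their count
data_median = median(data)      # middle value of the sorted data
data_midrange = midrange(data)  # depends only on the two extremes

print(data_mean, data_median, data_midrange)
```

In this sample the single large value pulls the mean and especially the midrange upward, while the median stays near the bulk of the data.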
2 Measuring the dispersion of data
Quartiles, interquartile range (IQR), variance
Five-number summary: minimum, first quartile (Q1), median, third quartile (Q3), maximum.
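The dispersion measures above can be sketched as follows. Quartile conventions differ between textbooks and libraries; this example assumes the "inclusive" interpolation method of statistics.quantiles and uses the population variance. The function name five_number_summary and the sample values are hypothetical.

```python
from statistics import pvariance, quantiles


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    q1, med, q3 = quantiles(values, n=4, method="inclusive")
    return min(values), q1, med, q3, max(values)


# Hypothetical sample, for illustration only.
data = [2, 4, 4, 5, 7, 9, 25]

summary = five_number_summary(data)
iqr = summary[3] - summary[1]   # interquartile range: Q3 - Q1
variance = pvariance(data)      # population variance

print(summary)
print(iqr, variance)
```

Use statistics.variance instead of pvariance if the data is a sample from a larger population rather than the whole population.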
3 Graphical displays
Histograms, q-q plots, and other graphs
Data cleaning
Data integration and transformation
Data reduction