1. Introduction
Data cleaning mainly consists of deleting irrelevant and duplicate records from the original data set, smoothing noisy data, filtering out data unrelated to the mining topic, and handling missing values and outliers.
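As a rough illustration of these steps, the following minimal sketch uses pandas on a small made-up table. The column names (`date`, `sales`, `cashier_note`) and all values are assumptions invented for the example; they are not taken from the original data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: column names and values are invented for illustration.
raw = pd.DataFrame({
    "date": ["d1", "d1", "d2", "d3", "d4"],
    "sales": [100.0, 100.0, np.nan, 120.0, 110.0],
    "cashier_note": ["a", "a", "b", "c", "d"],  # assumed unrelated to the mining topic
})

cleaned = (
    raw.drop(columns=["cashier_note"])  # drop a column irrelevant to the analysis
       .drop_duplicates()               # remove exact duplicate records
       .reset_index(drop=True)
)

# Count the missing values that still need handling.
n_missing = int(cleaned["sales"].isna().sum())
print(cleaned)
print("missing sales values:", n_missing)
```

Which columns count as irrelevant depends on the mining topic, so that step is a judgment call rather than something a library can decide.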
2. Missing value processing
Missing values are handled in one of three ways: deleting the affected records, interpolating the data, or leaving the values unprocessed. Several data interpolation methods are in common use.
Of these, two need to be introduced here: the Lagrange interpolation method and the Newton interpolation method.
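The three categories can be sketched with pandas as follows. This is only an illustration on made-up values; mean filling and linear interpolation are picked here as two familiar examples of data interpolation and are not necessarily the methods the original list named.

```python
import numpy as np
import pandas as pd

# Toy series with one missing value (values are invented for illustration).
s = pd.Series([10.0, 12.0, np.nan, 20.0, 30.0], name="sales")

deleted = s.dropna()              # 1) delete records that contain a missing value
mean_filled = s.fillna(s.mean())  # 2a) fill with the mean of the observed values
linear_filled = s.interpolate()   # 2b) linear interpolation between the neighbours
untouched = s.copy()              # 3) leave the gap for a later step to handle

print(mean_filled[2], linear_filled[2])
```

Deleting records is the simplest option but discards information, which is why interpolation is usually preferred when only a few values are missing.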
2.1 Lagrange interpolation method
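For reference, the standard textbook form of the Lagrange interpolating polynomial is given below. The notation ($n + 1$ known points $(x_i, y_i)$ with distinct $x_i$, basis polynomials $\ell_i$) is an assumption chosen for this note, not necessarily the notation of the original post.

```latex
\[
L_n(x) = \sum_{i=0}^{n} y_i \, \ell_i(x),
\qquad
\ell_i(x) = \prod_{\substack{j=0 \\ j \neq i}}^{n} \frac{x - x_j}{x_i - x_j}
\]
```

Each $\ell_i$ equals 1 at $x_i$ and 0 at every other node, so $L_n(x_i) = y_i$. A missing value at position $x$ is then approximated by $L_n(x)$.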
2.2 Newton interpolation method
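For reference, the standard textbook form of the Newton interpolating polynomial is given below, using the same assumed notation of $n + 1$ known points $(x_i, y_i)$ with distinct $x_i$. The coefficients are divided differences.

```latex
\[
N_n(x) = f[x_0] + f[x_0, x_1](x - x_0) + \cdots
       + f[x_0, \ldots, x_n](x - x_0)(x - x_1)\cdots(x - x_{n-1})
\]
\[
f[x_i] = y_i,
\qquad
f[x_i, \ldots, x_{i+k}] =
\frac{f[x_{i+1}, \ldots, x_{i+k}] - f[x_i, \ldots, x_{i+k-1}]}{x_{i+k} - x_i}
\]
```

Through the same nodes, $N_n$ is the same polynomial as the Lagrange form. The practical difference is that adding a node only appends one new term instead of requiring every basis polynomial to be recomputed.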
3. Example of the Lagrange interpolation method
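# Lagrange interpolation code
import pandas as pd  # import the pandas data analysis library
from scipy.interpolate import lagrange  # import the Lagrange interpolation function
inputfile = 'Data/catering_sale.xls'  # path to the sales data
outputfile = 'Sales.xls'  # path for the output data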
In [35]:
data = pd.read_excel(inputfile)  # read in the data
data
Out[35]:
| | Date | sales |
|---|---|---|
| 0 | 2015-03-01 | 51.0 |
| 1 | 2015-02-28 | 2618.2 |
| 2 | 2015-02-27 | 2608.4 |
| 3 | 2015-02-26 | 2651.9 |
| 4 | 2015-02-25 | 3442.1 |
| 5 | 2015-02-24 | 3393.1 |
| 6 | 2015-02-23 | 3136.6 |
| 7 | 2015-02-22 | 3744.1 |
| 8 | 2015-02-21 | 6607.4 |
| 9 | 2015-02-20 | 4060.3 |
| 10 | 2015-02-19 | 3614.7 |
| 11 | 2015-02-18 | 3295.5 |
| 12 | 2015-02-16 | 2332.1 |
| 13 | 2015-02-15 | 2699.3 |
| 14 | 2015-02-14 | NaN |
| 15 | 2015-02-13 | 3036.8 |
| 16 | 2015-02-12 | 865.0 |
| 17 | 2015-02-11 | 3014.3 |
| 18 | 2015-02-10 | 2742.8 |
| 19 | 2015-02-09 | 2173.5 |
| 20 | 2015-02-08 | 3161.8 |
| 21 | 2015-02-07 | 3023.8 |
| 22 | 2015-02-06 | 2998.1 |
| 23 | 2015-02-05 | 2805.9 |
| 24 | 2015-02-04 | 2383.4 |
| 25 | 2015-02-03 | 2620.2 |
| 26 | 2015-02-02 | 2600.0 |
| 27 | 2015-02-01 | 2358.6 |
| 28 | 2015-01-31 | 2682.2 |
| 29 | 2015-01-30 | 2766.8 |
| ... | ... | ... |
| 171 | 2014-08-31 | 3494.7 |
| 172 | 2014-08-30 | 3691.9 |
| 173 | 2014-08-29 | 2929.5 |
| 174 | 2014-08-28 | 2760.6 |
| 175 | 2014-08-27 | 2593.7 |
| 176 | 2014-08-26 | 2884.4 |
| 177 | 2014-08-25 | 2591.3 |
| 178 | 2014-08-24 | 3022.6 |
| 179 | 2014-08-23 | 3052.1 |
| 180 | 2014-08-22 | 2789.2 |
| 181 | 2014-08-21 | 2909.8 |
| 182 | 2014-08-20 | 2326.8 |
| 183 | 2014-08-19 | 2453.1 |
| 184 | 2014-08-18 | 2351.2 |
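Row 14 of the output has a missing sales value. One way to estimate it with `scipy.interpolate.lagrange` is sketched below, under stated assumptions: the helper `lagrange_fill` and the window size `k = 3` are hypothetical choices made for this illustration, not the original post's code, and the sample is limited to rows 11 to 17 copied from the output instead of being read from the Excel file.

```python
import numpy as np
import pandas as pd
from scipy.interpolate import lagrange


def lagrange_fill(s, n, k=3):
    """Estimate the value at position n from up to k valid neighbours on each side.

    Hypothetical helper for illustration; k is an assumed window size.
    """
    # Candidate positions n-k .. n+k, excluding n itself and out-of-range positions.
    positions = [i for i in range(n - k, n + k + 1) if 0 <= i < len(s) and i != n]
    # Keep only the neighbours that actually hold a value.
    xs = [i for i in positions if not np.isnan(s.iloc[i])]
    ys = [s.iloc[i] for i in xs]
    poly = lagrange(xs, ys)  # polynomial passing through the known neighbours
    return float(poly(n))    # evaluate it at the missing position


# Rows 11 to 17 of the output, in order; the missing value sits at position 3.
sample = pd.Series([3295.5, 2332.1, 2699.3, np.nan, 3036.8, 865.0, 3014.3])

estimate = lagrange_fill(sample, 3)
filled = sample.copy()
filled.iloc[3] = estimate
print(filled)
```

A high-degree polynomial through noisy sales figures can swing widely, so the estimate should be sanity-checked against the neighbouring values. Keeping the window small is one way to limit that effect.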