First, data cleaning
1. Duplication of data processing
(1) COUNTIF function constructs auxiliary columns
The C column is screened, the non 1 is selected, and then the deletion can leave the duplicates.
Or you can sort the C column in descending order, and then delete the first few rows that are not 1
(2) Excel Advanced Filter
(3) Conditional formatting: Highlighting duplicate values
(4) Pivot table: Count
(5) Excel deletes duplicates
2. Fill in missing data
(1) If the missing value is too high, there is a problem, acceptable missing value below 10%
(2) Four ways to handle trend values:
A. Using the average of sample statistics to replace
B. Value substitution calculated by statistical model
C. Delete records with missing values
D. There will be deficiencies worthy of record retention
(3) There are two main missing values: null value and Error identifier
A. Null value: CTRL+G, Position null value, directly enter the value to replace, press Ctrl+enter, all the empty values of the part is filled with the value just entered
B. Error identifiers: Excel find Replacements
Ps.shift continuously selected, CTRL does not continuously select
3. Detecting logical errors in data
According to the specific circumstances
The background is an option for a questionnaire that is titled Multiple choices and can only choose 3, 0 is not selected, 1 is selected
The figure uses two methods
The first method is to count the total number of options that are selected to be marked 1 in each row, or to mark an error if more than 3.
The second method is to use conditional format to select illegal numbers from 0 and 1. Using the OR function, =or (b3=1,b3=0) =false that if the proposition "B3 1 or 0" is wrong (=false), that is, B3 is neither 0 nor 1, then it is highlighted
Second, data processing
1. Data extraction
(1) Field breakdown
A.excel breakdown function
B.left function Right function
(2) fields combine:& and CONCATENATE functions
(3) field matching
VLOOKUP: to match a column
VLOOKUP (parameter 1, parameter 2, parameter 3, parameter 4)
Parameter 1: the area being searched
Parameter 2: the search area. Note that the search area must be in the first column of the search area, which is the column with parameter 1, and be sure to be in the first column of the parameter 2 search area.
Parameter 3: The returned serial number starts at 1
Parameter 4:0/false table exact match, 1/true table fuzzy match, default value is fuzzy match
2. Data calculation
(1) Simple calculation: +-*/
(2) Function calculation
A.sum and Average
B. Date-related
3. Data grouping
The threshold value is to find the nearest A2 in column D and not to A2, and the threshold is lower
4. Data Conversion
(1) Row and column conversions of data tables: Transpose
(2) Conversion between multiple selection and entry mode
HLOOKUP: Matching by row
Isnumber: Whether it is a numeric value
Search: Searching for functions
Iii. Sampling of data
Rand function: Returns the decimal number between [0,1]