Contents
Objective
Chapter 1 Introduction
1.1 The Ascendance of Data
1.2 What Is Data Science?
1.3 Motivating Hypothetical: DataSciencester
1.3.1 Finding Key Connectors
1.3.2 Data Scientists You May Know
1.3.3 Salaries and Experience
1.3.4 Paid Accounts
1.3.5 Topics of Interest
1.4 Onward
Chapter 2 A Crash Course in Python
2.1 The Basics
2.1.1 Getting Python
2.1.2 The Zen of Python
2.1.3 Whitespace Formatting
2.1.4 Modules
2.1.5 Arithmetic
2.1.6 Functions
2.1.7 Strings
2.1.8 Exceptions
2.1.9 Lists
2.1.10 Tuples
2.1.11 Dictionaries
2.1.12 Sets
2.1.13 Control Flow
2.1.14 Truthiness
2.2 The Not-So-Basics
2.2.1 Sorting
2.2.2 List Comprehensions
2.2.3 Generators and Iterators
2.2.4 Randomness
2.2.5 Regular Expressions
2.2.6 Object-Oriented Programming
2.2.7 Functional Tools
2.2.8 enumerate
2.2.9 zip and Argument Unpacking
2.2.10 args and kwargs
2.2.11 Welcome to DataSciencester
2.3 For Further Exploration
Chapter 3 Visualizing Data
3.1 matplotlib
3.2 Bar Charts
3.3 Line Charts
3.4 Scatterplots
3.5 For Further Exploration
Chapter 4 Linear Algebra
4.1 Vectors
4.2 Matrices
4.3 For Further Exploration
Chapter 5 Statistics
5.1 Describing a Single Set of Data
5.1.1 Central Tendencies
5.1.2 Dispersion
5.2 Correlation
5.3 Simpson's Paradox
5.4 Some Other Correlational Caveats
5.5 Correlation and Causation
5.6 For Further Exploration
Chapter 6 Probability
6.1 Dependence and Independence
6.2 Conditional Probability
6.3 Bayes's Theorem
6.4 Random Variables
6.5 Continuous Distributions
6.6 The Normal Distribution
6.7 The Central Limit Theorem
6.8 For Further Exploration
Chapter 7 Hypothesis and Inference
7.1 Statistical Hypothesis Testing
7.2 Example: Flipping a Coin
7.3 Confidence Intervals
7.4 P-hacking
7.5 Example: Running an A/B Test
7.6 Bayesian Inference
7.7 For Further Exploration
Chapter 8 Gradient Descent
8.1 The Idea Behind Gradient Descent
8.2 Estimating the Gradient
8.3 Using the Gradient
8.4 Choosing the Right Step Size
8.5 Putting It All Together
8.6 Stochastic Gradient Descent
8.7 For Further Exploration
Chapter 9 Getting Data
9.1 stdin and stdout
9.2 Reading Files
9.2.1 The Basics of Text Files
9.2.2 Delimited Files
9.3 Scraping the Web
9.3.1 HTML and the Parsing Thereof
9.3.2 Example: O'Reilly Books About Data
9.4 Using APIs
9.4.1 JSON (and XML)
9.4.2 Using an Unauthenticated API
9.4.3 Finding APIs
9.5 Example: Using the Twitter APIs
9.6 For Further Exploration
Chapter 10 Working with Data
10.1 Exploring Your Data
10.1.1 Exploring One-Dimensional Data
10.1.2 Two Dimensions
10.1.3 Many Dimensions
10.2 Cleaning and Munging
10.3 Manipulating Data
10.4 Rescaling
10.5 Dimensionality Reduction
10.6 For Further Exploration
Chapter 11 Machine Learning
11.1 Modeling
11.2 What Is Machine Learning?
11.3 Overfitting and Underfitting
11.4 Correctness
11.5 The Bias-Variance Trade-off
11.6 Feature Extraction and Selection
11.7 For Further Exploration
Chapter 12 k-Nearest Neighbors
12.1 The Model
12.2 Example: Favorite Programming Languages
12.3 The Curse of Dimensionality
12.4 For Further Exploration
Chapter 13 Naive Bayes
13.1 A Really Dumb Spam Filter
13.2 A More Sophisticated Spam Filter
13.3 Implementation
13.4 Testing Our Model
13.5 For Further Exploration
Chapter 14 Simple Linear Regression
14.1 The Model
14.2 Using Gradient Descent
14.3 Maximum Likelihood Estimation
14.4 For Further Exploration
Chapter 15 Multiple Regression
15.1 The Model
15.2 Further Assumptions of the Least Squares Model
15.3 Fitting the Model
15.4 Interpreting the Model
15.5 Goodness of Fit
15.6 Digression: The Bootstrap
15.7 Standard Errors of Regression Coefficients
15.8 Regularization
15.9 For Further Exploration
Chapter 16 Logistic Regression
16.1 The Problem
16.2 The Logistic Function
16.3 Applying the Model
16.4 Goodness of Fit
16.5 Support Vector Machines
16.6 For Further Exploration
Chapter 17 Decision Trees
17.1 What Is a Decision Tree?
17.2 Entropy
17.3 The Entropy of a Partition
17.4 Creating a Decision Tree
17.5 Putting It All Together
17.6 Random Forests
17.7 For Further Exploration
Chapter 18 Neural Networks
18.1 Perceptrons
18.2 Feed-Forward Neural Networks
18.3 Backpropagation
18.4 Example: Defeating a CAPTCHA
18.5 For Further Exploration
Chapter 19 Clustering
19.1 The Idea
19.2 The Model
19.3 Example: Meetups
19.4 Choosing the Number of Clusters k
19.5 Example: Clustering Colors
19.6 Bottom-up Hierarchical Clustering
19.7 For Further Exploration
Chapter 20 Natural Language Processing
20.1 Word Clouds
20.2 n-gram Models
20.3 Grammars
20.4 An Aside: Gibbs Sampling
20.5 Topic Modeling
20.6 For Further Exploration
Chapter 21 Network Analysis
21.1 Betweenness Centrality
21.2 Eigenvector Centrality
21.2.1 Matrix Multiplication
21.2.2 Centrality
21.3 Directed Graphs and PageRank
21.4 For Further Exploration
Chapter 22 Recommender Systems
22.1 Manual Curation
22.2 Recommending What's Popular
22.3 User-Based Collaborative Filtering
22.4 Item-Based Collaborative Filtering
22.5 For Further Exploration
Chapter 23 Databases and SQL
23.1 CREATE TABLE and INSERT
23.2 UPDATE
23.3 DELETE
23.4 SELECT
23.5 GROUP BY
23.6 ORDER BY
23.7 JOIN
23.8 Subqueries
23.9 Indexes
23.10 Query Optimization
23.11 NoSQL
23.12 For Further Exploration
Chapter 24 MapReduce
24.1 Example: Word Count
24.2 Why MapReduce?
24.3 MapReduce More Generally
24.4 Example: Analyzing Status Updates
24.5 Example: Matrix Multiplication
24.6 An Aside: Combiners
24.7 For Further Exploration
Chapter 25 Go Forth and Do Data Science
25.1 IPython
25.2 Mathematics
25.3 Not from Scratch
25.3.1 NumPy
25.3.2 pandas
25.3.3 scikit-learn
25.3.4 Visualization
25.3.5 R
25.4 Find Data
25.5 Do Data Science
25.5.1 Hacker News
25.5.2 Fire Trucks
25.5.3 T-Shirts
25.5.4 And You?
Getting Started with Data Science