Now consider a web statistic that records the number of visits per hour: each row contains an hour index and the number of times the site was accessed during that hour. The problem to solve is to estimate when the traffic will reach the limit of the infrastructure, which is 100,000 hits per hour.
1. Read the data:
# read the data (sp.genfromtxt is NumPy's genfromtxt, re-exported by older SciPy versions)
import scipy as sp

filepath = r'C:\Users\TD\Desktop\data\Machine learning\1400os_01_codes\data\web_traffic.tsv'
data = sp.genfromtxt(filepath, delimiter='\t')
x = data[:, 0]
y = data[:, 1]
where x is the hour and y is the traffic (number of hits) in that hour.
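As a quick sanity check, an illustrative snippet (the exact row count depends on your copy of web_traffic.tsv):

print(data.shape)   # (rows, 2): one (hour, hits) pair per row
print(data[:3])     # peek at the first three rows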
2. Preprocess and clean the data:
print(sp.sum(sp.isnan(y)))
The result shows that there are 8 missing (NaN) values. The simplest way to handle them is to remove those entries directly:
x = x[~sp.isnan(y)]
y = y[~sp.isnan(y)]
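A small hedged check to confirm the cleaning worked, assuming x and y as defined above:

# after filtering, no NaN entries should remain
assert sp.sum(sp.isnan(y)) == 0
print('{} valid rows remain'.format(len(x)))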
Next, draw a scatter plot to observe the pattern in the data:
# visualize the data to observe its pattern
import matplotlib.pyplot as plt

plt.scatter(x, y)
plt.title('Web traffic over the last month')
plt.xlabel('Time')
plt.ylabel('Hits/hour')
# label the x axis in weeks (7 days * 24 hours) rather than raw hours
plt.xticks([w * 7 * 24 for w in range(5)],
           ['week {}'.format(w) for w in range(5)])
plt.autoscale(tight=True)
plt.grid()
plt.show()
3. Choose the right model and learning algorithm:
To answer the original question, you need to do two things:
1) Find the real model behind the noisy data.
2) Use this model to extrapolate into the future and find the point at which our question is answered.
First, you need to understand the difference between a model and the actual data. A model can be understood as a simplified theoretical approximation of the complex real world, so it will always contain some residual inaccuracy, called the approximation error. We compute this error as the squared distance between the real data and the model's predictions; for a trained model f, the error is calculated by the following function:
def error(f, x, y):
    # squared distance between the predictions f(x) and the real data y
    return sp.sum((f(x) - y) ** 2)
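For intuition, here is a hedged baseline example (the name f_const and the use of poly1d are illustrative, not from the original text; in recent SciPy versions poly1d lives in NumPy as np.poly1d): a degree-0 "model" that always predicts the mean traffic. Any real model should achieve a lower error than this.

# baseline: a constant model that always predicts the mean number of hits
f_const = sp.poly1d([y.mean()])  # poly1d builds a callable polynomial
print(error(f_const, x, y))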
Applying the first machine learning method: least-squares fitting.
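A minimal sketch of what such a least-squares fit and the final prediction might look like, assuming a straight-line model is adequate; the starting guess 800 for fsolve is illustrative, and in recent versions polyfit/poly1d come from NumPy rather than the sp namespace:

from scipy.optimize import fsolve

# fit a straight line (degree-1 polynomial) to the data by least squares
fp1 = sp.polyfit(x, y, 1)   # coefficients [slope, intercept]
f1 = sp.poly1d(fp1)         # callable model: f1(hour) -> predicted hits
print(error(f1, x, y))

# estimate when traffic reaches 100,000 hits/hour: solve f1(t) = 100000
reached = fsolve(lambda t: f1(t) - 100000, 800)[0]
print('100,000 hits/hour expected around hour {:.0f} (week {:.1f})'.format(
    reached, reached / (7 * 24)))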