Web Analytics: Advanced Data Analysis in Excel (i)

Source: Internet
Author: User
Keywords: Excel, analysis, advanced, number


Besides professional web analytics tools such as Google Analytics, Adobe SiteCatalyst, Webtrends, Tencent Analytics, and Baidu Statistics, I think the most commonly used data-processing tool is Excel. Its most basic uses are calculations and charts; a bit more advanced are functions and PivotTables. You may also think of VBA and macros, but probably few people actually use these advanced features.

What about advanced data analysis, meaning the professional analysis methods and principles of statistics? Must we turn to specialized tools such as SPSS or SAS? Is there a tool that bridges the jump from basic to advanced data analysis? There is: Excel's Data Analysis feature. The two recently popular Excel books, "Who Says Rookies Can't Do Data Analysis" and "Let Excel Fly", do not seem to cover this topic. Advanced data analysis includes methods such as regression analysis, analysis of variance, and the t-test. These may seem unrelated to day-to-day work, but they matter if you want to advance; MBA courses include them, so you may as well learn them sooner rather than later. The rest of this article walks through the details.

Before using it, you must first install Excel's Data Analysis feature, which is not installed by default. The installation steps are as follows:

1) Click the Office Button, then click "Excel Options":

  

2) Find "Add-Ins", select "Excel Add-ins" in the "Manage" box, and click "Go":

  

3) Select "Analysis ToolPak" and click "OK":

  

4) After installation, the "Data Analysis" command appears on the "Data" tab, as follows:

  

With the add-in installed, let's start with regression analysis.

I. Regression analysis

Before going into regression analysis in detail, let's first understand what "regression" means. The phenomenon was first discovered by the British biologist Galton while studying how traits pass from parents to children: for height, tall parents tend to have children who are taller than average but not necessarily taller than the parents; to some degree, the children's height "regresses" toward the average. This effect is called "regression". Modern regression analysis is, for the most part, a set of methods and procedures, growing out of Galton's work, for building a model of the quantitative relationship between variables. In Galton's study, the independent variable is the parents' height and the dependent variable is the children's height.

Baidu Encyclopedia defines regression analysis as a statistical analysis method for determining the quantitative relationship between two or more variables. It is widely used and can be classified as follows:

1) By the number of independent variables involved, regression analysis is divided into simple (one-variable) regression and multiple regression;

2) By the type of relationship between the independent and dependent variables, it is divided into linear regression and nonlinear regression.

  

Here is an e-commerce example: when the conversion rate is fixed, the number of site visits is generally proportional to sales revenue. We now want to establish a standard curve relating the number of visits to the corresponding sales, to be used for forecasting the sales revenue of a campaign, as follows:

  

1. First, plot the data as a scatter chart:

  

  

2. Add a trendline, and display the regression equation and the R-squared value:

  

  

From the chart, the R-squared value is 0.9995, the trendline is close to a straight line, and the equation is: y = 0.01028x - 27.424

The R-squared value is a number between 0 and 1; the closer the trendline's R-squared value is to 1, the more reliable the trendline. Because R² > 0.99 here, the data are strongly linear: the fitted line explains more than 99% of the variation in the actual data (about 99.95% for R² = 0.9995), so it generalizes well and can serve as a good predictor.
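To make the trendline and R-squared figures less of a black box, here is a minimal sketch of an ordinary least-squares fit in plain Python. The `visits` and `sales` lists are hypothetical values invented for illustration (the article's own data appears only in its screenshots), and `linear_fit` is a helper name introduced here, not an Excel function.

```python
# Hypothetical sample data, invented for illustration only:
# number of site visits and the corresponding sales.
visits = [10_000, 20_000, 30_000, 40_000, 50_000, 60_000]
sales = [80, 175, 285, 380, 490, 588]


def linear_fit(xs, ys):
    """Return (slope, intercept, r_squared) for the least-squares line
    y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squares and cross-products around the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # R-squared = 1 - (unexplained variation / total variation).
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    r_squared = 1 - ss_res / ss_tot
    return slope, intercept, r_squared


slope, intercept, r_squared = linear_fit(visits, sales)
print(f"y = {slope:.5f}x + {intercept:.3f}, R^2 = {r_squared:.4f}")
```

With data that lie almost on a straight line, as in this made-up sample, the unexplained variation is tiny compared with the total variation, which is why R-squared ends up so close to 1.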

3. Use Excel's Data Analysis feature

1) Click "Data Analysis", select "Regression" in the dialog box that pops up, and click "OK":

  

2) For "Input X Range", select the cells containing the number of visits; for "Input Y Range", select the cells containing sales. Then check the options shown below, including Residuals, Standardized Residuals, Residual Plots, Line Fit Plots, and Normal Probability Plots.

  

3) The following output shows the residuals and standardized residuals:

  

  

4) The following is the residual plot:

  

The residual plot charts the difference between the actual and the predicted values. If the points are scattered on both sides of the zero axis, the fitted line is reasonable: the prediction is sometimes too high and sometimes too low, but overall it follows the trend. If the points all fall above the axis or all fall below it, the fit is biased and the model needs to be reworked.
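The residual check described above can be sketched in a few lines of Python. The slope and intercept come from the article's trendline; the `observations` list is hypothetical data invented for illustration. Dividing each residual by the sample standard deviation of the residuals is one common way to standardize them, and it is an assumption here that this is close to what Excel reports.

```python
from statistics import stdev

# Trendline from the article: y = 0.01028x - 27.424
SLOPE, INTERCEPT = 0.01028, -27.424

# Hypothetical (visits, actual sales) observations, invented for illustration.
observations = [
    (10_000, 78),
    (20_000, 176),
    (30_000, 284),
    (40_000, 381),
    (50_000, 489),
]


def residuals(points, slope, intercept):
    """Residual = actual value minus the value predicted by the line."""
    return [y - (slope * x + intercept) for x, y in points]


def standardized(res):
    """Scale residuals by their sample standard deviation
    (one common definition; assumed, not confirmed, to match Excel)."""
    s = stdev(res)
    return [r / s for r in res]


def scattered_on_both_sides(res):
    """True if some residuals are positive and some are negative."""
    return any(r > 0 for r in res) and any(r < 0 for r in res)


res = residuals(observations, SLOPE, INTERCEPT)
for (x, y), r, z in zip(observations, res, standardized(res)):
    print(f"visits={x:>6}  actual={y:>4}  residual={r:+.3f}  standardized={z:+.3f}")
print("Residuals on both sides of zero:", scattered_on_both_sides(res))
```

If `scattered_on_both_sides` returned `False`, every prediction would be off in the same direction, which is the biased case the paragraph warns about.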

5) The following is the line fit plot:

  

In the line fit plot, besides the actual data points, you can also see the predicted data points computed from the fitted line; these values also appear in the table above.

6) The following is the normal probability plot:

  

A normal probability plot is used to check whether a set of data follows a normal distribution. It is a scatter plot of the actual values against the corresponding values of a normal distribution; if the data are normally distributed, the points fall approximately on a straight line. The data in a regression analysis do not necessarily have to follow a normal distribution; the plot is shown here only for reference.
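As a rough illustration of how such a plot is built, the sketch below pairs each sorted data value with the standard normal quantile at its plotting position. This is the textbook construction and is not necessarily identical to the chart Excel draws; the plotting position `(i - 0.5) / n`, the function name, and the sample values are all assumptions made for this example.

```python
from statistics import NormalDist


def normal_probability_points(values):
    """Pair each sorted value with the standard normal quantile
    at plotting position (i - 0.5) / n, for i = 1..n."""
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    points = []
    for i, value in enumerate(sorted(values), start=1):
        p = (i - 0.5) / n
        points.append((std_normal.inv_cdf(p), value))
    return points


# Hypothetical sales figures, invented for illustration.
sample = [381, 78, 489, 176, 284]
for quantile, value in normal_probability_points(sample):
    print(f"normal quantile={quantile:+.3f}  value={value}")
```

Plotting these pairs as a scatter chart gives the normal probability plot: the closer the points are to a straight line, the closer the data are to a normal distribution.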

The data tables and charts above show that the formula y = 0.01028x - 27.424 is a trustworthy prediction curve. Assuming a campaign brings 5,000,000 visits, the forecast sales are 0.01028 × 5,000,000 - 27.424 ≈ 51,373, as shown in the following figure:
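The forecast is simply the trendline evaluated at a given number of visits. As a quick check, the sketch below plugs a visit count into y = 0.01028x - 27.424; under this equation, the quoted forecast of about 51,373 corresponds to 5,000,000 visits. The function name `forecast_sales` is introduced here for illustration.

```python
def forecast_sales(visits, slope=0.01028, intercept=-27.424):
    """Evaluate the article's trendline y = slope * x + intercept."""
    return slope * visits + intercept


forecast = forecast_sales(5_000_000)
print(round(forecast))  # 51373
```

Keep in mind that a forecast far outside the range of visit counts used to fit the line is an extrapolation, so it is only as reliable as the assumption that the conversion rate stays fixed.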

  

If you find this article worth reprinting, please credit Shenzhen Web Analytics as the source. Questions and suggestions are welcome at any time. Thank you!

Original: http://www.szwebanalytics.com/excel-data-analysis.html
