In 2011 big data took off and the big data era formally arrived. I had just entered university at the time, and had no idea I would end up on the one-way road of becoming a "code monkey" (programmer). Six years later, this code monkey has started learning big data, and I am recording that here today. First up: learning the R language.
What is the R language?
The R language is a programming language and software environment for statistical analysis, graphical presentation, and reporting, and it is one of the most popular data analysis and visualization platforms today. It first appeared in 1993 (making it two years younger than me) and was originally designed and developed by Ross Ihaka and Robert Gentleman in the Department of Statistics at the University of Auckland in New Zealand. It became widely popular around 2011 with the big data boom.
At its core, R is an interpreted computer language that supports branching, looping, and modular programming with functions. R can also integrate routines written in C, C++, .NET, Python, or Fortran to improve efficiency.
Of course, besides R there are other data analysis tools, such as Excel, SPSS, and SAS.
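As a small illustration of branching, looping, and functions, here is a minimal sketch. The function name `parity` is a hypothetical helper made up for this example; it is not part of R.

```r
# Hypothetical helper: label a number as "even" or "odd" using branching.
parity <- function(n) {
  if (n %% 2 == 0) "even" else "odd"
}

# Loop over the numbers 1 to 4 and collect the labels in a vector.
labels <- character(0)
for (i in 1:4) {
  labels <- c(labels, parity(i))
}

print(labels)
```

Because R is interpreted, these lines can be typed one at a time into the R console and each one runs immediately.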
What are the characteristics of the R language?
As mentioned above, R is a programming language and software environment for statistical analysis, graphical representation, and reporting. Some of its features are listed below:
- The R language is a well-developed, simple and effective programming language that includes conditions, loops, user-defined recursive functions, and input and output tools.
- The R language provides effective data handling and storage facilities.
- The R language provides a set of operators for calculating arrays, lists, vectors, and matrices.
- The R language provides a large, consistent, and integrated collection of data analysis tools.
- The R language provides graphical tools for data analysis and direct display on a computer or in a document.
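A few of the features listed above can be sketched in a handful of lines. This is only an illustration: the names `factorial_rec`, `v`, `m`, and `record` are made up for the example.

```r
# A user-defined recursive function (hypothetical example name).
factorial_rec <- function(n) {
  if (n <= 1) return(1)
  n * factorial_rec(n - 1)
}

# Vector arithmetic is element-wise.
v <- c(1, 2, 3)
doubled <- v * 2

# matrix() fills column by column; %*% is the matrix product.
m <- matrix(1:4, nrow = 2)
prod_m <- m %*% m

# A list can hold values of different types.
record <- list(name = "R", year = 1993)

print(factorial_rec(5))
print(doubled)
print(prod_m)
print(record$year)
```

Note that `*` on vectors or matrices works element by element, while `%*%` performs true matrix multiplication.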
Why Choose R?
R is free and open source, it runs on Windows, macOS, and Linux, it has many powerful packages, and it is used by many large companies (Twitter, Ford, The New York Times, Microsoft, Google). You can complete almost every step of a data analysis workflow with it: data acquisition, data cleaning, data analysis, results reporting, and publishing results.
Of these five steps, data analysis, results reporting, and publishing results are the most important. Let's start with a simple overview:
Data analysis
- Exploratory data analysis
This is a necessary step in data analysis: plotting the data helps you understand it, and R has built-in plotting capabilities.
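A minimal sketch of exploratory analysis, assuming the built-in `mtcars` dataset as stand-in data (the original post does not name a dataset):

```r
# Load a dataset that ships with R (chosen here only for illustration).
data(mtcars)

# Numeric summary of fuel economy (miles per gallon).
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Histogram: how is mpg distributed?
hist(mtcars$mpg, main = "Distribution of mpg", xlab = "Miles per gallon")

# Scatter plot: how does mpg relate to car weight?
plot(mtcars$wt, mtcars$mpg, xlab = "Weight (1000 lbs)", ylab = "Miles per gallon")
```

In RStudio the plots appear in the Plots pane; when run as a script, R writes them to a file instead.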
- Statistical inference
This is the process of drawing a formal conclusion from data. The conclusion always carries some uncertainty, for example because of sampling bias in how the data was collected.
For example: of two people, A and B, who is more beautiful? In reality tastes differ, to each their own, so any answer carries uncertainty. By general convention, a conclusion is accepted as formal as long as the error rate is below 5%.
R can be used to carry out this critical step.
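To make the 5% convention concrete, here is a hedged sketch using a two-sample t-test. Both the choice of test and the built-in `mtcars` dataset are assumptions made for illustration; the post itself does not say which method to use.

```r
data(mtcars)

# Compare fuel economy between automatic (am = 0) and manual (am = 1) cars.
# t.test() with a formula runs Welch's two-sample t-test by default.
result <- t.test(mpg ~ am, data = mtcars)
print(result$p.value)

# Conventional decision rule: treat the difference as significant
# when the p-value is below 0.05.
significant <- result$p.value < 0.05
print(significant)
```

A p-value below 0.05 means that, if there were truly no difference between the two groups, a difference at least this large would be seen in fewer than 5% of samples.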
Linear regression analysis: a linear model is fitted to the data. The variables are divided into predictor variables and outcome (result) variables.
For example, when analyzing house prices, the predictor variables could include the lot (location), room size, policies, and so on.
The outcome variable can then be predicted from the predictor variables.
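A minimal sketch of a linear model in R. Since no house-price data is given, the built-in `mtcars` dataset stands in here: `mpg` plays the role of the outcome variable, and `wt` and `hp` play the role of predictor variables. These choices are assumptions for illustration only.

```r
data(mtcars)

# Fit a linear model: outcome ~ predictor1 + predictor2.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Estimated intercept and slopes.
print(coef(fit))

# Predict the outcome for a new, hypothetical car.
new_car <- data.frame(wt = 3, hp = 110)
pred <- predict(fit, newdata = new_car)
print(pred)
```

`summary(fit)` prints more detail, including standard errors and the R-squared of the fit.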
Nonlinear regression analysis
- Machine learning: classification problems
For example: cat, pike, sofa.
The goal is to have the machine classify items like these, which requires a good deal of algorithmic knowledge.
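As one simple illustration of classification, the sketch below uses logistic regression on the built-in `iris` dataset to tell two flower species apart. Both the algorithm and the dataset are assumptions chosen for brevity; the post does not say which classifier or data it has in mind.

```r
data(iris)

# Keep two of the three species so the problem is binary.
two <- droplevels(subset(iris, Species != "setosa"))

# Logistic regression: models the probability of the second factor
# level ("virginica") from two petal measurements.
fit <- glm(Species ~ Petal.Length + Petal.Width, data = two, family = binomial)

# Turn predicted probabilities into class labels.
prob <- predict(fit, type = "response")
pred <- ifelse(prob > 0.5, "virginica", "versicolor")

# Share of flowers classified correctly (measured on the training data).
accuracy <- mean(pred == two$Species)
print(accuracy)
```

Accuracy measured on the training data is optimistic; a real project would hold out separate test data.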
Results report: the information found in the data is summarized with plots and other means. For example:
- Use the googleVis API so that R produces HTML that calls Google Charts to render graphics.
- Use manipulate and rCharts to make interactive JavaScript graphics from R.
- Use shiny to create interactive R programs embedded in web pages: http://www.shinyapps.io/
- Create and publish R-based results reports with slidify.
Publishing results: the following two platforms can be used to publish results: GitHub and RPubs.
Installing R and RStudio:
Download and install the version for your platform.
Install R: https://cran.r-project.org/
Install RStudio: https://www.rstudio.com/
A first look at the R language