Big Data Analysis (1): Exploratory Analysis

Source: Internet
Author: User
Tags: ggplot

Big data has been all the rage lately, and it has become a hot technology among us programmers. We set up a Hadoop environment, read all sorts of Hadoop books, and browse technologies such as Hadoop, Hive, and Storm. After a while, we want to put these techniques to work on real data. But when faced with test data downloaded from the Internet, we either have no idea where to start or simply fit a statistical regression model without a second thought.

We are clueless about big data and big data analytics, we become confused about the technology itself, and we start to shy away from it.

So what should we do once we have the data? If we do not know how to proceed, we can start with exploratory analysis.

Data analysis can be divided into two stages: exploration and validation. Exploratory data analysis (hereinafter referred to as EDA) means exploring data that is already in hand (especially raw data obtained by survey or observation) under as few prior assumptions as possible. EDA is particularly effective when we do not have enough experience with the information in the data and do not know which traditional statistical method to use for the analysis.

Exploratory analysis is typically presented with histograms and stem-and-leaf plots. The basic tools of exploratory data analysis are graphs, tabulation, and summary statistics. In general, EDA is a systematic way of examining the data: it shows the distribution of every variable, including time series data and transformed variables, uses a scatterplot matrix to show the pairwise relationships between variables, and produces all the summary statistics. In other words, you compute the mean, maximum, minimum, and upper and lower quartiles, and identify outliers.
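As a minimal sketch of the statistics and plots just listed, the following R snippet works on a small made-up vector (not the article's dataset). The 1.5 * IQR rule used to flag outliers is one common convention and is an assumption here, since the text does not say which rule to use.

```r
# Made-up values for illustration only; 95 is deliberately extreme.
x <- c(21, 23, 25, 27, 29, 31, 33, 35, 37, 95)

# Mean, minimum, maximum, and the lower/upper quartiles
x_mean <- mean(x)
x_min  <- min(x)
x_max  <- max(x)
q      <- quantile(x, c(0.25, 0.75))
q1     <- unname(q[1])
q3     <- unname(q[2])
print(summary(x))  # the same statistics in one call

# Flag outliers with the 1.5 * IQR rule (an assumed convention)
iqr      <- q3 - q1
outliers <- x[x < q1 - 1.5 * iqr | x > q3 + 1.5 * iqr]
print(outliers)    # only 95 falls outside the fences here

# Distribution of a single variable: histogram and stem-and-leaf plot
hist(x)
stem(x)

# Pairwise relationships between variables: scatterplot matrix
df <- data.frame(x = x, y = 2 * x + 1, z = rev(x))
pairs(df)
```

On real data the same calls apply column by column; summary() on a whole data frame gives the per-column version.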

Enough talk; let's look at an example, with implementations in both R and SPSS.

The attached data contains 5 columns: age, gender, number of ad impressions, clicks, and whether the user is signed in.

Implementation in R:

    # Column names below assume the nyt1.csv header is
    # Age, Gender, Impressions, Clicks, Signed_In; adjust if yours differs.
    root <- "f:/dds_datasets/dds_ch2_nyt/"
    setwd(root)
    file <- paste(root, "nyt1.csv", sep = "")
    nytdata <- read.csv(file)
    head(nytdata)

    # Bin ages into categories
    nytdata$agecat <- cut(nytdata$Age, c(-Inf, 0, 18, 24, 34, 44, 54, 64, Inf))
    summary(nytdata)

    install.packages("doBy")
    library("doBy")

    # Count, minimum, mean, and maximum of a vector
    siterange <- function(x) { c(length(x), min(x), mean(x), max(x)) }
    summaryBy(Age ~ agecat, data = nytdata, FUN = siterange)
    summaryBy(Gender + Signed_In + Impressions + Clicks ~ agecat, data = nytdata)

    ## Draw the histogram first
    install.packages("ggplot2")
    library("ggplot2")

    ggplot(nytdata, aes(x = Impressions, fill = agecat)) + geom_histogram()
    # ggplot(nytdata, aes(x = Impressions, y = agecat, fill = agecat)) + geom_area()
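For readers who do not have the doBy package installed, the grouped summary above can be approximated with base R's aggregate(). This is only a sketch on a made-up data frame: the name toy, its values, and the column names Age and Clicks are assumptions for illustration, not the article's data.

```r
# Made-up stand-in for the article's data; values are hypothetical.
toy <- data.frame(
  Age    = c(0, 16, 22, 30, 41, 50, 60, 70, 19, 33),
  Clicks = c(0, 1, 0, 2, 0, 1, 0, 0, 1, 0)
)

# Same age bins as the cut() call in the article's listing
toy$agecat <- cut(toy$Age, c(-Inf, 0, 18, 24, 34, 44, 54, 64, Inf))

# Count, minimum, mean, and maximum of Age within each bin
siterange <- function(x) c(n = length(x), min = min(x), mean = mean(x), max = max(x))
by_age <- aggregate(Age ~ agecat, data = toy, FUN = siterange)
print(by_age)

# Mean clicks per bin (summaryBy uses the mean when FUN is omitted)
clicks_by_age <- aggregate(Clicks ~ agecat, data = toy, FUN = mean)
print(clicks_by_age)
```

Note that aggregate() stores the four statistics as a matrix inside the Age column of by_age, so individual values are read with, for example, by_age$Age[, "mean"].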

The analysis results are as follows:

The SPSS implementation is relatively simple: import the data through the wizard, then choose Analyze > Descriptive Statistics > Explore.

I am a programmer myself and still a beginner with big data. I started learning R some time ago, and anyone interested is welcome to get in touch and exchange ideas.

I could not figure out how to attach files here, so please contact me if you need the data.
