The Way of Big Data Processing (MATLAB Edition)

Source: Internet
Author: User

One: Background

(1) Recently I have been dealing with big data. Going from MB to GB is a qualitative leap, and the tools change accordingly: from Windows to Linux, and from a single machine to multi-node computing with Hadoop.

(2) The problem is how to extract practical information or discover latent patterns in huge amounts of data. Visualization tools may be essential for this.

(3) A great deal has been written about visualization tools (a Baidu search turns up plenty). But as researchers and programmers, we may want to abstract a mathematical model that describes and characterizes the real-world phenomenon well.

(4) A typical toolchain: Python (data cleansing and processing) + MATLAB (model analysis), or C++/Java/Hadoop (data cleansing and processing) + MATLAB (model analysis).

(5) For the C++ side, see my previous post on processing big data with C++ fstream and string.

Two: Learning MATLAB

(1) Gamma distribution (gamfit)

clc; clear all; close all;
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);   % use the third column

% Sort the data into 6 bins: y holds the count in each bin, x the bin centers
[y, x] = hist(data, 6);
subplot(2, 2, 1);
bar(x, y, 'FaceColor', 'r', 'EdgeColor', 'w'); box off;

% Maximum likelihood estimates (MLEs) of the gamma distribution parameters:
% cxd1(1) is the shape parameter a (alpha), cxd1(2) the scale parameter b (beta)
cxd1 = gamfit(data);
a = cxd1(1)
b = cxd1(2)

% Gamma CDF (distribution function) at each data value, using shape a and scale b;
% gampdf(data, a, b) would return the PDF (density function) instead
cxd2 = gamcdf(data, a, b);

% Kolmogorov-Smirnov test of the data against the fitted gamma CDF,
% given as a two-column matrix [x values, CDF values]; h = 1 rejects the fit
h = kstest(data, 'CDF', [data, cxd2]);

% Plot the fitted CDF; sort first so the curve is drawn left to right
[xs, idx] = sort(data);
subplot(2, 2, 2);
plot(xs, cxd2(idx));
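gamfit returns maximum likelihood estimates. As a quick sanity check that needs no toolbox, a method-of-moments estimate can be computed from the sample mean and variance: for a gamma distribution with shape a and scale b, the mean is a*b and the variance is a*b^2, so a = mean^2/var and b = var/mean. The sketch below assumes data is a numeric vector; the example values and the names a_mom and b_mom are hypothetical, and the moment estimates will generally differ somewhat from the MLEs.

```matlab
% Hypothetical example data; replace with data = dataall(:, 3) from the listing above
data = [2.1; 3.4; 1.8; 4.0; 2.9; 3.3; 2.5; 5.2];

% For a gamma distribution with shape a and scale b:
%   mean = a*b  and  variance = a*b^2
% Solving these two equations gives the moment estimates below.
m = mean(data);           % sample mean
v = var(data);            % sample variance (normalized by N-1)
a_mom = m^2 / v;          % shape estimate
b_mom = v / m;            % scale estimate
fprintf('moment estimates: a = %.3f, b = %.3f\n', a_mom, b_mom);
```

If these numbers are far from gamfit's output, it is worth checking the data for outliers or zero/negative values before trusting the fit.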

(2) Naming MATLAB .m files

Error message:

Attempt to execute SCRIPT *** as a function

This error can appear while running a MATLAB program.

Reason
One of your .m files has the same name as the function *** that your code calls. When MATLAB encounters ***, it resolves the name to your file rather than to the intended function, and because your file is a script, calling it with arguments fails.
For example, I wrote a .m file named fft2.m to extract the texture features of images through frequency-domain analysis.
When execution reaches X = fft2(Imagem), MATLAB picks up my fft2.m instead of the built-in fft2 function, because files in the current folder take precedence over functions elsewhere on the path.

Solution
Rename the custom file to a name that is not already in use. In the example above, rename fft2.m to ffttexture.m.
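Before saving a new file, you can check whether a name is already taken. Interactively, `which -all fft2` lists every file that defines fft2. The sketch below uses `exist`, which returns 0 when nothing with that name is visible to MATLAB and a nonzero code otherwise (variable, file, built-in, folder, and so on). The candidate names are just the ones from the example above.

```matlab
% Names we are considering for a new .m file (taken from the example above)
candidates = {'fft2', 'ffttexture'};

for k = 1:numel(candidates)
    name = candidates{k};
    code = exist(name);   % 0 means the name is free; nonzero means it is in use
    if code ~= 0
        fprintf('%s is already in use (exist returned %d); pick another name\n', name, code);
    else
        fprintf('%s is free\n', name);
    end
end
```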

(3) PDF and CDF

PDF: probability density function.

CDF: cumulative distribution function. The CDF F(x) gives the probability that the random variable takes a value less than or equal to x.
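The two functions are linked by F(x) = integral of f(t) dt from minus infinity to x: the CDF is the running integral of the PDF, and the PDF is the derivative of the CDF. A minimal numerical check for the standard normal distribution, using only base MATLAB (erf, cumtrapz) so no toolbox is needed: integrating the PDF with cumtrapz reproduces the closed-form CDF up to a small discretization error.

```matlab
x = linspace(-6, 6, 2001);                  % grid wide enough that F(-6) is almost 0
f = exp(-x.^2 / 2) / sqrt(2*pi);            % standard normal PDF
F = 0.5 * (1 + erf(x / sqrt(2)));           % standard normal CDF (closed form)

% Running integral of the PDF, plus the tiny CDF value at the left end point
F_num = cumtrapz(x, f) + F(1);

max_err = max(abs(F_num - F));
fprintf('max |cumtrapz(pdf) - cdf| = %.2e\n', max_err);
```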

(4) Normal distribution (normpdf, normcdf)

clc; clear all; close all;
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);   % use the third column

% Estimates of the mean and standard deviation of data
[mu, sigma] = normfit(data);

% Sort the data into 6 bins: y holds the count in each bin, x the bin centers
[y, x] = hist(data, 6);
bar(x, y, 'FaceColor', 'r', 'EdgeColor', 'w'); box off;
xlim([mu - 3*sigma, mu + 3*sigma]);   % set the x-axis limits of the current axes

% Overlay a second, transparent axes and draw the normal PDF with mean mu and
% standard deviation sigma on it, with its y-axis on the right
a2 = axes;
ezplot(@(x) normpdf(x, mu, sigma), [mu - 3*sigma, mu + 3*sigma]);
set(a2, 'Box', 'off', 'YAxisLocation', 'right', 'Color', 'none');
title('Frequency histogram and fitted normal density function');
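The listing above overlays the density on a second axes with its own y-scale. An alternative, sketched here under the assumption that the bins are equally spaced (as hist produces), is to scale the PDF to the histogram: the expected count in a bin of width w is about N * w * pdf(x), so bars and curve can share one y-axis. The example data is hypothetical, and the normal PDF is written out explicitly so the sketch runs without the Statistics Toolbox; mean and std give the same mu and sigma as normfit.

```matlab
% Hypothetical example data; replace with data = dataall(:, 3)
data = [4.8; 5.1; 5.6; 4.4; 5.0; 5.3; 4.9; 5.7; 4.6; 5.2];

mu = mean(data);                 % same mean estimate as normfit
sigma = std(data);               % same sigma as normfit (N-1 normalization)

[y, x] = hist(data, 6);          % counts y at bin centers x
w = x(2) - x(1);                 % bin width (hist uses equal-width bins)
N = numel(data);

% Normal PDF written out explicitly, then scaled from density to expected counts
xs = linspace(mu - 3*sigma, mu + 3*sigma, 200);
pdf_vals = exp(-(xs - mu).^2 / (2*sigma^2)) / (sigma * sqrt(2*pi));
expected_counts = N * w * pdf_vals;

% To draw both on a single axes:
%   bar(x, y, 'FaceColor', 'r', 'EdgeColor', 'w'); hold on
%   plot(xs, expected_counts, 'k', 'LineWidth', 1.5); hold off
```

Scaling keeps one y-axis in count units, which avoids having to line up two independently scaled axes.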

(5) Quantile-quantile plot (Q-Q plot)

clc; clear all; close all;
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);   % use the third column

% Plot the sample quantiles of data against the theoretical quantiles of a
% normal distribution; if data is normally distributed, the plot is close to linear
qqplot(data);
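The points that qqplot draws can also be computed by hand, which gives a number to go with the picture. A minimal sketch, assuming the common plotting positions p_i = (i - 0.5)/n (check the qqplot documentation for the exact convention in your release): sort the sample, compute the matching standard normal quantiles with the inverse CDF sqrt(2)*erfinv(2p - 1), and measure straightness with the correlation coefficient r, where values near 1 indicate an approximately linear plot. The example data and variable names are hypothetical.

```matlab
% Hypothetical example data; replace with data = dataall(:, 3)
data = [4.8; 5.1; 5.6; 4.4; 5.0; 5.3; 4.9; 5.7; 4.6; 5.2];

n = numel(data);
sample_q = sort(data(:));                    % sample quantiles (ordered data)
p = ((1:n)' - 0.5) / n;                      % assumed plotting positions
theory_q = sqrt(2) * erfinv(2*p - 1);        % standard normal quantiles

% Straightness of the Q-Q plot: correlation between the two sets of quantiles
R = corrcoef(theory_q, sample_q);
r = R(1, 2);
fprintf('Q-Q correlation r = %.4f\n', r);

% To draw it: plot(theory_q, sample_q, '+')
```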

