One: Motivation
(1) I have recently been working with big data. Going from MB to GB of data is a qualitative leap, and the tools change with it: from Windows to Linux, and from a single machine to multi-node computing on Hadoop.
(2) The question is how to extract useful information or uncover latent patterns from such large volumes of data. Visualization tools may be essential here.
(3) A Baidu search turns up plenty of articles on visualization tools, but as researchers and programmers we would rather abstract a mathematical model that describes and characterizes the real-world phenomenon well.
(4) The toolchain: Python (data cleaning and processing) + MATLAB (model analysis), or C++/Java/Hadoop (data cleaning and processing) + MATLAB (model analysis).
(5) For reference, see my earlier post on processing big data with C++ fstream and string handling.
Two: Learning MATLAB
(1) Gamma distribution (gamfit)
clc
clear all
close all
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);  % selected column
[y, x] = hist(data, 6);  % with output arguments, hist draws nothing: y holds the counts and x the bin centers for 6 bins
subplot(2, 2, 1)
bar(x, y, 'FaceColor', 'r', 'EdgeColor', 'w'); box off
cxd1 = gamfit(data);  % maximum likelihood estimates (MLEs) of the gamma distribution parameters given the data in vector data
% In the gamma distribution, the parameter a is called the shape parameter and b the scale parameter.
a = cxd1(1); b = cxd1(2);
cxd2 = gamcdf(data, a, b);  % gamma cdf (distribution function) at each value in data, using shape parameter a and scale parameter b
% cxd2 = gampdf(data, a, b);  % gamma pdf (density function) at each value in data, using shape parameter a and scale parameter b
h = kstest(data, [data, cxd2]);  % Kolmogorov-Smirnov test of data against the fitted cdf; h = 0 means the fit is not rejected
subplot(2, 2, 2);
[xs, idx] = sort(data);  % sort first so the cdf is drawn as a curve rather than a zigzag
plot(xs, cxd2(idx));
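For reference, the density and distribution function that gampdf and gamcdf evaluate can be written as follows, in the shape/scale parameterization with shape a and scale b. The mean and variance are included because they give a quick sanity check on the values gamfit returns.

```latex
f(x \mid a, b) = \frac{1}{b^{a}\,\Gamma(a)}\, x^{a-1} e^{-x/b},
\qquad x > 0,\; a > 0,\; b > 0

F(x \mid a, b) = \int_{0}^{x} f(t \mid a, b)\, dt

\mathrm{E}[X] = a b, \qquad \mathrm{Var}[X] = a b^{2}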
(2) Naming MATLAB .m files
% Error message:
% Attempt to execute SCRIPT *** as a function. This error appears while running a MATLAB program.
Cause
% A .m file you created has the same name as an existing function ***, so when MATLAB encounters *** it can resolve the name to your file instead of the function you meant.
% For example: I wrote a .m file named fft2.m to extract image texture features through frequency-domain analysis.
% When execution reaches x = fft2(imagem), the name fft2 is ambiguous: it could refer to the built-in function or to the custom texture feature extraction file.
Solution
% Rename the custom file. In the example above, rename fft2.m to fftTexture.m.
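A quick way to check for this kind of name clash is sketched below. It uses the standard which and exist commands; the candidate name is the replacement name from the example above, and the printed messages are illustrative only.

```matlab
% List every file or built-in on the path that answers to the name fft2.
% If a custom fft2.m is listed first, it shadows MATLAB's own fft2.
which fft2 -all

% Before saving a new file, check whether the intended name is already taken.
candidate = 'fftTexture';
code = exist(candidate);   % 0 means nothing with this name is currently visible
if code == 0
    fprintf('%s is free to use.\n', candidate);
else
    fprintf('%s already exists (exist code %d); pick another name.\n', candidate, code);
end
```

Running which with -all before naming a file is cheap, and it also explains after the fact why a call is reaching the wrong definition.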
(3) PDF and CDF functions:
PDF: probability density function;
CDF: cumulative distribution function.
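For a continuous random variable X with density f, the two functions are related as follows; this is the standard definition, stated here for reference.

```latex
F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\, dt,
\qquad
f(x) = \frac{dF(x)}{dx}

P(a < X \le b) = F(b) - F(a)
```

So a *pdf function returns the height of the density at a point, while a *cdf function returns the probability accumulated up to that point.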
(4) Normal distribution (normpdf, normcdf)
clc
clear all
close all
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);  % selected column
[mu, sigma] = normfit(data);  % estimates of the mean and standard deviation of data
[y, x] = hist(data, 6);  % with output arguments, hist draws nothing: y holds the counts and x the bin centers for 6 bins
bar(x, y, 'FaceColor', 'r', 'EdgeColor', 'w'); box off
xlim([mu-3*sigma, mu+3*sigma])  % sets the x-axis limits of the current axes to the specified values
a2 = axes;  % second axes, drawn on top of the histogram
% Plot the pdf of the normal distribution with mean mu and standard deviation sigma.
ezplot(@(x) normpdf(x, mu, sigma), [mu-3*sigma, mu+3*sigma])
set(a2, 'box', 'off', 'yaxislocation', 'right', 'color', 'none')
title('Frequency histogram and normal distribution density function (fit)')
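As an alternative to overlaying a second axes, the fitted density can be scaled to expected counts per bin so that the histogram and the curve share one y-axis. The sketch below is only an illustration under stated assumptions: the data are synthetic (not the CSV used above), the variable names are made up for this example, and the density is computed directly from the normal formula with base MATLAB functions instead of normfit and normpdf.

```matlab
% Synthetic stand-in for the data column (assumption, not the author's CSV).
rng(0);
data = 50 + 10 * randn(1000, 1);

mu = mean(data);       % sample mean
sigma = std(data);     % sample standard deviation (n-1 normalization)

[counts, centers] = hist(data, 6);       % counts and bin centers for 6 bins
binWidth = centers(2) - centers(1);

% Normal density evaluated directly from its formula.
xs = linspace(mu - 3*sigma, mu + 3*sigma, 200);
pdfVals = exp(-(xs - mu).^2 ./ (2*sigma^2)) ./ (sigma * sqrt(2*pi));

% Multiply by n * binWidth to convert density into expected counts per bin.
bar(centers, counts, 'FaceColor', 'r', 'EdgeColor', 'w'); hold on
plot(xs, numel(data) * binWidth * pdfVals, 'b', 'LineWidth', 1.5); hold off
box off
```

With a single axes there is only one scale to read, which makes it harder to misjudge how well the curve follows the bars.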
(5) Quantile-quantile plot (Q-Q plot)
clc
clear all
close all
dataall = load('G:\zyp_thanks\metro_test\1-07\529_2.csv');
data = dataall(:, 3);  % selected column
qqplot(data);
% Displays a quantile-quantile plot of the sample quantiles of data versus the
% theoretical quantiles from a normal distribution. If the distribution of data
% is normal, the plot will be close to linear.
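To see what a normal Q-Q plot contains, it can be built by hand. The sketch below is a minimal illustration under stated assumptions: the data are synthetic, the plotting positions (i - 0.5)/n are one common choice, and the standard normal quantiles are obtained from erfcinv so that only base MATLAB is needed. It is not meant to reproduce every detail of qqplot, such as its reference line.

```matlab
% Synthetic stand-in for the data column (assumption, not the author's CSV).
rng(0);
data = 50 + 10 * randn(200, 1);

x = sort(data(:));                 % sample quantiles, in ascending order
n = numel(x);
p = ((1:n)' - 0.5) / n;            % plotting positions in (0, 1)
q = -sqrt(2) * erfcinv(2 * p);     % standard normal quantiles for those positions

plot(q, x, '+');
xlabel('Standard normal quantiles');
ylabel('Sample quantiles');
```

If the sample is roughly normal, the points fall near a straight line whose slope approximates the standard deviation and whose intercept approximates the mean.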
Approaches to Big Data Processing (MATLAB Edition)