Discover articles, news, trends, analysis, and practical advice about principal component analysis with Python and pandas on alibabacloud.com
method. Ranking: rank(). Axis indexes with duplicate values: the is_unique property of an index tells you whether its values are unique. Summarizing and computing descriptive statistics: sum(), mean(), describe(). Correlation coefficients and covariance: the Series and DataFrame methods are computed over pairs of arguments. Unique values, value counts, and membership. Unique values: the unique() method. Value counts: the value_counts() method calculates how often each value
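The methods listed above can be sketched roughly as follows. The sample values and the names s and df are hypothetical and are not taken from the original article.

```python
import pandas as pd

# Hypothetical sample data; note the duplicated index label "a".
s = pd.Series([7, -5, 7, 4], index=["a", "b", "a", "c"])

# rank() gives tied values the average of their ranks by default.
ranks = s.rank()

# is_unique is a property (no parentheses) of the index.
index_is_unique = s.index.is_unique

df = pd.DataFrame({"x": [1.0, 2.0, 3.0, 4.0], "y": [2.0, 4.0, 6.0, 8.0]})
summary = df.describe()  # count, mean, std, min, quartiles, max per column

# Correlation and covariance are computed over a pair of columns.
corr_xy = df["x"].corr(df["y"])
cov_xy = df["x"].cov(df["y"])

# Unique values and how often each value occurs.
uniques = s.unique()
counts = s.value_counts()
```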
pandas objects have a set of common mathematical and statistical methods. For example, the sum() method computes a subtotal for each column; passing axis=1 to sum() sums horizontally, giving a subtotal for each row; idxmax() returns the index of the maximum value; there is also a cumulative summary, cumsum(); compare it with sum() to see the difference. The unique() method returns only the distinct values in the data: the value_
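A minimal sketch of these summary methods, assuming a small hypothetical DataFrame (the column and index names are invented for illustration):

```python
import pandas as pd

# Hypothetical data, not from the original article.
df = pd.DataFrame(
    {"one": [1.0, 2.0, 3.0], "two": [4.0, 5.0, 6.0]},
    index=["a", "b", "c"],
)

column_totals = df.sum()      # one subtotal per column
row_totals = df.sum(axis=1)   # axis=1 sums across each row
max_labels = df.idxmax()      # index label of each column's maximum
running = df.cumsum()         # running total down each column
```

Unlike sum(), which reduces each column to one number, cumsum() keeps the original shape and stores the running total at every row.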
If you have no Python background, it is recommended to download and install Anaconda directly. It already bundles the modules needed for data analysis, so they are not covered again here.
Download address: https://www.continuum.io/downloads/
The following describes how to install each module with Python's pip. pip is a tool for installing and managing Python
pandas: powerful Python data analysis toolkit. Official documentation: http://pandas.pydata.org/pandas-docs/stable/
1. Import the pandas package
import pandas as pd
2. Get the file names under a folder
import os
filenames = []
path = "C:/users/forrest/pycharmprojects/test"
for file in os.listdir(path):
    filenames.append(file)
3. R
I have only recently learned this material, so please bear with any mistakes. The Python packages used in this article: IPython, NumPy, pandas, matplotlib. Original text of "Autumn in the Ancient Capital" for reference: http://www.xiexingcun.com/mingjiaxiejing/302.htm
1. Yu Dafu gives the date in the inscription at the end of the essay:
August 1934, in Peiping
But I could not find data for 1934, had t
String object methods. The split() method splits a string. The strip() method removes whitespace and line breaks. split() can be combined with strip(). The "+" operator concatenates multiple strings. The join() method also concatenates strings; compare it with the "+" operator. The in keyword tests whether one string is contained in another. The index() and find() methods return the position of a substring: the difference between index()
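The string methods above can be sketched as follows. The sample string is hypothetical; the comments state the standard behavior of each built-in method.

```python
# Hypothetical sample text with stray spaces and a trailing newline.
text = " a, b ,guido \n"

parts = text.split(",")                          # pieces keep their whitespace
pieces = [p.strip() for p in text.split(",")]    # split() combined with strip()

plus_joined = pieces[0] + "::" + pieces[1] + "::" + pieces[2]
joined = "::".join(pieces)                       # same result as the "+" chain

has_guido = "guido" in text                      # substring membership test
position = text.index(",")                       # raises ValueError if absent
missing = text.find(":")                         # returns -1 if absent
```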
Summary: To use Python for data analysis you need to install some common tools, such as NumPy, pandas, and SciPy. During installation you often run into problems of detail, such as version mismatches or dependencies that were not installed properly. This article summarizes how to install the few necessary packages install
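One way to check whether such packages are available before starting is sketched below. The helper name check_packages is hypothetical; importlib.util.find_spec is part of the standard library.

```python
import importlib.util


def check_packages(names):
    """Map each top-level package name to True if it can be found for import."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Hypothetical usage: report which of the common tools are present.
status = check_packages(["numpy", "pandas", "scipy"])
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'missing'}")
```

This only reports whether a package can be located; it does not detect version mismatches.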
Hierarchical indexes. Hierarchical indexing means an axis can have multiple index levels, a bit like merged cells in Excel. Select a subset of the data by the outer index level. Select a subset of the data by the inner index level. Select data from the inner level in the same way as from the outer level. Converting a multi-index Series to a DataFrame: hierarchical indexes play an important role in data reshaping and grouping; for example, the hierarchically indexed data above can be converted to a DataFrame: For
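A minimal sketch of a hierarchically indexed Series, assuming hypothetical labels and values:

```python
import pandas as pd

# Two index levels: an outer level (letters) and an inner level (numbers).
data = pd.Series(
    [1, 2, 3, 4, 5],
    index=[["a", "a", "b", "b", "c"], [1, 2, 1, 2, 1]],
)

outer = data["b"]        # select by the outer level
inner = data.loc[:, 2]   # select by the inner level

# Convert the multi-index Series to a DataFrame: the inner level becomes columns.
frame = data.unstack()
```

Combinations that do not exist in the Series, such as ("c", 2) here, appear as NaN in the resulting DataFrame.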
Using Python for data analysis (13), pandas basics: data reshaping and pivoting. Definition of reshaping: reshaping refers to rearranging data, also called pivoting. DataFrame provides two methods:
stack: rotates the columns of the data into rows.
unstack: rotates the rows of the data into columns.
For example:
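A minimal sketch of the two methods, assuming a small hypothetical DataFrame (the labels are invented for illustration):

```python
import pandas as pd

# Hypothetical data with named row and column indexes.
frame = pd.DataFrame(
    [[0, 1, 2], [3, 4, 5]],
    index=pd.Index(["Ohio", "Colorado"], name="state"),
    columns=pd.Index(["one", "two", "three"], name="number"),
)

# stack(): the columns become the inner level of the row index, giving a Series.
stacked = frame.stack()

# unstack(): the inner row level is rotated back into columns.
restored = stacked.unstack()
```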
Processing the stacked format. The stacked format is also
For ordered data such as time series, it may be necessary to interpolate when reindexing; the method option serves this purpose:
Options for the method parameter
Parameter            Description
ffill or pad         Fill forward
bfill or backfill    Fill backward
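The method option can be sketched as follows, assuming a hypothetical Series with gaps in its index:

```python
import pandas as pd

# Hypothetical ordered data with index labels 0, 2, and 4 only.
obj = pd.Series(["blue", "purple", "yellow"], index=[0, 2, 4])

# ffill: each new label takes the last earlier value.
forward = obj.reindex(range(6), method="ffill")

# bfill: each new label takes the next later value; label 5 has none, so it is NaN.
backward = obj.reindex(range(6), method="bfill")
```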
Problem description: Run the following program to generate the simulated hotel turnover data file data.csv in the current folder. Then complete the following tasks: 1) Use pandas to read the data in data.csv, create a DataFrame object, and delete all rows with missing values; 2) Use matplotlib to draw a line chart showing the hotel's daily turnover, and save the figure as the local file first.jpg; 3) Compute statistics by month, using matplotlib to
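A rough sketch of task 1 and the monthly statistics follows. The generating program is not shown in the excerpt, so the column names date and amount, the sample rows, and the choice of a monthly sum are all assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical stand-in for data.csv; the real file layout is not shown above.
csv_text = """date,amount
2023-01-01,100
2023-01-02,
2023-01-03,120
2023-02-01,90
"""

df = pd.read_csv(io.StringIO(csv_text), parse_dates=["date"])
df = df.dropna()  # task 1: drop rows with missing values

# Monthly statistics (assumed here to be the total turnover per month).
monthly = df.groupby(df["date"].dt.month)["amount"].sum()
```

Plotting is left out of the sketch. With matplotlib installed, the daily series could be drawn with df.plot(x="date", y="amount") and saved with plt.savefig("first.jpg").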
Objective
pandas is built on NumPy and provides more advanced data structures and tools. Just as the core of NumPy is the ndarray, pandas is centered on two core data structures, Series and DataFrame, which correspond to a one-dimensional sequence and a two-dimensional table, respectively. The conventional way to import pandas is as follows:
from
Getting started with Python for data analysis: pandas
Built on NumPy
from pandas import Series, DataFrame
import pandas as pd
I. Two data structures 1. Series
A python
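The two data structures can be sketched as follows. The values and labels are hypothetical.

```python
import pandas as pd
from pandas import Series, DataFrame

# A Series is a one-dimensional labeled sequence.
s = Series([4, 7, -5, 3], index=["d", "b", "a", "c"])

# A DataFrame is a two-dimensional table whose columns are Series.
df = DataFrame({"state": ["Ohio", "Nevada"], "population": [1.5, 2.4]})

value_at_a = s["a"]        # look up a value by its index label
column = df["population"]  # selecting one column returns a Series
```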
Data transformation refers to filtering, cleaning, and other conversion operations on data. Removing duplicate data: duplicate rows often appear in a DataFrame. DataFrame provides a duplicated() method to detect whether each row is a duplicate, and a drop_duplicates() method to discard duplicate rows. By default, duplicated() and drop_duplicates() consider all columns. If you do not want that, pass a collection of column names as an argument to restrict the check to those columns, for example: Dupl
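A minimal sketch of both methods, assuming a hypothetical DataFrame with columns k1 and k2:

```python
import pandas as pd

# Hypothetical data: row 1 repeats row 0 exactly.
df = pd.DataFrame({"k1": ["one", "one", "two", "two"], "k2": [1, 1, 2, 3]})

dup_mask = df.duplicated()          # True where a row repeats an earlier row
deduped = df.drop_duplicates()      # considers all columns by default
by_k1 = df.drop_duplicates(["k1"])  # considers only column k1
```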
Ming 6.0
Name: price, dtype: float64
Zhang San 1.2
Reese 1.0
Harry 2.3
Chen Jiu 5.0
Xiao Ming 6.0
Name: price, dtype: float64
In general, we often need to select values by index. DataFrame provides loc and iloc for this, but the two differ.
print(frame2)
print(frame2.loc['Harry'])  # loc accepts string index labels, whereas iloc accepts only integer positions
print(frame0.iloc[2])
Out[2]:
color object price
Zhang San Blue ball 1.2
Reese Green
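The difference between loc and iloc can be sketched as follows. The frame is a hypothetical reconstruction: the index names and prices follow the excerpt, but the color given for Harry is invented for illustration.

```python
import pandas as pd

frame = pd.DataFrame(
    {"color": ["blue", "green", "red"], "price": [1.2, 1.0, 2.3]},
    index=["Zhang San", "Reese", "Harry"],
)

by_label = frame.loc["Harry"]  # loc selects by index label
by_position = frame.iloc[2]    # iloc selects by integer position
```

Both expressions return the same row here, because "Harry" is the label of the row at position 2.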
1.1. pandas analysis steps
Load the data
Count the records by the hour of access_time. The SQL is similar to the following:
SELECT date_format(access_time, '%H'), COUNT(*) FROM log GROUP BY date_format(access_time, '%H');
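A pandas counterpart of this GROUP BY might look like the sketch below. The log DataFrame and its timestamps are hypothetical, and this is not the article's own script.

```python
import pandas as pd

# Hypothetical log with one access_time column.
log = pd.DataFrame({
    "access_time": pd.to_datetime([
        "2016-06-01 10:05:00",
        "2016-06-01 10:45:00",
        "2016-06-01 11:15:00",
    ])
})

# Group by the two-digit hour, like date_format(access_time, '%H'), and count rows.
hourly = log.groupby(log["access_time"].dt.strftime("%H")).size()
```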
1.2. Code
cat pd_ng_log_stat.py
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from ng_line_parser import NgLineParser
import
values appear
df.boxplot(column='label 1', by='label 2')
plt.show()
The data under label 1 is then plotted as a numerical distribution grouped by label 2. As indicated below, once the data has been grouped by education level, conclusions such as the extreme wage values among the highly educated can be drawn. Note: if entering a plotting command on its own does not display the figure, enter plt.show() on a separate line. Prerequisite: import matplotlib.pyplot as pl
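The numbers that such a box plot summarizes can also be inspected without drawing anything. The sketch below uses groupby with describe() in place of boxplot(), so it is a numeric stand-in and not the plotting call itself. The column names education and wage and the values are hypothetical.

```python
import pandas as pd

# Hypothetical wages grouped by education level.
df = pd.DataFrame({
    "education": ["high", "high", "high", "low", "low", "low"],
    "wage": [30.0, 50.0, 200.0, 10.0, 12.0, 14.0],
})

# Per-group count, mean, std, min, quartiles, and max of the wage column.
stats = df.groupby("education")["wage"].describe()
```

With matplotlib installed, df.boxplot(column="wage", by="education") followed by plt.show() would draw the corresponding figure.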
Operating system: Windows. Python: 3.5. Welcome to join the learning exchange QQ group: 657341423
The previous section described the libraries needed for data analysis and mining, the most important of which are pandas and matplotlib. pandas: mainly used for data analysis, calculation, and statistics, such as the mean and variance. matplotlib: mainly used in combination with
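The mean and variance mentioned above can be computed as in this minimal sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical sample values.
s = pd.Series([2.0, 4.0, 4.0, 6.0])

mean = s.mean()
sample_variance = s.var()            # pandas divides by n - 1 by default
population_variance = s.var(ddof=0)  # divide by n instead
```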