A simple tutorial on data analysis with Python
Recently, Analysis with Programming has joined Planet Python. As the first special blog of this website, I will share with you how to start data analysis using Python. The details are as follows:
Data import
    Import local or web-side CSV files;
Data transformation;
Data statistics description;
Hypothesis test
    One-sample t-test;
Visualization;
Create a user-defined function (UDF).
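As a rough illustration of the first steps in this outline (import, statistical description, one-sample t-test), here is a minimal sketch. The CSV content, the column names, and the hypothesized mean mu0 are all made up for the example, and the t statistic is computed by hand rather than with a statistics library.

```python
import io
import math

import pandas as pd

# Hypothetical CSV content standing in for a local or web-hosted file.
csv_text = """region,value
north,10
south,12
east,14
west,16
"""

# pd.read_csv accepts a path, a URL, or a file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# Statistical description of the numeric column.
summary = df["value"].describe()

# One-sample t statistic against a hypothesized mean mu0 (an assumption).
mu0 = 12.0
n = len(df)
sample_mean = df["value"].mean()
sample_std = df["value"].std()  # sample standard deviation (ddof=1)
t_stat = (sample_mean - mu0) / (sample_std / math.sqrt(n))

print(summary)
print(t_stat)
```

In practice the same call, pd.read_csv, would be pointed at a file path or URL instead of an in-memory string.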
Data Import
This is a key step. We need to import data for subsequent a
This article introduces a simple tutorial on using Python for data analysis. It covers basic tasks such as data import, transformation, statistics, and hypothesis testing. Analysis with Programming recently joined Planet Python. As the first special blog of this website, I will share with you how to start data analysis using Python. The details are as follows:
Data import: import local or web-side CSV files; Data tran
'])
pd.Series({'a': 1, 'b': 2})
pd.Series(0, index=['a', 'b', 'c', 'd'])
Get an array of values and an array of index labels:
values attribute
index attribute
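The two constructor forms and the two attributes mentioned above can be tried out as follows. This is a minimal sketch; the variable names sr and zeros are chosen only for illustration.

```python
import pandas as pd

# Build a Series from a dict: keys become the index, values the data.
sr = pd.Series({"a": 1, "b": 2})

# Build a Series by repeating a scalar over an explicit index.
zeros = pd.Series(0, index=["a", "b", "c", "d"])

# .values is the underlying NumPy array; .index holds the labels.
values = sr.values
labels = sr.index

print(values)
print(list(labels))
print(zeros)
```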
pandas: Series characteristics
A Series supports the features of NumPy arrays (subscript access):
Create a Series from an ndarray: Series(arr)
Operations with a scalar: sr * 2
Operations between two Series: sr1 + sr2
Indexing: sr[0], sr[[1, 2, 4]]
Slices: sr[0
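The NumPy-style operations listed above can be sketched like this. The array contents are arbitrary, and the Series keeps the default integer index 0..4, so sr[0] and sr[[1, 2, 4]] look elements up by those labels.

```python
import numpy as np
import pandas as pd

arr = np.array([10, 20, 30, 40, 50])
sr = pd.Series(arr)            # Series from an ndarray, default index 0..4

doubled = sr * 2               # scalar operation, applied element-wise
total = sr + pd.Series(arr)    # two Series are added label by label

first = sr[0]                  # single element by index label
picked = sr[[1, 2, 4]]         # several elements via a list of labels
head = sr[0:2]                 # slice: positions 0 and 1 (end exclusive)

print(doubled.tolist(), total.tolist(), first, picked.tolist(), head.tolist())
```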
Recently, Analysis with Programming joined Planet Python. As the first special blog of the site, I'll share how to start data analysis with Python. The specific contents are as follows:
Data import
    Import a local or web-side CSV file;
Data transformation;
Data statistical description;
Hypothesis testing
    Single-sample t-test;
Visualization;
Create a custom function.
Data import
This is a critical step, and for subsequent analysis we need to import the data first. In general, the data is in CSV fo
columns]
Selection
Note
While standard Python/NumPy expressions for selecting and setting are intuitive and come in handy for interactive work, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix.
See the indexing section and below.
Getting
Selecting a single column, which yields a Series, equivalent to df.A:
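A minimal sketch of these access methods follows. The frame, its dates, and its values are assumptions made for the example. The .ix accessor is left out because it was removed in pandas 1.0; .loc and .iloc cover its label-based and position-based uses.

```python
import numpy as np
import pandas as pd

dates = pd.date_range("20130101", periods=4)   # hypothetical index
df = pd.DataFrame(np.arange(16).reshape(4, 4), index=dates, columns=list("ABCD"))

col_a = df["A"]                         # single column -> Series
same = df.A                             # attribute access, equivalent to df["A"]

row_by_label = df.loc[dates[0]]         # label-based row selection
cell_by_label = df.at[dates[0], "A"]    # fast scalar access by label
block_by_pos = df.iloc[1:3, 0:2]        # position-based slice (end exclusive)
cell_by_pos = df.iat[1, 1]              # fast scalar access by position

print(col_a)
print(block_by_pos)
```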
Function prototype: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.fillna.html#pandas.DataFrame.fillna
pad/ffill: fills the missing value with the previous non-missing value
backfill/bfill: fills the missing value with the next non-missing value
None: specify a value to replace the missing value
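The three filling strategies can be compared on a small Series. This sketch uses the ffill() and bfill() methods instead of passing method= to fillna, since the method keyword is deprecated in recent pandas releases; the sample values are made up.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, np.nan, np.nan, 4.0])

forward = s.ffill()      # pad/ffill: reuse the previous non-missing value
backward = s.bfill()     # backfill/bfill: reuse the next non-missing value
constant = s.fillna(0)   # no method: replace missing values with a given value

print(forward.tolist())
print(backward.tolist())
print(constant.tolist())
```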
Meituan shop reviews: language processing and classification (NLP)
Part one: data analysis
Part two: visualization
This article is the third in the series: text classification
The main packages used are jieba, sklearn, and pandas. This post mainly uses the bag-of-words model, which represents text as numerical feature vectors (one feature vector is constructed per document; it contains many zeros, and the values appearing in the feature vector are also called the
there is only one data type for this data structure, whereas a Python data frame can store multiple data types, with essentially no restriction on the default data type.
Question 4: How is the data in this structure accessed?
Access location | Method | Note
Access columns | variable name[column name] | Accesses the corresponding column
Access rows | variable name[n:m] | Accesses rows n through m-1
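Both access patterns can be tried on a small frame. The column names and values here are invented for the example; the frame mixes text and integer columns to show that several data types can coexist.

```python
import pandas as pd

# Hypothetical frame holding two different data types.
df = pd.DataFrame({"name": ["a", "b", "c", "d"], "score": [90, 80, 70, 60]})

scores = df["score"]   # variable name[column name]: access a column
middle = df[1:3]       # variable name[n:m]: rows n through m-1, here rows 1 and 2

print(scores.tolist())
print(middle)
```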
This article takes an in-depth look at pandas in Python, with code examples. It should be a useful reference for anyone who needs it.
I. Filtering
First, create a 6x4 matrix of data.
import numpy as np
import pandas as pd

dates = pd.date_range('20180830', periods=6)
# 24 consecutive integers reshaped into 6 rows x 4 columns
df = pd.DataFrame(np.arange(24).reshape((6, 4)), index=dates, columns=['A', 'B', 'C', 'D'])
print(df)
Print:
section "Intro to data structures". Open this page next to your Jupyter notebook. As you read the document, type out (rather than copy) the code and execute it in the notebook. As you execute the code, explore these operations and try new ways of using them.
Then move on to the section "Indexing and selecting data". Create a new Jupyter notebook, write and execute the code, and then explore the different operations you learned. T
the original value, which is different from ndarray: for example, after dropping a row, inspecting the original object shows that it has not changed.
Drop a column: obj4.drop('Nevada', axis=1)
Many functions operate on rows by default, which is why they take an axis parameter: axis=1 refers to columns and axis=0 refers to rows.
4.2 Selection, slicing, indexing
A: Select a single column, which returns a Series,
import numpy as np
import pandas as pd
from pandas import DataFrame

df = DataFrame(np.arange(12).reshape(3, 4), index=['one', 'two', 'thr'], columns=list('abcd'))
df['a']  # select column a
df[['a', 'b']]  # select columns a and b
# .ix accepted integer positions as well as index and column labels; it was removed in pandas 1.0, so .iloc (positions) and .loc (labels) are used below
df.iloc[0]  # select row 0
df.iloc[0:1]  # select row 0
df.loc['one':'two']  # select rows one and two
df.iloc[0:2, 0]  # select the 0th ,
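The earlier remark that drop leaves the original object unchanged, and the meaning of the axis parameter, can be checked with a short sketch. The frame is the same 3x4 example as above, rebuilt here so the block runs on its own.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(3, 4),
                  index=["one", "two", "thr"], columns=list("abcd"))

without_row = df.drop("one")          # axis=0 (default): drop a row label
without_col = df.drop("a", axis=1)    # axis=1: drop a column label

# drop returns a new object; the original frame keeps all rows and columns.
print(df.shape, without_row.shape, without_col.shape)
```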
. Display the index, columns, and underlying NumPy data:
3. The describe() function gives a quick statistical summary of the data:
4. Transpose the data:
5. Sort by axis:
6. Sort by value:
III. Selection
While standard Python/NumPy expressions for selecting and setting can come in handy, for production code we recommend the optimized pandas data access methods: .at, .iat, .loc, .iloc and .ix. For details see Indexing and Selecting Data and MultiIndex/Adv
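The viewing steps listed above (index and columns, describe, transpose, sorting) can be sketched on a small frame. The data and column names are made up for the example.

```python
import pandas as pd

dates = pd.date_range("20130101", periods=3)   # hypothetical data
df = pd.DataFrame({"A": [3, 1, 2], "B": [9.0, 7.0, 8.0]}, index=dates)

# Index, columns, and the underlying NumPy data.
index, columns, data = df.index, df.columns, df.to_numpy()

summary = df.describe()                            # quick statistical summary
transposed = df.T                                  # transpose rows and columns
by_axis = df.sort_index(axis=1, ascending=False)   # sort column labels: B, A
by_value = df.sort_values(by="A")                  # sort rows by the values in A

print(summary)
print(by_value)
```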
-04-14 4 5
2013-04-15 1 2 18
2013-04-17 9 1
2013-04-18 7 17
Update: unless there is a special requirement, it is highly recommended to use .loc and keep the use of [] to a minimum. .loc avoids chained-indexing problems when assigning to a DataFrame, whereas with [] pandas is likely to issue a SettingWithCopyWarning.
See the official documentation for details: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
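A minimal sketch of the recommended pattern, with a made-up frame: the row filter and the column are passed to a single .loc call, so the assignment acts on the original DataFrame rather than on an intermediate object.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Chained indexing such as df[df["A"] > 1]["B"] = 0 assigns to an
# intermediate object and may not change df; pandas can warn about it.
# A single .loc call selects rows and column in one step:
df.loc[df["A"] > 1, "B"] = 0

print(df)
```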
iloc
If loc is