A simple introduction to pandas in Python and how to use it


I. Introduction to Pandas

1. pandas (the Python Data Analysis Library) is a NumPy-based tool created for data analysis tasks. It incorporates a number of libraries and several standard data models, and provides the tools needed to manipulate large datasets efficiently. pandas offers many functions and methods that let us process data quickly and easily. You will soon find that it is one of the key reasons Python is a powerful and efficient data analysis environment.

2. pandas is a Python data analysis package. It was originally developed at AQR Capital Management starting in April 2008 and was open-sourced at the end of 2009. It is currently developed and maintained by the PyData development team, which focuses on Python data packages, and it is part of the PyData project. pandas began as a financial data analysis tool, so it provides good support for time series analysis. The name pandas comes from "panel data" and "Python data analysis". Panel data is an economics term for multidimensional data sets, and pandas also provided a Panel data type (deprecated and removed in later pandas releases).

3. Data structures:

Series: a one-dimensional array, similar to a one-dimensional array in NumPy. Both resemble Python's basic list structure. The difference is that the elements of a list can have different data types, while an array or a Series stores all of its elements under a single data type (mixed values fall back to the generic object type), which makes memory use and computation more efficient.

Time-Series: a Series that is indexed by time.

DataFrame: a two-dimensional tabular data structure. Many of its features resemble data.frame in R. A DataFrame can be understood as a container of Series. The following content is mainly based on DataFrame.

Panel: a three-dimensional array that can be understood as a container of DataFrames.

Of these, pandas has two basic data structures of its own: Series and DataFrame. Note that pandas is still a Python library, so Python's own data types remain available, and you can still define your own types with classes. pandas simply defines these two types to make data operations simpler.

II. Pandas installation

Because pandas is a third-party Python library, you need to install it before use. Running pip install pandas installs pandas and its related components automatically.
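As a quick check after installing, the sketch below imports pandas, prints its version, and illustrates the single-data-type point made above. The variable names ints and mixed are only illustrative, and the exact integer dtype can vary by platform.

```python
import pandas as pd

# If this import succeeds, the installation worked.
print(pd.__version__)

# A list may mix types freely; a Series stores its values under one dtype.
ints = pd.Series([1, 2, 3])
mixed = pd.Series([1, "two", 3.0])

print(ints.dtype)   # an integer dtype
print(mixed.dtype)  # object, the generic fallback for mixed values
```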

III. Use of Pandas

Note: the examples below are run in IPython.

1. Import the pandas module under an alias, and import the Series class. Everything below relies on these imports.

In [1]: from pandas import Series

In [2]: import pandas as pd

2. Series

A Series is like a list: a sequence of data in which every item has a corresponding index value.

You can think of a Series as an "upgraded" list:

In [3]: s = Series([1, 4, 'ww', 'TT'])

In [4]: s
Out[4]:
0     1
1     4
2    ww
3    TT
dtype: object

Another point in common with a list: the types of the elements are up to you.

What we created here is a Series object, which naturally has its own attributes and methods. For example, the following two attributes show the index and the data values of the Series object:

In [5]: s.index
Out[5]: RangeIndex(start=0, stop=4, step=1)

In [8]: s.values
Out[8]: array([1, 4, 'ww', 'TT'], dtype=object)
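The same two attributes can be tried outside IPython. This is a minimal sketch that uses the pd alias rather than the bare Series import; labels and items are illustrative names.

```python
import pandas as pd

s = pd.Series([1, 4, "ww", "TT"])

# The default index is a RangeIndex counting up from 0.
print(s.index)
# .values exposes the underlying data as a NumPy array.
print(s.values)

labels = list(s.index)
items = list(s.values)
```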

A list can only be indexed by integers starting from 0, and a Series uses the same kind of index by default. Unlike a list, however, a Series can be given a custom index:

In [9]: s2 = Series(['wangxing', 'man', 24], index=['name', 'sex', 'age'])

In [10]: s2
Out[10]:
name    wangxing
sex          man
age           24
dtype: object

Each element has an index label, and you can operate on an element through that label. Remember the corresponding list operations? Series offers similar ones. Start with the simple case: reading a value by its index and modifying it:

In [12]: s2['name']
Out[12]: 'wangxing'

In [45]: s2['name'] = 'wudadiao'

In [46]: s2
Out[46]:
name    wudadiao
sex          man
age           24
dtype: object
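The read-and-modify steps above can be sketched as a standalone script. The extra .loc line is an addition for illustration: it is the explicit label-based accessor and is equivalent to the bracket form here.

```python
import pandas as pd

s2 = pd.Series(["wangxing", "man", 24], index=["name", "sex", "age"])

before = s2["name"]        # read a value by its label
s2["name"] = "wudadiao"    # overwrite a value by its label
s2.loc["age"] = 25         # .loc is the explicit label-based form

print(before, s2["name"], s2["age"])
```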

Doesn't this look a bit like a dict? It does, and the following shows why.

As you may have noticed, the Series objects above were built from lists passed to Series(): the first list holds the data values, and if you want to define an index, it follows as a second list. Besides this approach, a Series object can also be defined as follows:

In [13]: sd = {'Python': 9000, 'C++': 9001, 'C#': 9000}

In [14]: s3 = Series(sd)

In [15]: s3
Out[15]:
C#        9000
C++       9001
Python    9000
dtype: int64
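A short sketch of the dict-based construction follows. Depending on the pandas version, the labels may print in sorted order (as in the output above) or in the dict's insertion order, so the code avoids relying on either. The names has_cpp and back_to_dict are illustrative.

```python
import pandas as pd

sd = {"Python": 9000, "C++": 9001, "C#": 9000}
s3 = pd.Series(sd)

# Keys become the index labels and values become the data.
print(s3)

# Like a dict, a Series supports membership tests on its labels.
has_cpp = "C++" in s3
# to_dict() converts the Series back into a plain dict.
back_to_dict = s3.to_dict()
```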

Now it is clear why the earlier example resembled a dict: a Series can be defined from one.

Even here, the index can still be customized. This is where a strength of pandas shows: a custom index is automatically matched against the original index (the dict keys), and where a label matches, the value belonging to the original label is used. This can be called "automatic alignment."

In [16]: s4 = Series(sd, index=['Java', 'C++', 'C#'])

In [17]: s4
Out[17]:
Java       NaN
C++     9001.0
C#      9000.0
dtype: float64
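The alignment behaviour can be checked with a small script. It is a sketch that reuses the same dict and custom index as above and only inspects the result.

```python
import pandas as pd

sd = {"Python": 9000, "C++": 9001, "C#": 9000}
s4 = pd.Series(sd, index=["Java", "C++", "C#"])

# "Java" has no match in sd, so it becomes NaN.
# "Python" is not in the requested index, so it is left out.
print(s4)

# NaN is a float, so the integer data is converted to float64.
print(s4.dtype)
```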

In pandas, a missing value is automatically filled with NaN.

pandas has a dedicated function for testing whether values are missing:

In [19]: pd.isnull(s4)
Out[19]:
Java     True
C++     False
C#      False
dtype: bool

In addition, the Series object has the same method:

In [20]: s4.isnull()
Out[20]:
Java     True
C++     False
C#      False
dtype: bool
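Once missing values are detected, they usually need handling. The sketch below goes one step beyond the text and shows notnull, dropna and fillna next to isnull; the default value 0 is only an assumption for illustration.

```python
import pandas as pd

s4 = pd.Series({"C++": 9001, "C#": 9000}, index=["Java", "C++", "C#"])

missing = s4.isnull()    # True where the value is NaN
present = s4.notnull()   # the inverse mask
dropped = s4.dropna()    # a new Series without the missing entries
filled = s4.fillna(0)    # a new Series with NaN replaced by a default
```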

In fact, the name of the index can be redefined:

In [21]: s4.index = ['Language', 'Math', 'English']

In [22]: s4
Out[22]:
Language       NaN
Math        9001.0
English     9000.0
dtype: float64

Series data also supports operations like the following (operations are described in detail later):

In [23]: s4 * 2
Out[23]:
Language        NaN
Math        18002.0
English     18000.0
dtype: float64

In [24]: s4[s4 > 9000]
Out[24]:
Math    9001.0
dtype: float64
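The two operations above work element by element. A minimal sketch with the intermediate boolean mask made explicit (the labels are illustrative):

```python
import pandas as pd

s4 = pd.Series([float("nan"), 9001.0, 9000.0], index=["Language", "Math", "English"])

doubled = s4 * 2      # applied to every element; NaN stays NaN
mask = s4 > 9000      # a boolean Series; comparisons with NaN give False
selected = s4[mask]   # keeps only the entries where the mask is True
```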

That is all on Series for now. Next, let's look at another pandas data structure, DataFrame.

3. DataFrame

A DataFrame is a two-dimensional data structure, much like a spreadsheet or a table in a MySQL database. Its vertical lines are called columns, and its horizontal lines, as in Series, are called the index. In other words, columns and index together determine the position of a single item.

First, import the modules:

In [26]: from pandas import Series, DataFrame

In [27]: data = {"name": ['Google', 'Baidu', 'Yahoo'], "marks": [100, 200, 300], "price": [1, 2, 3]}

In [28]: f1 = DataFrame(data)

In [29]: f1
Out[29]:
   marks    name  price
0    100  Google      1
1    200   Baidu      2
2    300   Yahoo      3

This is a common way to define a DataFrame object: from a dict. The dict's keys ("name", "marks", "price") become the column names of the DataFrame, and each key's value is a list holding the data for that column. No index is given in the definition above, so the default (the same convention as in Series) is integers starting from 0. As the result shows, this is a two-dimensional data structure (similar to what you would see in Excel or MySQL).
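The mapping from dict keys to columns can be inspected directly. A minimal sketch, using the pd alias instead of the bare DataFrame import:

```python
import pandas as pd

data = {"name": ["Google", "Baidu", "Yahoo"], "marks": [100, 200, 300], "price": [1, 2, 3]}
f1 = pd.DataFrame(data)

print(f1.columns.tolist())  # column names, taken from the dict keys
print(f1.index.tolist())    # default row labels: integers from 0
print(f1.shape)             # (number of rows, number of columns)
```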

The output above shows the columns in no specified order, just as the keys of a dict have none. In a DataFrame, however, columns differ from dict keys in one clear way: their order can be specified, as follows:

In [31]: f2 = DataFrame(data, columns=['name', 'price', 'marks'])

In [32]: f2
Out[32]:
     name  price  marks
0  Google      1    100
1   Baidu      2    200
2   Yahoo      3    300

As with Series, the index of a DataFrame can be customized:

In [35]: f3 = DataFrame(data, columns=['name', 'marks', 'price'], index=['a', 'b', 'c'])

In [36]: f3
Out[36]:
     name  marks  price
a  Google    100      1
b   Baidu    200      2
c   Yahoo    300      3
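With both a column order and a custom index in place, a row label plus a column name pins down one item. The sketch below uses .loc for this, which is not used in the text above; cell and row are illustrative names.

```python
import pandas as pd

data = {"name": ["Google", "Baidu", "Yahoo"], "marks": [100, 200, 300], "price": [1, 2, 3]}
f3 = pd.DataFrame(data, columns=["name", "marks", "price"], index=["a", "b", "c"])

# A row label plus a column name identifies a single item.
cell = f3.loc["b", "marks"]
# A row label alone returns the whole row as a Series.
row = f3.loc["c"]

print(cell)
print(row)
```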

Besides the approach above, a DataFrame can also be defined with a "dict of dicts".

In [40]: newdata = {'lang': {'a': 'Python', 'second': 'Java'}, 'price': {'a': 5000, 'second': 2000}}

In [41]: f4 = DataFrame(newdata)

In [42]: f4
Out[42]:
          lang  price
a       Python   5000
second    Java   2000

In this dict, the first-level keys give the column names, the second-level keys give the row index, and the second-level values give the corresponding data. In other words, the dict sets the value of each individual cell, and cells that are not specified are empty.

>>> newdata = {"lang": {"firstline": "python", "secondline": "java"}, "price": {"firstline": 8000}}
>>> f4 = DataFrame(newdata)
>>> f4
              lang   price
firstline   python  8000.0
secondline    java     NaN
>>> DataFrame(newdata, index=["firstline", "secondline", "thirdline"])
              lang   price
firstline   python  8000.0
secondline    java     NaN
thirdline      NaN     NaN
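To see which cells were left unspecified, the frame can be tested for missing values. A small sketch, assuming the same nested dict as above; empty_cells and missing_per_column are illustrative names.

```python
import pandas as pd

newdata = {"lang": {"firstline": "python", "secondline": "java"}, "price": {"firstline": 8000}}
f4 = pd.DataFrame(newdata)

# Cells that were never specified come back as NaN.
empty_cells = f4.isnull()
print(empty_cells)

# Summing the boolean mask counts the missing cells per column.
missing_per_column = f4.isnull().sum()
print(missing_per_column)
```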

The columns attribute of a DataFrame object shows its column names. You can also get the entire contents of a column (including the index, of course) with the following dict-like approach:

In [44]: f3['name']
Out[44]:
a    Google
b     Baidu
c     Yahoo
Name: name, dtype: object
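The sketch below confirms that a selected column is itself a Series, and adds one variation not shown above: passing a list of names returns a smaller DataFrame. The variable names are illustrative.

```python
import pandas as pd

f3 = pd.DataFrame(
    {"name": ["Google", "Baidu", "Yahoo"], "marks": [100, 200, 300], "price": [1, 2, 3]},
    index=["a", "b", "c"],
)

cols = f3.columns.tolist()       # the column names
names = f3["name"]               # one column name returns a Series
subset = f3[["name", "price"]]   # a list of names returns a DataFrame
```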

The following assigns the same value to every row of a column:

newdata1 = {'username': {'a': 'wangxing', 'second': 'Dadiao'}, 'age': {'a': 24, 'second': 25}}

In [67]: f6 = DataFrame(newdata1, columns=['username', 'age', 'sex'])

In [68]: f6
Out[68]:
        username  age  sex
a       wangxing   24  NaN
second    Dadiao   25  NaN

In [69]: f6['sex'] = 'man'

In [70]: f6
Out[70]:
        username  age  sex
a       wangxing   24  man
second    Dadiao   25  man
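Assigning a single value to a column repeats it for every row. The sketch below shows this, plus the related case of assigning to a name that does not exist yet; the "country" column and its value are hypothetical and not part of the original example.

```python
import pandas as pd

newdata1 = {"username": {"a": "wangxing", "second": "Dadiao"}, "age": {"a": 24, "second": 25}}
f6 = pd.DataFrame(newdata1, columns=["username", "age", "sex"])

# A column listed in columns= but absent from the data starts out as NaN.
all_missing = f6["sex"].isnull().all()

f6["sex"] = "man"      # a single value is repeated for every row
f6["country"] = "cn"   # hypothetical column: a new name adds a column

print(f6)
```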

Besides assigning one value to a whole column, you can also assign values "point to point". Since every column of a DataFrame is a Series object, you can first define a Series object and then put it into the DataFrame, as follows:

ssex = Series(['male', 'female'], index=['a', 'second'])

In [72]: f6['sex'] = ssex

In [73]: f6
Out[73]:
        username  age     sex
a       wangxing   24    male
second    Dadiao   25  female

Can the data be modified even more precisely? Of course, in a way that fully imitates dict operations:

In [74]: f6['age']['second'] = 30

In [75]: f6
Out[75]:
        username  age     sex
a       wangxing   24    male
second    Dadiao   30  female
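Two details are worth a sketch here. First, a Series assigned to a column is matched to rows by label, not by position. Second, the chained form f6['age']['second'] = 30 can raise a SettingWithCopyWarning in many pandas versions and may leave the frame unchanged when copy-on-write is enabled, so the pandas documentation recommends .loc for single-cell updates. The code below is an alternative under that assumption, not the original author's method.

```python
import pandas as pd

f6 = pd.DataFrame(
    {"username": ["wangxing", "Dadiao"], "age": [24, 25]},
    index=["a", "second"],
)

# The Series is deliberately given in a different order than the rows:
# values are matched to rows by label, not by position.
ssex = pd.Series(["female", "male"], index=["second", "a"])
f6["sex"] = ssex

# .loc[row, column] updates one cell in a single step.
f6.loc["second", "age"] = 30

print(f6)
```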

This article was compiled with reference to http://wiki.jikexueyuan.com/project/start-learning-python/312.html.
