PyCharm installation and pandas data processing

Last Update:2018-07-24 Source: Internet

Author: User

Tags: data structures, install, pandas

Installing and configuring PyCharm

Official Download Address: http://www.jetbrains.com/pycharm/
CSDN Download Address: http://download.csdn.net/download/coofly/6637569

Adjusting the color scheme:
File–>Settings–>Editor–>Colors & Fonts, then select Monokai
Showing line numbers:
File–>Settings–>Editor–>General–>Appearance–>Show line numbers

Installing the pandas packages

Install the following packages (setuptools and pip first, then the rest):
setuptools
pip
pandas
NumPy
Matplotlib
Packages are installed under File–>Settings–>Project: pythonproj–>Project Interpreter

Series

Reference: http://pda.readthedocs.org/en/latest/chp5.html
pandas has two important data structures: Series and DataFrame.
A Series is a one-dimensional, array-like object that contains an array of data (of any NumPy data type) and an associated array of data labels, called its index.
For example:

from pandas import Series

ser = Series([5, 4, 3, 2, -1])
print(ser)

Output:
0    5
1    4
2    3
3    2
4   -1
The index is on the left and the value is on the right. Because we did not specify the index for the data, a default index containing integers 0 to N-1 (where N is the length of the data) was created.
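The default index described above can be inspected directly. This is a minimal sketch, assuming pandas is importable as `pd`; the names `index_labels` and `values` are only illustrative:

```python
import pandas as pd

# Build a Series without specifying an index
ser = pd.Series([5, 4, 3, 2, -1])

# The default index runs from 0 to N-1, where N is the length of the data
index_labels = ser.index.tolist()
values = ser.tolist()

print(index_labels)
print(values)
```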
Create a Series with an index that identifies each data point:

ser1 = Series([5, 4, 3, 2, -1], index=['a', 'b', 'c', 'd', 'e'])
print(ser1)

Output:
a    5
b    4
c    3
d    2
e   -1

Retrieving data by index

print(ser1['c'])

Output:
3
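Beyond a single label, a Series can probably also be queried with a list of labels, a boolean condition, or a membership test. The sketch below rebuilds the same Series under the assumption that pandas is imported as `pd`; the names `subset`, `positive`, `has_b` and `has_z` are illustrative only:

```python
import pandas as pd

ser1 = pd.Series([5, 4, 3, 2, -1], index=['a', 'b', 'c', 'd', 'e'])

# A list of labels returns a smaller Series
subset = ser1[['a', 'c']]

# A boolean condition keeps only the matching entries
positive = ser1[ser1 > 2]

# Membership tests look at the index labels, not at the values
has_b = 'b' in ser1
has_z = 'z' in ser1

print(subset)
print(positive)
print(has_b, has_z)
```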

If your data is in a Python dictionary, you can create a Series from it by passing the dictionary to the constructor.
Creating a Series from a dictionary:

sdata = {}
sdata['a'] = 5
sdata['c'] = 10
sdata['b'] = 4
sdata['d'] = -2
ser2 = Series(sdata)
print(ser2)

Output:
a     5
b     4
c    10
d    -2

In the result above, the index of the Series is sorted by the dictionary keys. Newer pandas versions keep the dictionary's insertion order instead.
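When the order of the labels matters, one option is to pass an explicit index together with the dictionary. This is a small sketch, assuming pandas is imported as `pd`; the label 'x' is a made-up key that is deliberately absent from the dictionary:

```python
import pandas as pd

sdata = {'a': 5, 'c': 10, 'b': 4, 'd': -2}

# Only the labels listed in `index` are kept, in that order.
# 'x' is not a key of sdata, so its value becomes NaN.
ser = pd.Series(sdata, index=['d', 'a', 'x'])
print(ser)

# isnull() marks the entries that have no value
missing = ser.isnull()
print(missing)
```

Because NaN is a floating-point value, the resulting Series holds floats rather than integers.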

Delete data

# drop returns a new Series without the given labels
ser3 = ser2.drop(['a'])
print(ser3)

Output:
b     4
c    10
d    -2
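To check that dropping a label does not modify the Series it was called on, a short sketch like the following can be used (pandas is assumed to be imported as `pd`):

```python
import pandas as pd

ser2 = pd.Series({'a': 5, 'b': 4, 'c': 10, 'd': -2})
ser3 = ser2.drop(['a'])

# The original still has four entries; only the returned Series lacks 'a'
print(len(ser2), len(ser3))
print('a' in ser2, 'a' in ser3)
```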

DataFrame

A DataFrame represents a tabular, spreadsheet-like data structure that contains an ordered collection of columns. There are many ways to build a DataFrame, but the most common one is from a dictionary of equal-length lists or NumPy arrays:

from pandas import DataFrame

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = DataFrame(data)
print(frame)

Output:
   pop   state  year
0  1.5    Ohio  2000
1  1.7    Ohio  2001
2  3.6    Ohio  2002
3  2.4  Nevada  2001
4  2.9  Nevada  2002
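The same construction should also work when the dictionary values are NumPy arrays rather than lists. A minimal sketch, assuming NumPy and pandas are imported as `np` and `pd`; the name `frame_np` is illustrative:

```python
import numpy as np
import pandas as pd

# Each dictionary value is a NumPy array of the same length
data_np = {'year': np.array([2000, 2001, 2002]),
           'pop': np.array([1.5, 1.7, 3.6])}

# Passing columns fixes the column order explicitly
frame_np = pd.DataFrame(data_np, columns=['year', 'pop'])
print(frame_np)
```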

Setting the column order:

frame = DataFrame(data, columns=['year', 'state', 'pop'])
print(frame)

Output:
   year   state  pop
0  2000    Ohio  1.5
1  2001    Ohio  1.7
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9

Adding an index:

frame1 = DataFrame(data, columns=['year', 'state', 'pop'], index=['a', 'b', 'c', 'd', 'e'])
print(frame1)

Output:
   year   state  pop
a  2000    Ohio  1.5
b  2001    Ohio  1.7
c  2002    Ohio  3.6
d  2001  Nevada  2.4
e  2002  Nevada  2.9

Retrieving data:

ser3 = frame1['year']
print(ser3['a'])

Selecting a column of a DataFrame returns a Series.
Output:
2000
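The difference between selecting one column and selecting several can be sketched as follows. The example rebuilds the sample data so that it runs on its own, assuming pandas is imported as `pd`; `year` and `two_cols` are illustrative names:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['a', 'b', 'c', 'd', 'e'])

# A single column name returns a Series that keeps the row labels
year = frame1['year']
print(type(year).__name__)
print(year.index.tolist())

# A list of column names returns a DataFrame
two_cols = frame1[['year', 'pop']]
print(type(two_cols).__name__)
```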

Conditional retrieval

# filter the integer-indexed frame; frame1 is left untouched for the later examples
filtered = frame[frame['pop'] > 2]
print(filtered)

Output:
   year   state  pop
2  2002    Ohio  3.6
3  2001  Nevada  2.4
4  2002  Nevada  2.9

Retrieving a row by index label

# .loc looks rows up by label (older pandas versions also offered .ix, which has since been removed)
print(frame1.loc['e'])

Output:
year       2002
state    Nevada
pop         2.9
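Rows can be looked up either by label or by position. A minimal sketch of both, assuming pandas is imported as `pd` and using the same sample data; the variable names are illustrative:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['a', 'b', 'c', 'd', 'e'])

# Label-based lookup: the row whose index label is 'e'
row_by_label = frame1.loc['e']

# Position-based lookup: the fifth row (positions start at 0)
row_by_position = frame1.iloc[4]

# A row label plus a column label returns a single cell
cell = frame1.loc['e', 'state']

print(row_by_label)
print(cell)
```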

Add Data:

newval = Series([-1.2, -1.5, -1.7], index=['a', 'c', 'e'])
frame1['debt'] = newval
print(frame1)

Output:
   year   state  pop  debt
a  2000    Ohio  1.5  -1.2
b  2001    Ohio  1.7   NaN
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7

Rows whose index label has no value in the assigned Series are filled with NaN (a missing value) in the result.
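Missing values produced this way can be detected or replaced afterwards. The following sketch assumes pandas is imported as `pd`; replacing NaN with 0 is an arbitrary choice made only for the example:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame1 = pd.DataFrame(data, columns=['year', 'state', 'pop'],
                      index=['a', 'b', 'c', 'd', 'e'])

# Values are aligned on the index labels; 'b' and 'd' get NaN
frame1['debt'] = pd.Series([-1.2, -1.5, -1.7], index=['a', 'c', 'e'])

# isnull() flags the missing entries
missing = frame1['debt'].isnull()

# fillna() returns a copy with the missing entries replaced
filled = frame1['debt'].fillna(0)

print(missing)
print(filled)
```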

Delete data

# axis=1 drops columns instead of rows
fram2 = frame1.drop(['year', 'state'], axis=1)
print(fram2)

Output:
   pop  debt
a  1.5  -1.2
b  1.7   NaN
c  3.6  -1.5
d  2.4   NaN
e  2.9  -1.7
drop() returns a new object; the original object is not changed.

fram2 = frame1.drop(['a', 'b'])
print(fram2)

Output:
   year   state  pop  debt
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7

Important pandas functions

This section covers: taking the first two rows, taking the last three rows, transposing, sorting by axis, sorting by value, merging (concat), appending, and grouping with a sum (groupby).

Reference: http://www.cnblogs.com/chaosimple/p/4153083.html

Taking the first two rows:

print(frame1.head(2))

Output:
   year state  pop  debt
a  2000  Ohio  1.5  -1.2
b  2001  Ohio  1.7   NaN

Taking the last three rows:

print(frame1.tail(3))

Output:
   year   state  pop  debt
c  2002    Ohio  3.6  -1.5
d  2001  Nevada  2.4   NaN
e  2002  Nevada  2.9  -1.7

Transposing:

print(frame1.T)

Output:
          a     b     c       d       e
year   2000  2001  2002    2001    2002
state  Ohio  Ohio  Ohio  Nevada  Nevada
pop     1.5   1.7   3.6     2.4     2.9
debt   -1.2   NaN  -1.5     NaN    -1.7

Sorting by axis:

# axis=1 sorts the column labels
print(frame1.sort_index(axis=1, ascending=True))

Output:
   debt  pop   state  year
a  -1.2  1.5    Ohio  2000
b   NaN  1.7    Ohio  2001
c  -1.5  3.6    Ohio  2002
d   NaN  2.4  Nevada  2001
e  -1.7  2.9  Nevada  2002

Sorting by value:

# sort_values replaces the older frame1.sort(columns='year')
print(frame1.sort_values(by='year'))

Output:
   year   state  pop  debt
a  2000    Ohio  1.5  -1.2
b  2001    Ohio  1.7   NaN
d  2001  Nevada  2.4   NaN
c  2002    Ohio  3.6  -1.5
e  2002  Nevada  2.9  -1.7

Merging with concat, which joins several DataFrames row by row after they have been collected in a list:

import numpy as np
import pandas as pd

frame2 = DataFrame(np.random.rand(4, 3))
# slice the DataFrame
frame3 = [frame2[:2], frame2[3:4]]
# merge the slices
_frame3 = pd.concat(frame3)
print(_frame3)

Output:
          0         1         2
0  0.449950  0.556051  0.811427
1  0.312357  0.429655  0.725275
3  0.558072  0.747375  0.803023

Appending:

# pd.concat replaces the older frame2.append(_frame3, ignore_index=True)
frame4 = pd.concat([frame2, _frame3], ignore_index=True)
print(frame4)

Output:
          0         1         2
0  0.449950  0.556051  0.811427
1  0.312357  0.429655  0.725275
2  0.287861  0.464538  0.744888
3  0.558072  0.747375  0.803023
4  0.449950  0.556051  0.811427
5  0.312357  0.429655  0.725275
6  0.558072  0.747375  0.803023

Grouping and summing with groupby:

group_data = frame.groupby('state').sum()
print(group_data)

Output:
        year  pop
state
Nevada  4003  5.3
Ohio    6003  6.8
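A grouped object is not limited to a single sum. As a rough sketch, assuming pandas is imported as `pd`, one column can be selected and aggregated in several ways at once; the choice of 'sum' and 'mean' and the name `grouped` are illustrative:

```python
import pandas as pd

data = {'state': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
        'year': [2000, 2001, 2002, 2001, 2002],
        'pop': [1.5, 1.7, 3.6, 2.4, 2.9]}
frame = pd.DataFrame(data, columns=['year', 'state', 'pop'])

# Group the rows by state, keep only the 'pop' column,
# and compute both the sum and the mean of each group
grouped = frame.groupby('state')['pop'].agg(['sum', 'mean'])
print(grouped)
```

Each aggregation name becomes one column of the result, and the group keys become its index.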

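Importing and saving data

Reading a CSV file:
# the r prefix makes a raw string, so backslashes such as \t in the Windows path are not treated as escape sequences
csv = pd.read_csv(r'E:\zhangqx\test.csv', parse_dates=True)
Writing a CSV file:
csv.to_csv(r'E:\zhangqx\test1.csv')

Supported formats:
CSV, Excel, HDF, SQL, JSON, HTML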
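The CSV round trip can be tried without touching the file system by writing to an in-memory buffer. This is a sketch under that assumption, with pandas imported as `pd`; the two-row sample data and the names `buffer` and `loaded` are made up for the example:

```python
import io

import pandas as pd

frame = pd.DataFrame({'year': [2000, 2001],
                      'state': ['Ohio', 'Ohio'],
                      'pop': [1.5, 1.7]})

# Write the DataFrame as CSV text; index=False omits the row labels
buffer = io.StringIO()
frame.to_csv(buffer, index=False)

# Rewind the buffer and read the CSV text back into a new DataFrame
buffer.seek(0)
loaded = pd.read_csv(buffer)
print(loaded)
```

A file path can be passed in place of the buffer to read from or write to disk.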

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
