.DataFrame([[1, 2, 3, 4, 16], ['1', '2', '3', '4', 'F']], index=['data1', 'data2'])
print(df)
# Multiply by 10 and compare the result with the expected result.
df.apply(lambda x: x * 10)
# View the data types
df.dtypes
# df.loc['data2'] = pd.to_numeric(df.loc['data2'])  # converts only the data that can be converted. The value
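The snippet above multiplies a row of strings by 10, which repeats each string rather than scaling a number. The following is a minimal sketch of that behaviour and of one way to convert the row first. The `errors='coerce'` option and the variable names `repeated`, `data2_numeric` and `scaled` are assumptions added for illustration, not part of the original snippet.

```python
import pandas as pd

# Two rows: one of integers, one of strings (the last string is not numeric).
df = pd.DataFrame(
    [[1, 2, 3, 4, 16], ['1', '2', '3', '4', 'F']],
    index=['data1', 'data2'],
)

# Each column mixes int and str, so every column has dtype object.
# Multiplying a str by 10 repeats it instead of scaling it.
repeated = df.apply(lambda x: x * 10)

# Convert the string row; values that cannot be parsed become NaN.
data2_numeric = pd.to_numeric(df.loc['data2'], errors='coerce')
scaled = data2_numeric * 10
```

Coercing is only one possible policy; the default `errors='raise'` would instead stop at the value `'F'`.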
This article shares the basic operations of pandas, the Python data analysis library. It should be a useful reference, and I hope it helps. Let's take a look.
What is Pandas?
Is it this?
... Apparently pandas is not as cute as this guy ...
Let's take a look at how Pandas's official website defines itself:
pandas is an open source library providing easy-to-use data structures and data analysis tools for the Python programming language.
Obviously, pandas is a very powerful data analysis library for Pyth
[Original] 10 Minutes to pandas
This article is a simple translation of "Ten Minutes to Pandas" on the official pandas website; the original is here. It is a brief introduction to pandas; for details, please refer to the Cookbook. By convention, we import the required packages in the following format:
First, create the object
You can find detailed information about this section in the Data Structure Intro section.
1. You can create a Series, pandas by p
famous data Analysis library in Python panda
The pandas library is a NumPy-based tool created to solve data analysis tasks. It is built around two core data structures, Series and DataFrame, which correspond to a one-dimensional sequence and a two-dimensional table, respectively.
Pandas provides a number of functions and methods that enable us to process data
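As a minimal sketch of the two core structures described above, the example below builds one Series and one DataFrame. The labels and values are made up for illustration.

```python
import pandas as pd

# A Series is a one-dimensional labelled sequence.
s = pd.Series([10, 20, 30], index=['a', 'b', 'c'])

# A DataFrame is a two-dimensional table; each column is a Series.
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4.0, 5.0, 6.0]}, index=['a', 'b', 'c'])

column = df['x']  # selecting one column returns a Series
print(type(column).__name__)
print(df.shape)   # (number of rows, number of columns)
```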
former tells Matplotlib where to place the ticks within the data range; by default these positions are also used as the tick labels. However, you can use set_xticklabels to set any other values as labels:
ticks = ax.set_xticks([0, 250, 500, 700, 900, 1000])
# rotation below sets the rotation angle of the labels
labels = ax.set_xticklabels(['a', 'b', 'c', 'd', 'e', 'f'], rotation=30, fontsize='small')
# You can set a name for the x axis
ax.set_xlabel('Stages')
plt.show()
Legend
# -*- encoding: utf-8 -*-
import numpy as np
import pandas as pd
import matplotlib.pyplot as pl
missing data. Because the overall sample is large, I simply delete the rows with missing data. In addition, since the original data is not entirely comma-separated, you need to split the columns with the following code:
# Delete missing data
feature_set2 = feature_set[feature_set[1] != -1]  # keep only the rows that are not -1
# print(feature_set2)  # no problems
feature_set2 = feature_set2.reset_index(drop=True)
print(feature_set2.head())
# Column 0 contains both the date and the time, so split it into two columns
need_split_col = feature_set2[0
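The two steps described above (dropping rows marked with -1, then splitting a combined date and time column) can be sketched as follows. The sample values, the space separator, and the new column names `date` and `time` are assumptions, since the original data and the rest of the code are not shown.

```python
import pandas as pd

# Hypothetical sample: column 0 holds "date time", column 1 uses -1 for missing.
feature_set = pd.DataFrame({
    0: ['2018-01-01 08:00', '2018-01-01 09:00', '2018-01-02 10:30'],
    1: [3.5, -1, 4.2],
})

# Keep only rows whose value is not the -1 sentinel, then renumber the index.
feature_set2 = feature_set[feature_set[1] != -1].reset_index(drop=True)

# Split column 0 on the space; expand=True returns one column per piece.
parts = feature_set2[0].str.split(' ', expand=True)
feature_set2['date'] = parts[0]
feature_set2['time'] = parts[1]
```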
      3  6
H  7  3  7
I  8  3  8
J  9  3  9
By using loc, we can select part of the data in the DataFrame by label.
df.loc['a']
rev     0
test    3
col     0
Name: a, dtype: int64
# df.loc[starting index (inclusive) : ending index (inclusive)]
df.loc['a':'d']
rev  test  col
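A minimal sketch of label-based selection with loc, using a hypothetical frame loosely modelled on the output above (the values and the lowercase letter index are assumptions). It also contrasts label slicing with positional slicing through iloc.

```python
import pandas as pd

# Hypothetical frame with a letter index and the three columns seen above.
df = pd.DataFrame(
    {'rev': range(10), 'test': [3] * 10, 'col': range(10)},
    index=list('abcdefghij'),
)

row_a = df.loc['a']        # a single label returns one row as a Series
block = df.loc['a':'d']    # label slices include BOTH endpoints
positional = df.iloc[0:3]  # positional slices exclude the end, as in plain Python
```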
Pandas
Pandas is a popular open source Python project whose name comes from "panel data" and "Python data analysis".
Pandas has two important data structures: DataFrame and Series.
The DataFrame data structure
A pandas DataFrame is a labelled two-dimensional object that is very similar to an Excel spreadsheet or a relational database table.
You can create
memory at once, linear SVC and SGD can be carried out separately.
Approach to solving machine learning problems
What do I know when I get the data (visualization)?
Choosing the most appropriate machine learning algorithm
Locating the model state (over-fitting or under-fitting) and the workaround
Feature analysis and visualization of large volumes of data
The advantages and disadvantages of various loss functions and how to choose them
Data and visualization
# numpy: scientific computing toolkit
imp
', ' 110 ')
Replace
Data preprocessing
Sort the data
df.sort_values(by=['The number of messages sent by the customer on the day'])
Sort
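As a minimal sketch of sorting with sort_values, the example below uses a hypothetical chat-log summary; the short column names `customer` and `messages` are assumptions standing in for the longer names in the original.

```python
import pandas as pd

# Hypothetical chat-log summary.
df = pd.DataFrame({
    'customer': ['A', 'B', 'C', 'D'],
    'messages': [7, 2, 9, 2],
})

# sort_values returns a new DataFrame; the original is left unchanged.
ascending = df.sort_values(by=['messages'])
descending = df.sort_values(by=['messages'], ascending=False)

# Several keys: messages descending, then customer ascending to break ties.
multi = df.sort_values(by=['messages', 'customer'], ascending=[False, True])
```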
Data grouping (the equivalent of a PivotTable report in Excel): group customer chat records
# If the number of messages the customer sent on the day is > 5, the group column shows high; otherwise it shows low
df['group'] = np.where(df['customer sends messages on the day'] > 5, 'high', 'low')
df
Group
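The grouping step above relies on np.where, which picks one of two values for each row depending on a condition. A minimal sketch, with a hypothetical `messages` column standing in for the original column name:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'messages': [7, 2, 9, 5]})

# np.where(condition, value_if_true, value_if_false) works element by element.
df['group'] = np.where(df['messages'] > 5, 'high', 'low')
```

Note that the comparison is strict, so a row with exactly 5 messages falls into the low group.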
Grouping on multiple criteria
# Set the sign column to 1 where the broker level is A1 and the broker response length is > 24
df
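The original code for the multi-criteria case is not shown, so the following is only a sketch of one common way to do it: build a boolean mask from several conditions and pass it to np.where. The column names `level` and `response_length` and the sample values are assumptions based on the comment above.

```python
import numpy as np
import pandas as pd

# Hypothetical columns named after the comment above.
df = pd.DataFrame({
    'level': ['A1', 'A1', 'B2', 'A1'],
    'response_length': [30, 10, 40, 25],
})

# Combine element-wise conditions with & (and) or | (or);
# each condition needs its own parentheses.
mask = (df['level'] == 'A1') & (df['response_length'] > 24)
df['sign'] = np.where(mask, 1, 0)
```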
First, background knowledge in detail
Spark SQL is important because it operates on DataFrames, and the DataFrame itself provides save and load operations.
Load: creates a DataFrame.
Save: saves the data in the DataFrame to a file, or to a specific format, indicating the type of file we want to read and what type of file w
First, introduce
The data needed for data mining is often spread across different datasets; data integration is the process of merging multiple datasets into one consistent data store.
For DataFrames, the join is sometimes made on the index.
Third, code example
# coding: utf-8
# In[2]:
from pandas import DataFrame
import pandas as pd
import numpy as np
# Merging DataFrames
# 1 DF
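Since the original example is cut off, here is a minimal sketch of joining two DataFrames on their index, as the introduction mentions. The frames `left` and `right` and their values are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'a': [1, 2, 3]}, index=['x', 'y', 'z'])
right = pd.DataFrame({'b': [10, 30]}, index=['x', 'z'])

# join aligns on the index of both frames; unmatched rows on the right get NaN.
joined = left.join(right, how='left')

# The same index-based join expressed with merge.
merged = pd.merge(left, right, left_index=True, right_index=True, how='left')
```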
Pandas Foundation
Stream Processing
Stream processing sounds sophisticated, but it is really just reading in chunks. A common case: a file of many gigabytes is too large to handle at once, so process it in batches, 1 million rows at a time, then the next 1 million rows, until the whole file has been processed.
# using an iterator-like approach
data = pd.read_csv(file, chunksize=1000000)
for sub_df in data:
    print('do something with sub_df here')
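A small runnable sketch of the chunked reading described above. An in-memory CSV of 10 rows and a chunk size of 4 stand in for the multi-gigabyte file and the 1 million row batches; the column name `value` is an assumption.

```python
import io

import pandas as pd

# Stand-in for a large file: 10 rows of CSV text held in memory.
csv_text = 'value\n' + '\n'.join(str(i) for i in range(10))

total = 0
chunk_sizes = []
# With chunksize set, read_csv yields DataFrames of at most that many rows.
for sub_df in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(sub_df))
    total += sub_df['value'].sum()  # aggregate per chunk, keep only the running total
```

Only one chunk is held in memory at a time, which is why the approach works for files larger than RAM as long as the per-chunk result is small.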
Index
Series and
merge
# pandas provides the function merge(left, right, how='inner', on=None, left_on=None, right_on=None, left_index=False, right_index=False, sort=True, suffixes=('_x', '_y'), copy=True, indicator=False)
Python is a full-featured and powerful language, and merge() in its pandas library supports a variety of inner and outer joins.
left and right: the two DataFrames to merge
how: the way of merging (conne
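A minimal sketch of how the `how` and `indicator` parameters listed above change the result of merge. The two frames and the column names `key`, `lval` and `rval` are made up for illustration.

```python
import pandas as pd

left = pd.DataFrame({'key': ['a', 'b', 'c'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key': ['b', 'c', 'd'], 'rval': [20, 30, 40]})

inner = pd.merge(left, right, on='key', how='inner')  # keys present in both
outer = pd.merge(left, right, on='key', how='outer')  # keys present in either

# indicator=True adds a _merge column saying where each row came from.
left_join = pd.merge(left, right, on='key', how='left', indicator=True)
```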
Reposted
This lesson is reproduced from an expert's post. The sample code will be added incrementally in the future.
Pandas
Pandas is a NumPy-based tool that was created to solve data analysis tasks. Pandas incorporates a number of libraries and a number of standard data models, providing the tools needed to efficiently manipulate large datasets. Pandas provides a number of functions and methods that enable us to process data quickly and easily.
>>> from pandas import Series,
This is a short introduction to pandas, geared mainly for new users.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [3]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
In [4]: s
Out[4]:
0      1
1      3
2      5
3    NaN
4      6
5      8
dtype: float64
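The output above reports float64 even though most of the values passed in are integers. A short sketch of why, and of how the missing value behaves in a reduction:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# NaN is a float, so the integers are upcast and the dtype becomes float64.
print(s.dtype)

# Missing values are skipped by default in reductions such as sum().
missing_count = s.isna().sum()
total = s.sum()
```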
Creating a dataframe by pass
Ten Minutes to Pandas
This is a short introduction to pandas, geared mainly for new users. You can see more complex recipes in the Cookbook.
Customarily, we import as follows:
In [1]: import pandas as pd
In [2]: import numpy as np
In [3]: import matplotlib.pyplot as plt
Object Creation
See the Data Structure Intro section.
Creating a Series by passing a list of values, letting pandas create a default integer index:
In [4]: s = pd.Series([1, 3, 5, np.nan, 6, 8])
In [5]: s
Out[5]:
0    1
1    3
This example describes plotting in Python. It is shared here for your reference, as follows:
1. Add data labels to a figure
plt.plot(datat.index, datat)
plt.xlabel('index', fontsize=15)
plt.legend(['t_bottom', 't_top'], loc='upper right', fontsize=10)
plt.show()
2. Place the legend on the far right
plt.legend(bbox_to_anchor=(1.05, 1), loc=2, borderaxespad=0.)
3. Display Chinese fonts
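A self-contained sketch of placing the legend outside the axes with bbox_to_anchor. The plotted data is made up, and the Agg backend and in-memory PNG buffer are assumptions chosen so the example runs without a display; `loc='upper left'` is the string form of `loc=2`.

```python
import io

import matplotlib
matplotlib.use('Agg')  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([0, 1, 2], [0, 1, 4], label='t_bottom')
ax.plot([0, 1, 2], [0, 2, 8], label='t_top')
ax.set_xlabel('index', fontsize=15)

# Anchor the legend's upper-left corner just outside the right edge of the axes.
legend = ax.legend(bbox_to_anchor=(1.05, 1), loc='upper left', borderaxespad=0.)

# bbox_inches='tight' keeps the outside legend from being cropped when saving.
buffer = io.BytesIO()
fig.savefig(buffer, format='png', bbox_inches='tight')
```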
concil_set:
    if each in ans_attend_set:
        concil_attend_set.add(each)
    elif each in ans_notatt_set:
        concil_notatt_set.add(each)
    else:
        concil_notans_set.add(each)
# 3. Display result
def disp(ss, cap, num=True):
    # ss: list or set
    # cap: opening description
    print(cap, '({})'.format(len(ss)))
    for i in range(np.ceil(len(ss) / 5).astype(int)):
        pre = i * 5
        nex = (i + 1) * 5
        # adjust the display format
        dd = ''
        for each in list(ss)[pre:nex]:
            if len(each) == 2:
                dd = dd + ' ' + each
            elif len(each) == 3:
                dd = dd + ' ' + eac