1. Create a DataFrame from a dictionary
>>> import pandas
>>> dict_a = {'user_id': ['Webbang', 'Webbang', 'Webbang'], 'book_id': ['3713327', '4074636', '26873486'], 'rating': ['4', '4', '4'], 'mark_date': ['2017-03-07', '2017-03-07', '2017-03-07']}
>>> df = pandas.DataFrame(dict_a)  # create a DataFrame from a dictionary
>>> df  # the created DataFrame
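A minimal sketch of the same construction as a script. Each dictionary key becomes a column name and each list holds that column's values. Because every value in `dict_a` is a string, the sketch also converts `rating` and `mark_date` to numeric and date types; that conversion step is an addition for illustration, not part of the original snippet.

```python
import pandas as pd

# Each key becomes a column; each list supplies that column's values, row by row.
dict_a = {
    "user_id": ["Webbang", "Webbang", "Webbang"],
    "book_id": ["3713327", "4074636", "26873486"],
    "rating": ["4", "4", "4"],
    "mark_date": ["2017-03-07", "2017-03-07", "2017-03-07"],
}
df = pd.DataFrame(dict_a)

# All values were given as strings; convert them if you want to compute with them.
df["rating"] = df["rating"].astype(int)
df["mark_date"] = pd.to_datetime(df["mark_date"])

print(df.shape)          # (3, 4)
print(list(df.columns))  # ['user_id', 'book_id', 'rating', 'mark_date']
print(df.dtypes)
```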
This article introduces how to exclude specific rows from a pandas DataFrame in Python. It gives detailed example code that should be a useful reference for study or work; let's take a look below.
2. Adding a new row and column to a pandas DataFrame in Python
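A minimal sketch of adding one column and one row, using small hypothetical data. Assigning to a column label that does not exist yet creates a new column; assigning to a new index label through `.loc` appends a new row.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4]}, index=["x", "y"])

# New column: assign to a column label that does not exist yet.
df["c"] = df["a"] + df["b"]

# New row: assign to an index label that does not exist yet via .loc
# (one value per existing column, in column order).
df.loc["z"] = [5, 6, 11]

print(df)
```

For appending many rows at once, `pd.concat` is the usual tool; `DataFrame.append`, which older tutorials use, was removed in pandas 2.0.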
Background
Item | Pandas | Spark
Working style | Stand-alone; cannot process very large amounts of data | Distributed; can process large amounts of data
Storage mode | Stand-alone cache | Can call persist()/cache() for a distributed cache
Mutable | Yes | No
Index | Automatically created | None
Filter data with a fuzzy match (like LIKE in SQL):
df_obj[df_obj['package'].str.contains(r'.*Voice cdma.*')]  # fuzzy match with a regular expression: * matches 0 or more times, ? matches 0 or 1 time
Data c
]] = 1  # replace the data at the selected location with 1
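A sketch of the regex fuzzy match, roughly SQL's `LIKE '%voice cdma%'`. The rows below are hypothetical; only the column names follow the snippet. `case=False` is an added assumption so that "Voice cdma" and "voice CDMA" both match.

```python
import pandas as pd

# Hypothetical sample data; only the column names follow the snippet above.
df_obj = pd.DataFrame({
    "user number": ["023-0001", "023-0002", "023-0003"],
    "package": ["Voice cdma basic", "Data only", "voice CDMA plus"],
})

# str.contains() treats the pattern as a regular expression by default (regex=True).
mask_regex = df_obj["package"].str.contains(r".*voice cdma.*", case=False)

# contains() already matches anywhere in the string, so the ".*" on both ends is optional.
mask_plain = df_obj["package"].str.contains("voice cdma", case=False)

print(df_obj[mask_regex])
```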
4) Use a DataFrame to filter data (like WHERE in SQL):
alist = ['023-18996609823']
df_obj['user number'].isin(alist)  # put the values to filter on into a list; isin returns the row index with a True/False match result for each row (True if it matches)
df_obj[df_obj['user number'].isin(alist)]  # get the rows whose match result is True
5) Filter data using
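A runnable version of the `isin` filter, assuming a small hypothetical `df_obj`; only the first phone number comes from the snippet. `isin` yields one boolean per row, and indexing with that mask keeps the rows marked True, much like `WHERE col IN (...)` in SQL.

```python
import pandas as pd

# Hypothetical data; only "023-18996609823" comes from the original snippet.
df_obj = pd.DataFrame({
    "user number": ["023-18996609823", "023-10000000000", "010-12345678901"],
    "plan": ["A", "B", "C"],
})

alist = ["023-18996609823"]  # values to keep, like WHERE col IN (...) in SQL

mask = df_obj["user number"].isin(alist)  # boolean Series, one entry per row
print(mask.tolist())                       # [True, False, False]
print(df_obj[mask])                        # only the matching row
```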
…conversions
Item | Pandas | Spark
Reading data | CSV data set read; HDF5 read; Excel read | Structured data file read; JSON data set read; Hive table read; external database read
Index | Automatically created | No index; an extra column must be created if one is needed
Row structure | Series structure, belonging to the
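A sketch of the pandas side of the table: reading a CSV and getting an automatically created index. An in-memory CSV with hypothetical data stands in for a file on disk; `pd.read_json`, `pd.read_excel` and `pd.read_hdf` follow the same pattern (the last two need optional engines such as openpyxl or PyTables).

```python
import io

import pandas as pd

# Hypothetical CSV content held in memory instead of a file on disk.
csv_text = "name,score\nalice,90\nbob,85\n"
df = pd.read_csv(io.StringIO(csv_text))

# pandas creates a default integer RangeIndex (0, 1, ...) automatically.
print(df.index)  # RangeIndex(start=0, stop=2, step=1)

# An existing column can be promoted to the index if label lookups are preferred.
by_name = df.set_index("name")
print(by_name.loc["bob", "score"])  # 85
```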
from: 76713387
How to iterate over rows in a pandas DataFrame (row-by-row iteration)
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
When it comes to manip
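A short sketch of the common ways to iterate over rows, on hypothetical data. `iterrows()` yields (index, Series) pairs and is slow; `itertuples()` yields namedtuples and is usually much faster; a vectorized expression avoids the loop altogether when the computation allows it.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# iterrows(): (index, Series) pairs; convenient, but slow and may upcast dtypes.
for idx, row in df.iterrows():
    print(idx, row["a"], row["b"])

# itertuples(): namedtuples with one attribute per column; usually much faster.
totals = [row.a + row.b for row in df.itertuples(index=False)]
print(totals)  # [11, 22, 33]

# Vectorized alternative: no explicit Python loop at all.
vectorized = (df["a"] + df["b"]).tolist()
print(vectorized)  # [11, 22, 33]
```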
How do I remove empty strings from a list?
Easiest way: new_list = [x for x in li if x != '']
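The one-liner above as a runnable sketch with a hypothetical list `li`. The second variant, which also drops whitespace-only strings, is an optional extension rather than part of the original answer.

```python
li = ["a", "", "b", "", "c"]

# Keep every element that is not the empty string.
new_list = [x for x in li if x != ""]
print(new_list)  # ['a', 'b', 'c']

# Variant: also drop whitespace-only strings such as " ".
li2 = ["a", " ", "", "b"]
new_list2 = [x for x in li2 if x.strip()]
print(new_list2)  # ['a', 'b']
```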
Today is May 1.
This section covers the basic pandas operations on the two data structures introduced previously.
Suppose the DataFrame a is as follows:
       a  b  c
one    4  1  1
two    6  2  0
three  6  1  6
First, view the data (the methods for viewing a DataFrame also apply to a Series).
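A sketch that rebuilds the example DataFrame `a` from the values shown above and applies a few common viewing methods. The particular selection of methods is an assumption about what "view the data" covers; most of them also exist on a Series.

```python
import pandas as pd

# Rebuild the example DataFrame `a` from the values shown above.
a = pd.DataFrame(
    {"a": [4, 6, 6], "b": [1, 2, 1], "c": [1, 0, 6]},
    index=["one", "two", "three"],
)

print(a.head(2))             # first two rows
print(a.index.tolist())      # ['one', 'two', 'three']
print(a.columns.tolist())    # ['a', 'b', 'c']
print(a.to_numpy())          # underlying values as a NumPy array
print(a.describe())          # summary statistics for each column
```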
in the sense that they're an immutable data structure. Therefore, things like:
# create a new column "three"
df['three'] = df['one'] * df['one']
can't exist, simply because this kind of assignment goes against the principles of Spark. Another example would be trying to access a single element within a DataFrame by index. Don't forget that you're using a distributed data structure, not a i
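For contrast, a pandas sketch (not Spark code) of the same assignment. In pandas it works in place because pandas DataFrames are mutable. `DataFrame.assign` is the closest pandas analogue of the immutable style: it returns a new DataFrame and leaves the original unchanged, similar in spirit to PySpark's `withColumn()`.

```python
import pandas as pd

df = pd.DataFrame({"one": [1, 2, 3]})

# pandas DataFrames are mutable: this adds column "three" to df in place.
df["three"] = df["one"] * df["one"]

# Immutable style: assign() builds a new DataFrame; `base` is not modified.
base = pd.DataFrame({"one": [1, 2, 3]})
extended = base.assign(three=base["one"] * base["one"])

print(base.columns.tolist())      # ['one']
print(extended.columns.tolist())  # ['one', 'three']
```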
), columns=['A', 'B', 'C', 'D', 'E'])
DataFrame data preview:
          A         B         C         D         E
0  0.673092  0.230338 -0.171681  0.312303 -0.184813
1 -0.504482 -0.344286 -0.050845 -0.811277 -0.298181
2  0.542788  0.207708  0.651379 -0.656214  0.507595
3 -0.249410  0.131549 -2.198480 -0.437407  1.628228
Calculate the sum of each row and add it to the end as a new column:
df['col_sum'] = df.apply(lambda x: x.sum(), axis=1)
Calculate the sum of each column and add it to the end as a new row:
df.loc['row_sum'] = df.apply(lambda x: x.sum())
Final data results:
                A         B         C         D         E   col_sum
0        0.673092  0.230338 -0.171681  0.312303 -0.184813  0.859238
1       -0.504482 -0.344286 -0.050845 -0.811277 -0.298181 -2.009071
2        0.542788  0.207708  0.651379 -0.656214  0.507595  1.253256
3       -0.249410  0.131549 -2.198480 -0.437407  1.628228 -1.125520
row_sum  0.461987  0.225310 -1.769627 -1.592595  1.652828 -1.0220
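A reproducible sketch of the same two steps. Fixed, hypothetical values (0 to 19) replace the random data so the sums can be checked by hand. `df.sum(axis=1)` and `df.sum()` are vectorized equivalents of the `apply(lambda x: x.sum(), ...)` calls above.

```python
import numpy as np
import pandas as pd

# Fixed, hypothetical values instead of random data so the result is reproducible.
df = pd.DataFrame(np.arange(20, dtype=float).reshape(4, 5), columns=list("ABCDE"))

# Sum of each row (axis=1), appended as a new column.
df["col_sum"] = df.sum(axis=1)

# Sum of each column (axis=0, the default), appended as a new row.
df.loc["row_sum"] = df.sum(axis=0)

print(df)
```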
This article shows how to select rows and columns of a pandas DataFrame and how to slice it, along with what to watch out for when slicing. Here is a practical example.
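A compact sketch of the main selection and slicing forms, on hypothetical data. The key pitfall: a label slice with `.loc` includes both endpoints, while a position slice with `.iloc` excludes the end, as in ordinary Python slicing.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame(
    {"A": [1, 2, 3, 4], "B": [5, 6, 7, 8], "C": [9, 10, 11, 12]},
    index=["r1", "r2", "r3", "r4"],
)

cols = df[["A", "C"]]                     # columns by name, like SQL SELECT A, C
rows_by_label = df.loc["r2":"r3"]         # label slice: both endpoints included
rows_by_pos = df.iloc[1:3]                # position slice: end excluded (same rows here)
cell = df.loc["r4", "B"]                  # single value by row and column label
subset = df.loc[df["A"] > 2, ["A", "B"]]  # boolean row filter plus a column list

print(subset)
```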
In SQL, SELECT chooses columns by name,
Previously written: "pandas DataFrame applymap() function" and "pandas arrays (pandas Series) - (5) custom functions with the apply method". The applymap() function of the pandas DataFrame and the apply() method of the
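A sketch contrasting the three calls, on hypothetical data. `DataFrame.apply` passes a whole column (or a row, with `axis=1`) to the function, `Series.apply` passes one element at a time, and `applymap` works element-wise over a whole DataFrame. In pandas 2.1 `applymap` was deprecated and renamed `DataFrame.map`, so the sketch picks whichever exists.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"x": [1, 2, 3], "y": [10, 20, 30]})

# DataFrame.apply: the function receives one whole column at a time (axis=0 default).
col_ranges = df.apply(lambda col: col.max() - col.min())
print(col_ranges.tolist())  # [2, 20]

# Series.apply: the function receives one element at a time.
doubled_x = df["x"].apply(lambda v: v * 2)
print(doubled_x.tolist())  # [2, 4, 6]

# Element-wise over the whole DataFrame: applymap() in older pandas,
# DataFrame.map() from pandas 2.1 on (applymap is deprecated there).
elementwise = df.map if hasattr(pd.DataFrame, "map") else df.applymap
labelled = elementwise(lambda v: f"<{v}>")
print(labelled.loc[0, "x"])  # <1>
```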
']], columns=['p1', 'p2', 'p3'])
In [4]: df
Out[4]:
    p1   p2   p3
0   GD   GX   FJ
1   SD   SX   BJ
2   HN   HB   AH
3  HEN  HEN  HLJ
4   SH   TJ   CQ
If you only want the two rows whose p1 is GD or HN, you can do this:
In [8]: df[df.p1.isin(['GD', 'HN'])]
Out[8]:
   p1  p2  p3
0  GD  GX  FJ
2  HN  HB  AH
However, if we want all the data except these two rows, we need a workaround.
The idea is to first extract p1 and convert it to a list, then remove the unwanted values from the list, and then use isin().
In [9]: ex_list = list(df.p1)
In [
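The IPython session above is cut off, so this is a sketch reconstructed from the description, reusing the data from `Out[4]`. It shows the list-based approach and, for comparison, the shorter alternative of negating the `isin` mask with `~`.

```python
import pandas as pd

# Data from Out[4] above.
df = pd.DataFrame(
    [["GD", "GX", "FJ"], ["SD", "SX", "BJ"], ["HN", "HB", "AH"],
     ["HEN", "HEN", "HLJ"], ["SH", "TJ", "CQ"]],
    columns=["p1", "p2", "p3"],
)

# Approach described above: list the p1 values, drop the unwanted ones,
# then keep the rows whose p1 is still in the list.
ex_list = list(df.p1)
ex_list.remove("GD")
ex_list.remove("HN")
kept = df[df.p1.isin(ex_list)]

# Shorter alternative: negate the isin mask with ~.
kept_alt = df[~df.p1.isin(["GD", "HN"])]
print(kept_alt)
```

Note that `list.remove` deletes only the first occurrence, so if a value appeared more than once in p1, the list approach would still keep those rows; the `~isin` form does not have this problem.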
-04-14  4  5
2013-04-15  1  2  18
2013-04-17  9  1
2013-04-18  7  17
Update: unless you have a special requirement, it is highly recommended to use .loc and to use [] as little as possible. .loc avoids chained-indexing problems when assigning values in a DataFrame, whereas with [] pandas is likely to emit a SettingWithCopyWarning.
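A minimal sketch of the recommended pattern, on hypothetical data. A single `.loc` call selects rows and the column in one step, so the assignment reaches `df` itself; the chained form in the comment may write to a temporary copy instead.

```python
import pandas as pd

# Hypothetical example data.
df = pd.DataFrame({"A": [1, 2, 3], "B": [10, 20, 30]})

# Avoid chained indexing such as:  df[df["A"] > 1]["B"] = 0
# It may modify a temporary copy (and trigger SettingWithCopyWarning) instead of df.

# Preferred: one .loc call with a row condition and a column label.
df.loc[df["A"] > 1, "B"] = 0
print(df["B"].tolist())  # [10, 0, 0]
```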
See the official documentation for details: http://pandas.pydata.org/pandas-docs/stable/indexing.
1. In a pandas DataFrame, we often need to select the rows where a column meets a specified condition; the isin method is particularly effective for this.
import pandas as pd
df = pd.DataFrame([[1, 2, 3], [1, 3, 4], [2, 4, 3]], index=['one', 'two', 'three'], columns=['