This article mainly introduces you to the pandas in Python. Dataframe to exclude specific lines of the method, the text gives a detailed example code, I believe that everyone's understanding and learning has a certain reference value, the need for friends to see together below. When you use Python for data analysis, one of the most frequently used structures is the data
background
Items
Pandas
Spark
Working style
Stand-alone, unable to process large amounts of data
Distributed, capable of processing large amounts of data
Storage mode
Stand-alone cache
Can call Persist/cache distributed cache
is variable
Is
Whether
Index indexes
Automatically created
No index
Row structure
Pandas.series
Pyspar
Pandas
Spark
Working style
Single machine tool, no parallel mechanism parallelismdoes not support Hadoop and handles large volumes of data with bottlenecks
Distributed parallel computing framework, built-in parallel mechanism parallelism, all data and operations are automatically distributed on each cluster node. Process distributed data in a way that handles in-memory data.Supports Hadoop and can handle large amounts of data
Pandas
Spark
Working style
Single machine tool, no parallel mechanism parallelismdoes not support Hadoop and handles large volumes of data with bottlenecks
Distributed parallel computing framework, built-in parallel mechanism parallelism, all data and operations are automatically distributed on each cluster node. Process distributed data in a way that handles in-memory data.Supports Hadoop and can handle large amounts of data
': 12, ' C2 ': 120}]DF = PD. DataFrame (INP) for index, row in df.iterrows (): print row[ ' C1 '], Row[ ' C2 ' #10 100 #11 110 #12 + Each iteration of Df.iterrows () is a tuple type that contains the index and the data for each row.
Using the Iterrows method, the resulting row is a series,dataframe dtypes will not be retained.
The returned series is only a copy of the original
in the sense this they ' re an immutable data structure. Therefore things like:
# to create a new column "three"
df[' three ') = Df[' One '] * df[' one ']
Can ' t exist, just because this kind of affectation goes against the principles of Spark. Another example would is trying to access by index a single element within a DataFrame. Don ' t forget that your ' re using a distributed data structure, not a i
']], columns=['p1', 'p2 ...: ', 'p3'])In [4]: dfOut[4]: p1 p2 p30 GD GX FJ1 SD SX BJ2 HN HB AH3 HEN HEN HLJ4 SH TJ CQ
If you only want two rows whose p1 is GD and HN, you can do this:
In [8]: df[df.p1.isin(['GD', 'HN'])]Out[8]: p1 p2 p30 GD GX FJ2 HN HB AH
However, if we want data except the two rows, we need to bypass the point.
The principle is to first extract p1 and convert it to a list, then remove unnecessary rows (values) from the
), columns=['A', 'B', 'C', 'D', 'E'])
DataFrame data preview:
A B C D E0 0.673092 0.230338 -0.171681 0.312303 -0.1848131 -0.504482 -0.344286 -0.050845 -0.811277 -0.2981812 0.542788 0.207708 0.651379 -0.656214 0.5075953 -0.249410 0.131549 -2.198480 -0.437407 1.628228
Calculate the total data of each column and add it to the end as a new column
df['Col_sum'] = df.apply(lambda x: x.sum(), axis=1)
Calculates the total data of each row and adds it to
Pandas. DataFrame
pandas. class
DataFrame
(data=none, index=none, columns=none, dtype=none, copy=false) [Source]
Two-dimensional size-mutable, potentially heterogeneous tabular data structure with labeled axes (rows and columns). Arithmetic operations align on both row and
[' col_sum ' = df.apply (lambda x:x.sum (), Axis=1)
Calculates the sum of each row's data and adds it to the end as a new row
df.loc[' row_sum ' = df.apply (lambda x:x.sum ())
Final data results:
A B C D E col_sum0 0.673092 0.230338-0.171681 0.312303-0.184813 0.8592381-0.504482-0.344286- 0.050845-0.811277-0.298181-2.0090712 0.542788 0.207708 0.651379-0.656214 0.507595 1.2532563-0.249410 0.131549-2.1984 80-0.437407 1.628228-1.125520row_sum 0.461987 0.225310-1.769627-1.592595 1.652828-1.0220
lines for GD and HN, you can do this:
In [8]: Df[df.p1.isin ([' GD ', ' HN '])]out[8]: p1 p2 p30 GD GX FJ2 HN HB AH
But if we want data beyond these two lines, we need to get around the point.
The principle is to first remove the P1 and convert it to a list, then remove the unwanted rows (values) from the list and then use them in the Dataframeisin()
In [9]: Ex_list =
How do I delete the list hollow character?Easiest way: New_list = [x for x in Li if x! = ']This section mainly learns the basic operations of pandas based on the previous two data structures.设有DataFrame结果的数据a如下所示: a b cone 4 1 1two 6 2 0three 6 1 6
First, view the data (the method of viewing the object is also applicable for series)1. V
This article mainly introduces pandas in python. the DataFrame method for excluding specific rows provides detailed sample code. I believe it has some reference value for everyone's understanding and learning. let's take a look at it. This article mainly introduces pandas in python. the DataFrame method for excluding s
How do I delete the list hollow character?
Easiest way: New_list = [x for x in Li if x! = ']
Today is number No. 5.1.
This section mainly learns the basic operations of pandas based on the previous two data structures.
Data A with dataframe results is shown below: a b cone 4 1 1two 6 2 0three 6 1 6
First, view the data (the metho
This article mainly introduces you to the pandas in Python. Dataframe to exclude specific lines of the method, the text gives a detailed example code, I believe that everyone's understanding and learning has a certain reference value, the need for friends to see together below.
Objective
When you use Python for data analysis, one of the most frequently used structures is the
Let's create a data frame by hand.[Python]View PlainCopy
Import NumPy as NP
Import Pandas as PD
DF = PD. DataFrame (Np.arange (0,2). Reshape (3), columns=list (' abc ' )
DF is such a dropSo how do you choose the three ways to pick the data?One, when each column already has column name, with DF
1. Create a dataframe from a dictionary>>>ImportPandas as PD>>> Dict1 = {'col1': [1,2,5,7],'col2':['a','b','C','D']}>>> DF =PD. DataFrame (Dict1)>>>DF col1 COL201a1 2b2 5C3 7 D2. Create Dataframe from multiple lists (convert the list
This time to bring you pandas in the Dataframe query what methods, pandas in the Dataframe query of what matters, the following is the actual case, together to see.
Pandas provides us with a variety of slicing methods, which are often confusing if you don't know them well.
index-feature name-Attribute-easy to understand
2. filter the row and column data of dataframe
import pandas as pd,numpy as npfrom pandas import DataFramedf = DataFrame(np.arange(20).reshape((4,5)),column = list('abcde'))
1. df [] df. Select column data
Df.Df [['A', 'B']
The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion;
products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the
content of the page makes you feel confusing, please write us an email, we will handle the problem
within 5 days after receiving your email.
If you find any instances of plagiarism from the community, please send an email to:
info-contact@alibabacloud.com
and provide relevant evidence. A staff member will contact you within 5 working days.