I believe many people, like me, have had a great deal of confusion about data selection and modification while learning Python and pandas (perhaps influenced by MATLAB habits). Today I finally figured it out completely.

Let's start by creating a DataFrame by hand:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.arange(0, 60, 2).reshape(10, 3), columns=list('abc'))

df looks like this. So what are the three ways to select data?

First, when each column already has a column name, df['a'] selects an entire column. If you know both the column names and the index, you can pick whichever is convenient.
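To make the contrast concrete, here is a minimal sketch of the three selection styles on the df built above (its default integer index runs 0 through 9; the printed values are simply what arange produced):

print(df['a'])          # 1) by column name: the whole 'a' column as a Series
print(df.loc[0, 'a'])   # 2) by labels: row label 0, column label 'a'
print(df.iloc[0, 0])    # 3) by positions: first row, first column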
Python array, list, and DataFrame index slicing operations (July 19, 2016, Zhi Lang document).
A brief discussion of lists, one- and two-dimensional arrays, DataFrame, loc, iloc, and ix.
An introduction to NumPy array indexing and slicing, starting from the most basic list index.
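As a quick illustration of the loc/iloc/ix distinction mentioned above, here is a sketch with made-up labels (note that ix was later deprecated and removed in pandas 1.0):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3), index=list('xyz'), columns=list('abc'))
print(df.loc['x', 'a'])   # label-based lookup -> 0
print(df.iloc[0, 0])      # position-based lookup -> 0
# df.ix mixed labels and positions, which proved error-prone; prefer loc/iloc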
This article mainly introduces methods for excluding specific rows from a pandas DataFrame in Python. The text gives detailed example code that should have reference value for understanding and learning; those who need it can read on. When you use Python for data analysis, one of the most frequently used structures is the pandas DataFrame.

(Example output omitted: a date-indexed DataFrame with rows from 2013-04-14 to 2013-04-18 and two numeric columns.)
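The article's exact data did not survive extraction, so below is a hedged sketch of the common ways to exclude specific rows, using a hypothetical date-indexed DataFrame:

import pandas as pd

df = pd.DataFrame({'v1': [4, 1, 9, 7], 'v2': [5, 2, 1, 17]},
                  index=pd.to_datetime(['2013-04-14', '2013-04-15',
                                        '2013-04-17', '2013-04-18']))
df2 = df.drop(pd.to_datetime(['2013-04-15']))  # exclude rows by index label
df3 = df[df['v1'] != 9]                        # exclude rows matching a condition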
Update: unless you have a special requirement, it is highly recommended to use loc and to minimize use of [], because loc avoids chained-indexing problems when assigning back into a DataFrame; with [], pandas is likely to emit a SettingWithCopyWarning.
See the official documentation for details: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
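A minimal sketch of why loc is safer (the column names here are made up):

import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3], 'b': [4, 5, 6]})

# Chained indexing: the assignment may hit a temporary copy, so pandas
# emits a SettingWithCopyWarning and the write can silently be lost
df[df['a'] > 1]['b'] = 0

# loc performs the row filter and the column selection in a single call,
# so the assignment reliably modifies df itself
df.loc[df['a'] > 1, 'b'] = 0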
Background
Item               Pandas                                           Spark
Working mode       Single machine; cannot process very large data   Distributed; can process very large data
Storage            Single-machine cache                             Can call persist()/cache() for distributed caching
Mutable            Yes                                              No
Index              Created automatically                            No index
Row structure      pandas.Series                                    pyspark.sql.Row
Column structure   pandas.Series                                    pyspark.sql.Column
In pandas, the index and the feature (column) names are attributes of the DataFrame, which makes the structure easy to understand.
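To connect the two sides of the table, here is a hedged sketch of moving a small DataFrame between pandas and Spark (these are standard pyspark API calls, but the data is made up):

import pandas as pd
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

pdf = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})  # pandas: mutable, has an index
sdf = spark.createDataFrame(pdf)                # Spark: immutable, no index
sdf.persist()                                   # distributed caching, per the table above
pdf2 = sdf.toPandas()                           # collects back to the driver as pandas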
2. Filtering the row and column data of a DataFrame
import pandas as pd
import numpy as np
from pandas import DataFrame

df = DataFrame(np.arange(20).reshape((4, 5)), columns=list('abcde'))
1. df[] and df.<column>: selecting column data

df.a             # attribute access: column 'a' as a Series
df[['a', 'b']]   # list of names: columns 'a' and 'b' as a DataFrame
2. df.loc[[index], [column]]: selecting data by labels

When you are not filtering rows, put ':' in the [index] slot (it cannot be left empty), that is, df.loc[:, ['a', 'b']].
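A short sketch of both label-selection forms on the df built above (columns 'a' through 'e', default index 0 through 3):

df.loc[[0, 2], ['a', 'b']]   # rows 0 and 2, columns 'a' and 'b'
df.loc[:, ['a', 'b']]        # all rows; the ':' in the index slot must be written out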
Zhang San    1.2
Reese        1.0
Harry        2.3
Chen Jiu     5.0
Xiao Ming    6.0
Name: price, dtype: float64

In general, we often need to take values by row or column; for this, DataFrame provides loc and iloc to choose from, but there is a difference between the two:

print(frame2)
print(frame2.loc['Harry'])  # loc accepts string-type index labels, whereas iloc only accepts integer positions
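To make that contrast runnable, here is a hypothetical reconstruction of frame2 matching the output above:

import pandas as pd

frame2 = pd.Series([1.2, 1.0, 2.3, 5.0, 6.0],
                   index=['Zhang San', 'Reese', 'Harry', 'Chen Jiu', 'Xiao Ming'],
                   name='price')
print(frame2.loc['Harry'])   # label-based -> 2.3
print(frame2.iloc[2])        # position-based -> 2.3 (the third entry)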
From: 76713387. How to iterate over rows in a DataFrame in pandas (row-by-row iteration):
https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas
http://stackoverflow.com/questions/7837722/what-is-the-most-efficient-way-to-loop-through-dataframes-with-pandas
When it comes to manipulating...
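The two usual answers from those threads, in a minimal sketch (the column names are made up):

import pandas as pd

df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

for idx, row in df.iterrows():   # yields (index, Series) pairs; convenient but slow
    print(idx, row['a'], row['b'])

for row in df.itertuples():      # yields namedtuples; usually much faster
    print(row.Index, row.a, row.b)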
This time we bring you row and column selection and slicing operations on a pandas DataFrame, along with the points to watch out for; below is a practical example.

SELECT in SQL chooses data by column name; pandas is more flexible: you can select not only by column name but also by column position.
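A small sketch of both styles (hypothetical data):

import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(6).reshape(2, 3), columns=list('abc'))
df['b']        # by column name, like SELECT b FROM ... in SQL
df.iloc[:, 1]  # by column position (the second column), which SQL cannot do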
When inspecting a Spark DataFrame, you can view its data with collect(), show(), or take(), all of which include options to limit the number of rows returned.

1. Viewing the number of rows

You can use the count() method to get the number of rows in a DataFrame.
from pyspark.sql import SparkSession

spark = SparkSession \
    .builder \
    .getOrCreate()
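Continuing that sketch with a tiny made-up DataFrame to show count(), show(), take(), and collect():

df = spark.createDataFrame([(1, 'x'), (2, 'y'), (3, 'z')], ['id', 'val'])
print(df.count())         # number of rows -> 3
df.show(2)                # pretty-print at most 2 rows
first_two = df.take(2)    # first 2 rows as a list of Row objects
everything = df.collect() # all rows; be careful with large datasets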
How do I delete empty strings from a list?
Easiest way: new_list = [x for x in li if x != '']
Today is May 1st.
This section covers the basic operations of pandas, building on the two data structures from earlier.

Suppose we have a DataFrame a as follows:

       a  b  c
one    4  1  1
two    6  2  0
three  6  1  6
First, viewing the data (these methods also apply to Series)

1. Viewing the first or last N rows of a DataFrame
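For instance, on the DataFrame a above (head/tail are the standard calls; the row counts are just for illustration):

a.head()    # first 5 rows by default
a.head(2)   # first 2 rows
a.tail(2)   # last 2 rows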
Array, list, and DataFrame index slicing operations (July 19, 2016, Zhi Lang document). Starting from the most basic list index, let's begin with some code and its results:

a = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
a[:5:-1]    # negative step -> [9, 8, 7, 6]
a[0:5:-1]   # -> []
a[1::-1]    # -> [1, 0]

In a list slice there are generally two ':' delimiters inside the '[]', separating start, stop, and step.
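The same start:stop:step rule carries over to NumPy arrays, with one slice per axis; a short sketch:

import numpy as np

m = np.arange(12).reshape(3, 4)
m[:2]       # first two rows
m[:, 1:3]   # every row, columns 1 and 2
m[::-1]     # rows in reverse order, just like the negative-step list slice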
DataFrame data filtering (loc, iloc, ix, at, iat) and conditional filtering.

Single-condition filter. Select records whose col1 value is greater than n: data[data['col1'] > n]. Filter on col1 being greater than n but display only the col2 and col3 values: data[['col2', 'col3']][data['col1'] > n]. Select specific rows: use the isin function to filter records on specific values.
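A compact sketch of those three filters on a hypothetical DataFrame:

import pandas as pd

data = pd.DataFrame({'col1': [1, 5, 3], 'col2': ['x', 'y', 'z'], 'col3': [7, 8, 9]})
n = 2
data[data['col1'] > n]                    # single-condition filter on col1
data[['col2', 'col3']][data['col1'] > n]  # filter on col1, display only col2/col3
data[data['col1'].isin([1, 3])]           # keep rows whose col1 is one of the given values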
This article explains structured data processing in Spark, including Spark SQL, DataFrame, Dataset, and the Spark SQL service. It focuses on structured data processing in Spark 1.6.x; since Spark develops rapidly (this article was written when Spark 1.6.2 was released and a preview of Spark 2.0 had already been published), please follow the official Spark SQL documentation to get the latest information.