This article introduces a method for excluding specific rows from a pandas DataFrame in Python. The text gives detailed example code, which should be a useful reference for study; interested readers can follow along below. When you use Python for data analysis, one of the most frequently used structures is the pandas DataFrame. About pandas in Pytho
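The snippet above is cut off before its example code, so the following is only a minimal sketch of common ways to exclude rows from a pandas DataFrame. The sample data, column names, and row labels are hypothetical and are not taken from the original article.

```python
import pandas as pd

# Hypothetical sample data; names and labels are illustrative only.
df = pd.DataFrame(
    {"name": ["ann", "bob", "cid", "dee"], "score": [90, 55, 72, 55]},
    index=["r1", "r2", "r3", "r4"],
)

# 1) Exclude rows by index label.
without_r2 = df.drop(index=["r2"])

# 2) Exclude rows whose column value is in a given set (boolean mask negated with ~).
without_55 = df[~df["score"].isin([55])]

# 3) Exclude rows by position: keep every row except the first.
without_first = df.iloc[1:]

print(without_r2)
print(without_55)
print(without_first)
```

Each of the three forms returns a new DataFrame and leaves df unchanged, which is usually the safer default when filtering.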
Background
Item | Pandas | Spark
Working style | Stand-alone; cannot process large amounts of data | Distributed; can process large amounts of data
Storage mode | Stand-alone cache | Can call persist()/cache() for distributed caching
Mutable | Yes | No
Index | Automatically created | No index
Row structure | pandas.Series | pyspark.sql.Row
Column structure | Pa
Summary series of articles on creating, querying, deleting, and modifying pandas DataFrames:
How to create a pandas DataFrame
Query methods of a pandas DataFrame
Methods for deleting rows or columns of a pandas DataFrame
Modification methods of a pandas DataFrame
In this article we continue to introduce the relevant opera
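The four topics in the series (create, query, delete, modify) can be sketched in a few lines. This is an illustrative example under assumed data, not code from the listed articles; the column names city and pop are hypothetical.

```python
import pandas as pd

# Create: build a DataFrame from a dict of columns (hypothetical data).
df = pd.DataFrame({"city": ["A", "B", "C"], "pop": [10, 20, 30]})

# Query: select rows with a boolean condition.
big = df[df["pop"] >= 20]

# Modify: update matching cells by label with .loc.
df.loc[df["city"] == "A", "pop"] = 15

# Add a column, then delete a row and a column.
df["pop_k"] = df["pop"] * 1000
df = df.drop(index=2)             # delete the row labelled 2
df = df.drop(columns=["pop_k"])   # delete the column again

print(big)
print(df)
```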
How do I delete empty strings from a list?
Easiest way: new_list = [x for x in li if x != '']
This section covers the basic operations of pandas, building on the previous two data structures.
Suppose the DataFrame a holds the following data:
       a  b  c
one    4  1  1
two    6  2  0
three  6  1  6
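As an aside on the list question above, the list comprehension can be tried on a small input. The contents of li here are hypothetical; only the variable names follow the text.

```python
# Hypothetical input list; "li" mirrors the name used in the text.
li = ["a", "", "b", "", "c"]

# Keep only the elements that are not the empty string.
new_list = [x for x in li if x != ""]

print(new_list)  # ['a', 'b', 'c']
```

The comprehension builds a new list, so li itself is left unchanged.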
First, view the data (the viewing methods also apply to Series)
1. View the first XX rows or the last XX rows of the DataFrame
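Viewing the first or last rows is presumably done with head() and tail(); the source is cut off before naming the methods, so treat that choice as an assumption. The sketch below rebuilds the DataFrame a from the values given in the text.

```python
import pandas as pd

# The DataFrame "a" from the text: columns a, b, c and rows one, two, three.
a = pd.DataFrame(
    {"a": [4, 6, 6], "b": [1, 2, 1], "c": [1, 0, 6]},
    index=["one", "two", "three"],
)

first_two = a.head(2)      # first 2 rows
last_one = a.tail(1)       # last row
col_head = a["a"].head(2)  # the same methods work on a Series

print(first_two)
print(last_one)
print(col_head)
```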
When viewing DataFrame information, you can inspect the data in a DataFrame with collect(), show(), or take(); show() and take() accept an option to limit the number of rows returned.
1. View the number of rows
You can use the count() method to view the number of rows in a DataFrame.
from pyspark.sql import SparkSession
spark = SparkSession \
    .builder \
    .
Today is number No. 5.1.
'); pd.read_excel('foo.xlsx', 'Sheet1', index_col=None, na_values=['NA'])  # write and read Excel data; the data read by pd.read_excel is stored as a DataFrame
('foo.h5', 'df'); pd.read_hdf('foo.h5', 'df')  # write and read HDF5 data
8) Aggregate data using pandas (similar to GROUP BY or HAVING in SQL):
data_obj['User ID'].groupby(data_obj['branch maintenance line'])
data_obj.groupby('branch maintenance line')['User ID']  # shorthand for the line above
adsl_obj.groupby('
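A small sketch of the two grouping forms and of a HAVING-style filter. The data and the column names branch, user_id, and fee are hypothetical stand-ins for the translated column names in the text.

```python
import pandas as pd

# Hypothetical data; "branch" and "user_id" stand in for the columns in the text.
data_obj = pd.DataFrame({
    "branch": ["north", "north", "south", "south", "south"],
    "user_id": [1, 2, 3, 4, 5],
    "fee": [10.0, 20.0, 5.0, 5.0, 40.0],
})

# The long form and the shorthand produce the same grouping.
long_form = data_obj["user_id"].groupby(data_obj["branch"]).count()
short_form = data_obj.groupby("branch")["user_id"].count()

# GROUP BY ... HAVING: aggregate first, then filter the aggregated result.
fee_by_branch = data_obj.groupby("branch")["fee"].sum()
having = fee_by_branch[fee_by_branch > 40]

print(short_form)
print(having)
```

Filtering the aggregated Series with a boolean mask is one simple way to express HAVING; groupby(...).filter(...) is an alternative when the original rows are needed rather than the aggregates.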
['col_sum'] = df.apply(lambda x: x.sum(), axis=1)
Calculate the sum of each column and append it to the end as a new row
df.loc['row_sum'] = df.apply(lambda x: x.sum())
Final data results:
                A         B         C         D         E   col_sum
0        0.673092  0.230338 -0.171681  0.312303 -0.184813  0.859238
1       -0.504482 -0.344286 -0.050845 -0.811277 -0.298181 -2.009071
2        0.542788  0.207708  0.651379 -0.656214  0.507595  1.253256
3       -0.249410  0.131549 -2.198480 -0.437407  1.628228 -1.125520
row_sum  0.4619
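The result above can be reproduced from the printed values. This sketch types the four data rows in by hand (the original construction code is truncated) and uses the built-in DataFrame.sum instead of apply with a lambda, which is an equivalent substitution rather than the article's exact code.

```python
import pandas as pd

# Values copied from the printed result in the text.
df = pd.DataFrame(
    [
        [0.673092, 0.230338, -0.171681, 0.312303, -0.184813],
        [-0.504482, -0.344286, -0.050845, -0.811277, -0.298181],
        [0.542788, 0.207708, 0.651379, -0.656214, 0.507595],
        [-0.249410, 0.131549, -2.198480, -0.437407, 1.628228],
    ],
    columns=["A", "B", "C", "D", "E"],
)

# axis=1 sums across the columns: one total per row, stored as a new column.
df["col_sum"] = df.sum(axis=1)

# The default axis=0 sums down the rows: one total per column, stored as a new row.
df.loc["row_sum"] = df.sum()

print(df.round(6))
```

Because col_sum is added first, the row_sum row also contains the total of the col_sum column, which equals the grand total of the table.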
), columns=['A', 'B', 'C', 'D', 'E'])
DataFrame data preview:
          A         B         C         D         E
0  0.673092  0.230338 -0.171681  0.312303 -0.184813
1 -0.504482 -0.344286 -0.050845 -0.811277 -0.298181
2  0.542788  0.207708  0.651379 -0.656214  0.507595
3 -0.249410  0.131549 -2.198480 -0.437407  1.628228
Calculate the sum of each row and append it to the end as a new column
df['col_sum'] = df.apply(lambda x: x.sum(), axis=1)
Calcula
["XX"] column, df.withColumn("xx", 1).show()
Item | Pandas | Spark
Show | | df does not output specific content; use the show method to output it. Output form: DataFrame[age: bigint, name: string]
 | df outputs specific content | df.show() outputs specific content
 | No tree-structured output form | Print a summary in the form of a tree: df.printSchema()
 | | df.collect()
Sort | df.sort_index() So
For details, see the official documentation: http://matplotlib.org/api/axis_api.html
Classes for the ticks and x and y axis
Inheritance
Inheritance diagram of Tick, Ticker, XAxis, YAxis, XTick
Axis objects
class matplotlib.axis.Axis(axes, pickradius=15)
Public attributes:
axes.transData - transform data coords to display coords
axes.transAxes - transform
http://www.pcviva.com/jixiejianpanshenmezhouhao.html
If you already know what a mechanical keyboard is and have read the article "What makes a good mechanical keyboard", the next thing to pick is the keyboard's axis (switch) type. Which axis is best? Let us first look at the white axis, black ax
This article explains structured data processing in Spark, including Spark SQL, DataFrame, Dataset, and Spark SQL services. It focuses on Spark 1.6.x, but because Spark is developing rapidly (at the time of writing, Spark 1.6.2 had just been released and a preview version of Spark 2.0 had been published), please follow the official Spark SQL documentation to get the lat
.qcut() method to write a function that converts data values by interval (pandas's qcut() method):
def convert_grades_curve(exam_grades):
    return pd.qcut(exam_grades, [0, 0.1, 0.2, 0.5, 0.8, 1], labels=['F', 'D', 'C', 'B', 'A'])
And then apply this function to the entire DataFrame.
print(grades_df.apply(convert_grades_curve))
        exam1 exam2
andre       F     F
barry       B     B
chris       C     C
dan         C     C
emilio      B     B
fred        C     C
greta       A     A
humbert     D     D
ivan
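A self-contained version of the curve-grading idea can be run on made-up scores. The student labels and scores below are hypothetical, and the label set F, D, C, B, A is assumed from the printed output rather than confirmed by the truncated source.

```python
import pandas as pd

# Hypothetical exam scores; all scores within an exam are distinct so the
# quantile edges passed to qcut are unique.
grades_df = pd.DataFrame(
    {
        "exam1": [43, 81, 78, 75, 89, 70, 91, 65, 98, 87],
        "exam2": [24, 63, 56, 57, 67, 51, 79, 46, 72, 60],
    },
    index=["s0", "s1", "s2", "s3", "s4", "s5", "s6", "s7", "s8", "s9"],
)

LABELS = ["F", "D", "C", "B", "A"]

def convert_grades_curve(exam_grades):
    # Bottom 10% -> F, next 10% -> D, next 30% -> C, next 30% -> B, top 20% -> A.
    return pd.qcut(exam_grades, [0, 0.1, 0.2, 0.5, 0.8, 1], labels=LABELS)

# apply() calls the function once per column, so each exam is curved separately.
curved = grades_df.apply(convert_grades_curve)
print(curved)
```

Because qcut bins by quantile rather than by fixed score thresholds, the same raw score can receive different letters in different exams.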