DataFrame API
1. collect and collectAsList
collect returns an array that contains all rows in the DataFrame.
collectAsList returns a Java list that contains all rows in the DataFrame.
2. count
Returns the number of rows in the DataFrame.
3. first
Returns the first row.
4. head
With no parameters, head returns the first row of the DataFrame. When the parameter n is specified, the first n rows are returned.
5. show
With no parameters, show displays the first 20 rows. When n is specified, it displays the first n rows.
6. take
Returns the first n rows of the DataFrame.
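The row-fetching calls in items 1 to 6 can be tried side by side. The following is a minimal sketch, assuming Spark 2.x or later running in local mode (on Spark 1.x the DataFrame would be built through an SQLContext instead); the sample data and the application name are made up for illustration.

```scala
import org.apache.spark.sql.{Row, SparkSession}

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-actions").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

val allRows: Array[Row] = df.collect()                 // every row, as a Scala array
val asList: java.util.List[Row] = df.collectAsList()   // every row, as a Java list
val rowCount: Long = df.count()                        // number of rows
val firstRow: Row = df.first()                         // same result as df.head()
val firstTwo: Array[Row] = df.head(2)                  // first n rows
val takenTwo: Array[Row] = df.take(2)                  // also the first n rows

df.show()    // prints up to 20 rows as a text table
df.show(2)   // prints only the first 2 rows
```

collect and collectAsList bring every row to the driver, so on large data take or show is usually the safer choice.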
7. cache
Caches the DataFrame in memory.
8. columns
Returns all column names of the DataFrame as an array.
9. dtypes
Returns all column names of the DataFrame and their corresponding data types as an array.
10. explain
Used for debugging. With no parameters, only the physical plan of the DataFrame is printed to the console. When the parameter extended is true, all plans are printed to the console: the logical plans as well as the physical plan.
11. isLocal
Returns true if the collect and take methods can be run locally.
12. printSchema
Prints the DataFrame's schema information to the console as a tree.
13. registerTempTable
Registers the DataFrame as a temporary table with the specified name.
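Once registered, the table can be queried with SQL. The sketch below assumes Spark 2.x or later, where registerTempTable is deprecated in favour of createOrReplaceTempView; on Spark 1.x the equivalent calls would be df.registerTempTable("people") and sqlContext.sql(...). The table name people and the data are hypothetical.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-temp-view").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

// Register the DataFrame under a name that SQL queries can refer to.
df.createOrReplaceTempView("people")

// Query the registered table with plain SQL.
val adults = spark.sql("SELECT name FROM people WHERE age > 26")
val adultNames = adults.collect().map(_.getString(0)).sorted
```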
14. schema
Returns the schema information of the DataFrame, as a StructType.
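The metadata accessors described above (columns, dtypes, printSchema and schema) all expose the same schema in different shapes. A small sketch, assuming Spark 2.x or later in local mode and made-up sample data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-schema").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

val cols: Array[String] = df.columns             // column names only
val types: Array[(String, String)] = df.dtypes   // (column name, data type name) pairs
val schema: StructType = df.schema               // full schema object

df.printSchema()   // prints the same schema as an indented tree
```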
15. toDF
With no parameters, toDF returns a new DataFrame. When called with a list of strings, it returns a new DataFrame whose columns are renamed to those names.
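A brief illustration of the renaming form, assuming Spark 2.x or later in local mode; the column names are invented for the example.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-todf").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

val same = df.toDF()                      // new DataFrame, same column names
val renamed = df.toDF("person", "years")  // new DataFrame, columns renamed by position
```

The number of names passed must match the number of columns.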
16. agg
Provides aggregate (statistical) operations over the whole DataFrame, without grouping.
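As a rough sketch of what an ungrouped aggregation looks like (assuming Spark 2.x or later in local mode, with invented sample data), both the function-based form and the (column, function-name) pair form are shown:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, max}

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-agg").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

// Aggregate over all rows; the result is a one-row DataFrame.
val stats = df.agg(max("age"), avg("age")).first()
val maxAge = stats.getInt(0)       // largest age
val avgAge = stats.getDouble(1)    // mean age

// The same idea using (column name -> aggregate name) pairs.
val minAge = df.agg("age" -> "min").first().getInt(0)
```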
17. apply
Returns the column of the DataFrame with the specified column name, as a Column.
18. as
Returns a new DataFrame with an alias set.
19. distinct
Returns a new DataFrame that contains only the distinct (deduplicated) rows of this DataFrame.
20. except
Returns a DataFrame that contains the rows of the current DataFrame that are not in the other DataFrame. This is equivalent to subtracting one DataFrame from the other.
21. explode
Returns a new DataFrame in which each value of the original column is expanded into zero or more rows by the specified function.
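The example below shows the same row-expanding effect, but it deliberately swaps in a different mechanism: the explode column function from org.apache.spark.sql.functions used inside select, rather than the DataFrame.explode method described above, which is deprecated from Spark 2.0 onward. It assumes Spark 2.x or later in local mode; the data and column names are hypothetical.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{explode, split}

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-explode").getOrCreate()
import spark.implicits._

// Hypothetical sample data: one line of text per id.
val lines = Seq((1, "a b c"), (2, "d")).toDF("id", "text")

// Split each text into words, then emit one output row per word.
val words = lines.select($"id", explode(split($"text", " ")).as("word"))
words.show()
```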
22. filter
Filters the rows of the DataFrame using the condition given as a parameter, for example a SQL expression.
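A short sketch of two ways to write the same condition, assuming Spark 2.x or later in local mode and invented sample data: once as a SQL expression string and once as a Column expression.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-filter").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

val bySqlString = df.filter("age > 26")    // condition as a SQL expression string
val byColumn = df.filter($"age" > 26)      // same condition as a Column expression
```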
23. groupBy
Groups the DataFrame by one or more specified columns so that aggregation operations can be performed on the groups.
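Grouping followed by an aggregate might look like the following sketch (Spark 2.x or later in local mode is assumed; the region and amount columns are made up):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.sum

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-groupby").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val sales = Seq(("east", 10), ("west", 5), ("east", 7)).toDF("region", "amount")

// One output row per region, with the summed amount.
val totals = sales.groupBy("region").agg(sum("amount").as("total"))

// Summing an integer column yields a long column.
val byRegion = totals.collect().map(r => r.getString(0) -> r.getLong(1)).toMap
```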
24. intersect
Returns a DataFrame that contains the rows present in both DataFrames.
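The row-set operations distinct, except and intersect are easiest to compare on two tiny single-column DataFrames. This is a sketch under the assumption of Spark 2.x or later in local mode; the values helper is a hypothetical convenience for reading results back.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-sets").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val a = Seq(1, 2, 2, 3).toDF("n")
val b = Seq(2, 3, 4).toDF("n")

val distinctA = a.distinct()   // duplicates within a removed
val onlyInA = a.except(b)      // rows of a that do not appear in b
val inBoth = a.intersect(b)    // rows that appear in both a and b

// Hypothetical helper: collect the single integer column in sorted order.
def values(d: DataFrame): Seq[Int] = d.collect().map(_.getInt(0)).sorted.toSeq
```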
25. join
26. limit
Returns the first n rows of the DataFrame.
27. sort
Sorts by one or more specified columns. The arguments can be given either as column-name strings or as Column objects.
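Sorting combines naturally with limit to get a "top n" result. A minimal sketch, assuming Spark 2.x or later in local mode and invented sample data:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-sort").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

val byName = df.sort("name")                     // column given as a string, ascending
val oldestFirst = df.sort($"age".desc)           // column given as a Column, descending
val twoOldest = df.sort(desc("age")).limit(2)    // sort, then keep the first 2 rows

val twoOldestNames = twoOldest.collect().map(_.getString(0)).toSeq
```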
28. sample
Samples the rows of the DataFrame by the specified fraction. If withReplacement is true, sampling is done with replacement. Either a specified seed or a random seed is used.
29. select
Selects the specified columns from the DataFrame and returns a new DataFrame. Columns can be specified in three ways: as a variable-length list of column-name strings, as a variable-length list of Column objects, or as a variable-length list of column expression strings.
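The three ways of naming columns can be sketched as follows. Reading the third form as selectExpr is an interpretation of the description above, not something the text states outright; Spark 2.x or later in local mode and the sample data are likewise assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-select").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

val byNames = df.select("name", "age")                          // column-name strings
val byColumns = df.select($"name", ($"age" + 1).as("next_age")) // Column objects
val byExprs = df.selectExpr("name", "age + 1 AS next_age")      // SQL expression strings
```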
30. unionAll
Returns the union of the rows of the calling DataFrame and the DataFrame passed as a parameter.
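A small sketch showing that unionAll keeps duplicate rows, assuming Spark 2.x or 3.x in local mode (Spark 2.0 also added union as another name for the same operation, and unionAll may produce a deprecation warning on 2.x). The data is made up.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or 3.x, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-union").getOrCreate()
import spark.implicits._

// Hypothetical sample data; the value 2 appears in both inputs.
val left = Seq(1, 2).toDF("n")
val right = Seq(2, 3).toDF("n")

// Rows are concatenated; duplicates are not removed.
val combined = left.unionAll(right)
val combinedValues = combined.collect().map(_.getInt(0)).sorted.toSeq
```

Chaining distinct() afterwards gives set-style union semantics.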
31. withColumn and withColumnRenamed
Operations on the columns of the DataFrame: withColumn adds a column, and withColumnRenamed renames a column.
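For instance, under the assumption of Spark 2.x or later in local mode and with invented column names:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-withcolumn").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

// Add a derived column; the original DataFrame is left unchanged.
val withNext = df.withColumn("age_next_year", $"age" + 1)

// Rename an existing column.
val renamed = withNext.withColumnRenamed("name", "person")
```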
32. save
Saves the DataFrame to the specified path.
33. saveAsParquetFile
Saves the DataFrame to the specified path as a Parquet file.
34. flatMap
Applies a function to every row of the DataFrame and flattens the results.
35. foreach
36. map and mapPartitions
map maps each row of the DataFrame to an instance of type R using the specified function, and returns an RDD whose elements are of type R. mapPartitions is similar, except that the function is applied to each partition.
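The description above matches Spark 1.x, where map on a DataFrame returns an RDD; from Spark 2.0 onward map returns a Dataset instead. The sketch below goes through df.rdd explicitly so that the result is an RDD on either line of releases. Spark 2.x or later in local mode and the sample data are assumptions.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-map").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

// map: one output element per row (here, the name column as a String).
val names = df.rdd.map(row => row.getString(0))

// mapPartitions: the function sees all rows of a partition at once;
// here it emits one element per partition, the number of rows in it.
val partitionSizes = df.rdd.mapPartitions(rows => Iterator(rows.size))
```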
37. repartition
Returns a new DataFrame obtained by repartitioning the original DataFrame into the specified number of partitions (numPartitions).
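A quick way to observe the effect is to inspect the partition count of the underlying RDD. A sketch, assuming Spark 2.x or later in local mode with made-up data:

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-repartition").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

// Redistribute the rows across 4 partitions; the rows themselves are unchanged.
val repartitioned = df.repartition(4)
val partitionCount = repartitioned.rdd.partitions.length
```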
38. toJSON
Returns the contents of the DataFrame as an RDD of JSON strings.
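Each row becomes one JSON object string. In the sketch below (Spark 2.x or later in local mode and invented data are assumed) the result is collected right away, which works whether toJSON yields an RDD, as described above for Spark 1.x, or a Dataset of strings, as in Spark 2.x and later.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-tojson").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25)).toDF("name", "age")

// One JSON string per row, keyed by column name.
val jsonRows: Array[String] = df.toJSON.collect()
jsonRows.foreach(println)
```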
39. queryExecution
Returns the query execution object for the DataFrame, which contains the logical plan and the physical plan.
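The individual plans can be read off the returned object. This is a sketch assuming Spark 2.x or later in local mode; queryExecution is a developer-facing API, so the exact plan text differs between versions and is not shown here.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: Spark 2.x or later, local mode.
val spark = SparkSession.builder().master("local[2]").appName("df-queryexecution").getOrCreate()
import spark.implicits._

// Hypothetical sample data.
val df = Seq(("Ann", 30), ("Bob", 25), ("Cid", 41)).toDF("name", "age")

val qe = df.filter($"age" > 26).queryExecution

println(qe.logical)         // logical plan as written
println(qe.optimizedPlan)   // logical plan after optimization
println(qe.executedPlan)    // physical plan that will actually run
```

Calling explain(true) on the same DataFrame prints a comparable set of plans directly to the console.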
DataFrame API Application Case