DataFrame API Application Case

Source: Internet
Author: User
Tags: random seed

DataFrame API

1. collect and collectAsList

collect returns an array that contains all rows in the DataFrame.

collectAsList returns a Java list that contains all rows in the DataFrame.
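As a minimal sketch of the two calls, assuming Spark 1.x with a local SQLContext (the setup and the sample data are illustrative assumptions, not from the original article):

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("collect-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

val rows: Array[Row] = df.collect()                    // Scala array of Row objects
val rowList: java.util.List[Row] = df.collectAsList()  // java.util.List of Row objects

println(rows.length)
println(rowList.size())
sc.stop()
```

Both calls bring every row to the driver, so they are best kept for small results.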

  

  

2. count

Returns the number of rows in the DataFrame.

  

3. first

Returns the first row.

  

4. head

head without parameters returns the first row of the DataFrame. When the parameter n is specified, the first n rows are returned.

   

5. show

show with no parameters displays the first 20 rows. When n is specified, it displays the first n rows.

  

6. take

Returns the first n rows of the DataFrame.
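The row-retrieval actions count, first, head, show and take can be tried together. This is a sketch assuming Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Row, SQLContext}

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("actions-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25), ("Carol", 35))).toDF("name", "age")

val total: Long = df.count()           // number of rows
val firstRow: Row = df.first()         // first row
val headRow: Row = df.head()           // same as first()
val headTwo: Array[Row] = df.head(2)   // first 2 rows
val takeTwo: Array[Row] = df.take(2)   // first 2 rows

df.show()    // prints up to 20 rows as a table
df.show(2)   // prints the first 2 rows
sc.stop()
```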

  

7. cache

Caches the DataFrame in memory.

8. columns

Returns all column names of the DataFrame as an array.

  

9. dtypes

Returns all column names of the DataFrame and their corresponding data types as an array of (name, type) pairs.
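A short sketch of cache, columns and dtypes, assuming Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("columns-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

df.cache()  // marks the DataFrame for caching; it is materialized by the first action

val names: Array[String] = df.columns           // column names
val types: Array[(String, String)] = df.dtypes  // (column name, data type) pairs

println(names.mkString(", "))
println(types.mkString(", "))
sc.stop()
```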

  

10. explain

Intended for debugging. With no parameters, only the physical plan of the DataFrame is printed to the console. When the parameter extended is true, all plans are printed to the console: the logical plans as well as the physical plan.

  

11. isLocal

Returns true if the collect and take methods can be run locally (without any Spark executors).

  

12. printSchema

Prints the DataFrame's schema to the console in a tree format.
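The inspection helpers explain, isLocal and printSchema can be sketched as follows, assuming Spark 1.x and hypothetical sample data. The printed plans differ between Spark versions, so no output is shown:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("explain-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

df.explain()       // physical plan only
df.explain(true)   // logical plans and physical plan

val local: Boolean = df.isLocal
println(s"isLocal = $local")

df.printSchema()   // schema as a tree
sc.stop()
```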

  

13. registerTempTable

Registers the DataFrame as a temporary table with the specified name.
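Once registered, the table can be queried with SQL through the same SQLContext. A minimal sketch, assuming Spark 1.x (the table name `people` and the data are hypothetical). In Spark 2.0 and later, registerTempTable is deprecated in favor of createOrReplaceTempView.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("temptable-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

df.registerTempTable("people")  // "people" is an illustrative table name

val adults = sqlContext.sql("SELECT name FROM people WHERE age >= 30")
adults.show()
val adultCount = adults.count()
sc.stop()
```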

     

14. schema

Returns the schema of the DataFrame, as a StructType.

  

15. toDF

toDF with no parameters returns the DataFrame. When called with an array of strings, it returns a DataFrame whose columns are renamed to those names.
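A sketch of schema and toDF, assuming Spark 1.x. The new column names `person` and `years` are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.StructType

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("schema-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

val schema: StructType = df.schema
println(schema.fieldNames.mkString(", "))

// toDF with names renames all columns, so one name per existing column is required.
val renamed = df.toDF("person", "years")
val renamedColumns = renamed.columns
println(renamedColumns.mkString(", "))
sc.stop()
```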

  

16. agg

Performs aggregations over the entire DataFrame, without grouping.
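A minimal sketch of whole-DataFrame aggregation, assuming Spark 1.x and hypothetical sample data. It uses max and avg from org.apache.spark.sql.functions:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.functions.{avg, max}

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("agg-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25), ("Carol", 35))).toDF("name", "age")

// Aggregate over all rows with Column expressions.
val stats = df.agg(max(df("age")), avg(df("age"))).first()
val maxAge = stats.getInt(0)
val avgAge = stats.getDouble(1)

// The same maximum expressed as a column -> function map.
val viaMap = df.agg(Map("age" -> "max")).first().getInt(0)

println(s"max = $maxAge, avg = $avgAge")
sc.stop()
```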

  

17. apply

Returns the column of the DataFrame with the specified name, as a Column.

    

18. as

Returns a new DataFrame with an alias set.

19. distinct

Returns a new DataFrame that contains only the distinct rows of the DataFrame.
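A sketch of apply and as, assuming Spark 1.x. The alias `p` and the sample data are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.{Column, SQLContext}

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("apply-as-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

// df("age") is shorthand for df.apply("age").
val ageCol: Column = df("age")
val nextAges = df.select(ageCol + 1).collect().map(_.getInt(0)).toSet

// With an alias, columns can be referenced by a qualified name.
val p = df.as("p")
val names = p.select("p.name").collect().map(_.getString(0)).toSet

println(nextAges)
println(names)
sc.stop()
```

Aliases are mostly useful in self-joins, where both sides would otherwise expose identical column names.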

  

20. except

Returns a DataFrame that contains the rows of the current DataFrame that are not in the other DataFrame. This is equivalent to subtracting one DataFrame from the other.
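A sketch of distinct and except on two small DataFrames, assuming Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("distinct-except-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data; "Bob" appears twice in the first DataFrame.
val a = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25), ("Bob", 25))).toDF("name", "age")
val b = sqlContext.createDataFrame(Seq(("Bob", 25))).toDF("name", "age")

val distinctCount = a.distinct.count()            // Alice and Bob
val remaining = a.except(b).collect().map(_.getString(0)).toSet  // rows of a not in b

println(distinctCount)
println(remaining)
sc.stop()
```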

  

21. explode

Returns a new DataFrame in which each value of the original column is expanded into zero or more rows by the specified function.
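For illustration, the sketch below splits a string column into one row per word with DataFrame.explode. It assumes Spark 1.x (the method is deprecated in later versions), and the column names `words` and `word` are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("explode-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data: one string column.
val df = sqlContext.createDataFrame(Seq(Tuple1("hello spark"), Tuple1("data frame api"))).toDF("words")

// Each input value yields zero or more output rows in the new column "word".
val exploded = df.explode("words", "word") { s: String => s.split(" ").toSeq }

exploded.show()
val explodedCount = exploded.count()
sc.stop()
```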

    

  

22. filter

Filters the rows of the DataFrame using the SQL expression passed as the parameter.
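A minimal sketch of filter with a SQL expression string, next to the equivalent Column form. It assumes Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("filter-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25), ("Carol", 35))).toDF("name", "age")

val bySql = df.filter("age > 26").count()          // condition as a SQL expression string
val byColumn = df.filter(df("age") > 26).count()   // same condition as a Column expression

println(s"$bySql rows match")
sc.stop()
```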

  

23. groupBy

Groups the DataFrame by one or more specified columns so that aggregations can be performed on the groups.
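A sketch of grouping followed by aggregation, assuming Spark 1.x. The columns `dept`, `name` and `salary` are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("groupby-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(
  ("eng", "Alice", 100),
  ("eng", "Bob", 80),
  ("ops", "Carol", 90)
)).toDF("dept", "name", "salary")

val perDept = df.groupBy("dept").count()          // rows per department
val maxSalary = df.groupBy("dept").max("salary")  // highest salary per department

perDept.show()
maxSalary.show()
val groupCount = perDept.count()
sc.stop()
```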

   

24. intersect

Returns a DataFrame that contains only the rows that exist in both DataFrames.
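A minimal sketch of intersect, assuming Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("intersect-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data; only the "Bob" row is shared.
val a = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
val b = sqlContext.createDataFrame(Seq(("Bob", 25), ("Carol", 35))).toDF("name", "age")

val common = a.intersect(b).collect().map(_.getString(0)).toSet

println(common)
sc.stop()
```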

  

25. join

Joins the DataFrame with another DataFrame, optionally using a join expression and a join type.
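A sketch of an inner join and a left outer join on a join expression, assuming Spark 1.x. The tables and column names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("join-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data: person 3 has no orders.
val people = sqlContext.createDataFrame(Seq((1, "Alice"), (2, "Bob"), (3, "Carol"))).toDF("id", "name")
val orders = sqlContext.createDataFrame(Seq((1, "book"), (1, "pen"), (2, "lamp"))).toDF("personId", "item")

val onId = people("id") === orders("personId")

val inner = people.join(orders, onId)                    // inner join by default
val leftOuter = people.join(orders, onId, "left_outer")  // keeps people without orders

inner.show()
leftOuter.show()
val innerCount = inner.count()
val leftOuterCount = leftOuter.count()
sc.stop()
```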

      

26. limit

Returns the first n rows of the DataFrame as a new DataFrame.

  

27. orderBy and sort

Sort by one or more specified columns. The argument list can be given either as strings or as Column objects.
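Sorting and limit are often combined to get a "top n" result. A sketch assuming Spark 1.x and hypothetical sample data:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sort-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25), ("Carol", 35))).toDF("name", "age")

// Ascending sort by column name.
val youngest = df.sort("age").first().getString(0)

// Descending sort by Column, then keep the first two rows as a new DataFrame.
val oldestTwo = df.orderBy(df("age").desc).limit(2).collect().map(_.getString(0)).toSeq

println(youngest)
println(oldestTwo)
sc.stop()
```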

  

  

28. sample

Samples the rows of the DataFrame by the specified fraction. If withReplacement is true, rows are sampled with replacement. Sampling uses either the specified seed or a random seed.
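A sketch of both sample variants, assuming Spark 1.x. The fraction is a per-row probability, so the sample size is only approximately fraction times the row count. The seed value 42 is arbitrary.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("sample-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data: ids 1 to 100.
val df = sqlContext.createDataFrame((1 to 100).map(Tuple1(_))).toDF("id")

// Without replacement, fixed seed: repeatable on the same data and partitioning.
val fixedSeedCount = df.sample(false, 0.1, 42L).count()

// With replacement, random seed: a row may be picked more than once.
val randomSeedCount = df.sample(true, 0.5).count()

println(s"fixed seed: $fixedSeedCount rows, random seed: $randomSeedCount rows")
sc.stop()
```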

  

29. select

Selects the specified columns from the DataFrame and returns a DataFrame. Columns can be specified in three ways: as a variable-length list of column-name strings, as a variable-length list of Column objects, or as a variable-length list of column expressions (for example with selectExpr).
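The three ways of naming columns can be sketched as follows, assuming Spark 1.x. The alias `next_age` and the sample data are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("select-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

val byName = df.select("name", "age")                         // column-name strings
val byColumn = df.select(df("name"), df("age") + 1)           // Column objects
val byExpr = df.selectExpr("name", "age + 1 AS next_age")     // SQL expression strings

byColumn.show()
val exprColumns = byExpr.columns
val nextAges = byExpr.collect().map(_.getInt(1)).toSet
sc.stop()
```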

  

  

30. unionAll

Returns the union of the rows of the calling DataFrame and the DataFrame passed as the parameter.
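A minimal sketch of unionAll, assuming Spark 1.x and hypothetical sample data. Duplicate rows are kept, as with SQL UNION ALL:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("unionall-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data; the "Bob" row is in both DataFrames.
val a = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")
val b = sqlContext.createDataFrame(Seq(("Bob", 25), ("Carol", 35))).toDF("name", "age")

val unionCount = a.unionAll(b).count()   // duplicates are not removed

println(unionCount)
sc.stop()
```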

    

31. withColumn and withColumnRenamed

These methods operate on the columns of the DataFrame: withColumn adds a column, and withColumnRenamed renames a column.
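A sketch of both column operations, assuming Spark 1.x. The new column names are hypothetical:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("withcolumn-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

// Add a column derived from an existing one.
val withNext = df.withColumn("age_next_year", df("age") + 1)

// Rename an existing column; the original DataFrame is left unchanged.
val renamed = df.withColumnRenamed("age", "years")

withNext.show()
val addedColumns = withNext.columns
val renamedColumns = renamed.columns
sc.stop()
```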

  

32. save

Saves the DataFrame to the specified path.

  

  

33. saveAsParquetFile

Saves the DataFrame to the specified path in Parquet format.
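A sketch of save and saveAsParquetFile, assuming Spark 1.x only: both methods, and SQLContext.parquetFile used here to read the data back, were removed in Spark 2.0. The temporary directory and the sample data are illustrative assumptions:

```scala
import java.nio.file.Files

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("save-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

// Write below a fresh temporary directory so the target paths do not exist yet.
val base = Files.createTempDirectory("df-save-demo").toString

df.save(base + "/default")               // default data source (Parquet unless spark.sql.sources.default is changed)
df.saveAsParquetFile(base + "/parquet")  // always Parquet

// Read the Parquet output back to check the round trip.
val reloadedCount = sqlContext.parquetFile(base + "/parquet").count()
println(reloadedCount)
sc.stop()
```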

  

  

34. flatMap

Applies a function to every row of the DataFrame and then flattens the results.

  

35. foreach

Applies a function to each row of the DataFrame.

    

36. map and mapPartitions

map converts each Row of the DataFrame to an instance of type R using the specified function and returns an RDD whose elements are of type R. mapPartitions is similar, but the function is applied to each partition as a whole.
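The row-level functions flatMap, foreach, map and mapPartitions can be sketched together. This assumes Spark 1.x, where these methods on DataFrame return RDDs. In Spark 2.0 and later they return Datasets and need encoders, so the explicit RDD types below would not compile there. The sample data is hypothetical.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("map-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("hello spark", 1), ("data frame api", 2))).toDF("text", "id")

// map: one output element per row.
val lengths: RDD[Int] = df.map(row => row.getString(0).length)

// flatMap: zero or more output elements per row, flattened into one RDD.
val words: RDD[String] = df.flatMap(row => row.getString(0).split(" ").toSeq)

// mapPartitions: the function sees all rows of one partition as an iterator.
val partitionSizes: RDD[Int] = df.mapPartitions(rows => Iterator(rows.size))

// foreach: runs for its side effect on the executors; nothing is returned.
df.foreach(row => println(row))

val wordCount = words.count()
val totalLength = lengths.collect().sum
val totalRows = partitionSizes.collect().sum
sc.stop()
```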

  

37. repartition

Returns a new DataFrame that repartitions the original DataFrame into the specified number of partitions (numPartitions).
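A minimal sketch of repartition, assuming Spark 1.x and hypothetical sample data. The partition count is read from the underlying RDD:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("repartition-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data: ids 1 to 100.
val df = sqlContext.createDataFrame((1 to 100).map(Tuple1(_))).toDF("id")

val repartitioned = df.repartition(4)
val numPartitions = repartitioned.rdd.partitions.length
val rowCount = repartitioned.count()   // repartitioning does not change the rows

println(s"$numPartitions partitions, $rowCount rows")
sc.stop()
```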

  

38. toJSON

Returns the contents of the DataFrame as an RDD of JSON strings.
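A sketch of toJSON, assuming Spark 1.x and hypothetical sample data. Each row becomes one JSON object string:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("tojson-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

// Collect the JSON strings to the driver and print them.
val json: Array[String] = df.toJSON.collect()
json.foreach(println)
sc.stop()
```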

  

39. queryExecution

Returns the query execution for the DataFrame, which contains the logical plans and the physical plan.
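A sketch of inspecting the individual plans through queryExecution, assuming Spark 1.x and hypothetical sample data. The plan text differs between Spark versions, so no output is shown:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Local setup for a standalone run (assumption); in a Spark 1.x spark-shell,
// use the predefined sc and sqlContext instead.
val sc = new SparkContext(new SparkConf().setMaster("local[2]").setAppName("queryexecution-demo"))
val sqlContext = new SQLContext(sc)

// Hypothetical sample data.
val df = sqlContext.createDataFrame(Seq(("Alice", 30), ("Bob", 25))).toDF("name", "age")

val qe = df.filter("age > 26").queryExecution

println(qe.logical)        // logical plan as written
println(qe.analyzed)       // logical plan after analysis
println(qe.optimizedPlan)  // logical plan after optimization
println(qe.executedPlan)   // physical plan that will be run

val planText = qe.toString // all plans in one string
sc.stop()
```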

  
