The "Spark" Sparksession API

SparkSession is one of the most important classes in Spark SQL: it is the entry point to most of its functionality, so it naturally exposes a large number of methods. The sections below describe what these methods do.

builder function
public static SparkSession.Builder builder()
Creates a SparkSession.Builder, which is used to initialize a SparkSession.
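
As a minimal Scala sketch, a session can be created through the builder like this (the application name and master URL are illustrative values; the resulting spark variable is reused in the examples that follow):

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("example-app")   // illustrative application name
      .master("local[*]")       // illustrative master URL for local testing
      .getOrCreate()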

setActiveSession function
public static void setActiveSession(SparkSession session)
Changes the SparkSession that will be returned in this thread and its child threads when SparkSession.getOrCreate() is called. This can be used to ensure that a given thread receives a SparkSession with an isolated session, rather than the global (first created) context.

clearActiveSession function
public static void clearActiveSession()
Clears the active SparkSession for the current thread. Subsequent calls to getOrCreate() will then return the first created context instead of a thread-local override.

setDefaultSession function
public static void setDefaultSession(SparkSession session)
Sets the default SparkSession that is returned by the builder.

clearDefaultSession function
public static void clearDefaultSession()
Clears the default SparkSession that is returned by the builder.

getActiveSession function
public static scala.Option<SparkSession> getActiveSession()
Returns the active SparkSession for the current thread, as returned by the builder.

getDefaultSession function
public static scala.Option<SparkSession> getDefaultSession()
Returns the default SparkSession that is returned by the builder.
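
A small sketch of how these session-management methods fit together (assuming spark was created as in the builder example above):

    // Make this session the active one for the current thread
    SparkSession.setActiveSession(spark)

    // Both getters return a scala.Option
    val active  = SparkSession.getActiveSession   // Some(spark)
    val default = SparkSession.getDefaultSession

    // Clear the thread-local override; getOrCreate() then falls back to the default session
    SparkSession.clearActiveSession()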

sparkContext function
public SparkContext sparkContext()
Returns the underlying SparkContext associated with this session.

version function
public String version()
Returns the version of Spark on which this application is running.

sharedState function
public org.apache.spark.sql.internal.SharedState sharedState()
State shared across sessions, including the SparkContext, cached data, listeners, and a catalog.
This is internal to Spark, and interface stability is not guaranteed.

sessionState function
public org.apache.spark.sql.internal.SessionState sessionState()
State isolated across sessions, including SQL configurations, temporary tables, registered functions, and everything else that accepts an SQLConf.
This is internal to Spark, and interface stability is not guaranteed.


sqlContext function
public SQLContext sqlContext()
A wrapped version of this session exposed as a SQLContext, for backward compatibility.

conf function
public RuntimeConfig conf()
Runtime configuration interface for Spark.
This interface allows the user to set and get all Spark and Hadoop configurations that are relevant to Spark SQL. When fetching a config value, it defaults to the value set in the underlying SparkContext, if any.
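
For example, a short sketch of setting and reading back a configuration at runtime (the key and value here are illustrative):

    // Change the number of shuffle partitions for this session
    spark.conf.set("spark.sql.shuffle.partitions", "64")
    val partitions = spark.conf.get("spark.sql.shuffle.partitions")   // "64"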

listenerManager function
public ExecutionListenerManager listenerManager()
An interface for registering custom QueryExecutionListeners that listen for execution metrics.

experimental function
public ExperimentalMethods experimental()
A collection of methods that are considered experimental and can be used to hook into the query planner for advanced functionality.

udf function
public UDFRegistration udf()
A collection of methods for registering user-defined functions (UDFs).
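
A small sketch of registering a UDF and calling it from SQL (the function name plusOne is illustrative):

    // Register a UDF on this session, then use it in a SQL query
    spark.udf.register("plusOne", (x: Int) => x + 1)
    spark.sql("SELECT plusOne(10)").show()   // prints 11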

streams function
public StreamingQueryManager streams()
Returns a StreamingQueryManager that allows managing all of the StreamingQuerys active on this session.
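
For example, a quick sketch of inspecting the queries currently running on the session:

    // List the streaming queries currently active on this session
    val activeQueries = spark.streams.active
    activeQueries.foreach(q => println(q.name))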

newSession function
public SparkSession newSession()
Starts a new session with isolated SQL configurations, temporary tables, and registered functions, but sharing the underlying SparkContext and cached data.
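
A quick sketch showing that sharing behaviour (the view name is illustrative):

    val spark2 = spark.newSession()

    // Same SparkContext underneath...
    assert(spark2.sparkContext eq spark.sparkContext)

    // ...but a temporary view registered in one session is not visible in the other
    spark.range(5).createOrReplaceTempView("numbers")
    // spark2.table("numbers") would fail here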

emptyDataFrame function
public Dataset<Row> emptyDataFrame()
Returns a DataFrame with no rows or columns.

emptyDataset function
public <T> Dataset<T> emptyDataset(Encoder<T> evidence$1)
Creates a new, empty Dataset of type T.
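
For example:

    import spark.implicits._

    val emptyDf = spark.emptyDataFrame        // 0 rows, 0 columns
    val emptyDs = spark.emptyDataset[String]  // empty Dataset[String]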

createDataFrame function
public <A extends scala.Product> Dataset<Row> createDataFrame(RDD<A> rdd, scala.reflect.api.TypeTags.TypeTag<A> evidence$2)
Creates a DataFrame from an RDD of Product (for example, case classes or tuples).

public Dataset<Row> createDataFrame(RDD<Row> rowRDD, StructType schema)
Creates a DataFrame from an RDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema, otherwise there will be an exception at runtime.

public Dataset<Row> createDataFrame(JavaRDD<Row> rowRDD, StructType schema)
Creates a DataFrame from a JavaRDD containing Rows using the given schema. It is important to make sure that the structure of every Row of the provided RDD matches the provided schema, otherwise there will be an exception at runtime.

public Dataset<Row> createDataFrame(java.util.List<Row> rows, StructType schema)
Creates a DataFrame from a java.util.List containing Rows using the given schema.

public Dataset<Row> createDataFrame(RDD<?> rdd, Class<?> beanClass)
Applies a schema to an RDD of Java Beans.
Warning: Because the fields in a Java Bean have no guaranteed order, a SELECT * query will return the columns in an undefined order.

public Dataset<Row> createDataFrame(JavaRDD<?> rdd, Class<?> beanClass)
Applies a schema to a JavaRDD of Java Beans.
Warning: Because the fields in a Java Bean have no guaranteed order, a SELECT * query will return the columns in an undefined order.

public Dataset<Row> createDataFrame(java.util.List<?> data, Class<?> beanClass)
Applies a schema to a java.util.List of Java Beans.
Warning: Because the fields in a Java Bean have no guaranteed order, a SELECT * query will return the columns in an undefined order.
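
A minimal sketch of the RDD-of-Rows-plus-schema variant (the column names and data are illustrative):

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val schema = StructType(Seq(
      StructField("name", StringType, nullable = false),
      StructField("age",  IntegerType, nullable = false)))

    // Every Row must match the schema, otherwise an exception is thrown at runtime
    val rowRdd = spark.sparkContext.parallelize(Seq(Row("Alice", 30), Row("Bob", 25)))
    val people = spark.createDataFrame(rowRdd, schema)
    people.show()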

baseRelationToDataFrame function
public Dataset<Row> baseRelationToDataFrame(BaseRelation baseRelation)
Converts a BaseRelation, created for external data sources, into a DataFrame.


createDataset function
public <T> Dataset<T> createDataset(scala.collection.Seq<T> data, Encoder<T> evidence$4)
Creates a Dataset from a local Seq of data of a given type. This method requires an Encoder (to convert a JVM object of type T to and from the internal Spark SQL representation), which is usually created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.

public <T> Dataset<T> createDataset(RDD<T> data, Encoder<T> evidence$5)
Creates a Dataset from an RDD of a given type. The required Encoder is usually created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.

public <T> Dataset<T> createDataset(java.util.List<T> data, Encoder<T> evidence$6)
Creates a Dataset from a java.util.List of a given type. The required Encoder is usually created automatically through implicits from a SparkSession, or can be created explicitly by calling static methods on Encoders.
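
For example, a short sketch showing both ways of supplying the Encoder:

    import org.apache.spark.sql.Encoders
    import spark.implicits._

    // Encoder[String] is supplied implicitly by spark.implicits
    val ds1 = spark.createDataset(Seq("a", "b", "c"))

    // ...or an Encoder can be passed explicitly via Encoders
    val ds2 = spark.createDataset(Seq(1, 2, 3))(Encoders.scalaInt)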


range function
public Dataset<Long> range(long end)
Creates a Dataset with a single LongType column named id, containing elements in a range from 0 to end (exclusive) with a step value of 1.

public Dataset<Long> range(long start, long end)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with a step value of 1.

public Dataset<Long> range(long start, long end, long step)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with the given step value.

public Dataset<Long> range(long start, long end, long step, int numPartitions)
Creates a Dataset with a single LongType column named id, containing elements in a range from start to end (exclusive) with the given step value and the specified number of partitions.
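
For example:

    spark.range(5).show()          // id column: 0, 1, 2, 3, 4
    spark.range(10, 20, 2).show()  // id column: 10, 12, 14, 16, 18
    val parts = spark.range(0, 100, 1, 4).rdd.getNumPartitions  // 4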


catalog function
public Catalog catalog()
An interface through which the user may create, drop, alter, or query underlying databases, tables, functions, and so on.
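
For example, a quick sketch of inspecting the session's catalog:

    // Show the databases and tables known to this session
    spark.catalog.listDatabases().show()
    spark.catalog.listTables().show()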

table function
public Dataset<Row> table(String tableName)
Returns the specified table or view as a DataFrame.
tableName is either a qualified or unqualified name that designates a table or view. If a database is specified, the table or view is resolved from that database. Otherwise Spark first attempts to find a temporary view with the given name and then matches a table or view from the current database. The global temporary view database is also valid here.
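
A small sketch (the view name people_ids is illustrative):

    // Register a temporary view, then read it back as a DataFrame
    spark.range(3).createOrReplaceTempView("people_ids")
    val ids = spark.table("people_ids")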


sql function
public Dataset<Row> sql(String sqlText)
Executes a SQL query using Spark, returning the result as a DataFrame. The dialect used for SQL parsing can be configured with spark.sql.dialect.
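
For example (the view and column names are illustrative):

    import spark.implicits._

    Seq(("Alice", 30), ("Bob", 15)).toDF("name", "age").createOrReplaceTempView("people")
    val adults = spark.sql("SELECT name FROM people WHERE age >= 18")
    adults.show()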

read function
public DataFrameReader read()
Returns a DataFrameReader that can be used to read non-streaming data in as a DataFrame.
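
For example (the file paths are illustrative):

    val jsonDf = spark.read.json("data/people.json")
    val csvDf  = spark.read.option("header", "true").csv("data/people.csv")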


readStream function
public DataStreamReader readStream()
Returns a DataStreamReader that can be used to read streaming data in as a DataFrame.
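
For example, a sketch of a streaming DataFrame over a socket source (host and port are illustrative):

    val lines = spark.readStream
      .format("socket")
      .option("host", "localhost")
      .option("port", 9999)
      .load()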


time function
public <T> T time(scala.Function0<T> f)
Executes a block of code and prints to standard output the time it took to execute that block. This is available only in Scala and is intended primarily for interactive testing and debugging, though it is handy anywhere a quick timing measurement is useful.
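
For example:

    // Prints the wall-clock time taken by the block and returns its result
    val count = spark.time {
      spark.range(0, 1000000).count()
    }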

implicits function
public SparkSession.implicits$ implicits()
A nested Scala object providing implicit conversions (Scala-specific) for turning common Scala objects into Datasets and DataFrames; it is accessed via import spark.implicits._.
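
For example:

    import spark.implicits._

    // toDS/toDF conversions and the $"column" syntax become available
    val ds = Seq(1, 2, 3).toDS()
    val df = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
    df.select($"name").show()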

stop function
public void stop()
Stops the underlying SparkContext.

close function
public void close()
A synonym for stop().

The "Spark" Sparksession API

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.