Structured Streaming provides APIs for managing streaming queries. These APIs let users manually manage queries that have already been started, so that the streaming queries running in the system execute in an orderly manner.
1. StreamingQuery
A StreamingQuery object is returned when the start() method of DataStreamWriter is called to start a streaming query. The user can then manage that query through this object.
As shown below:
val query = df.writeStream.format("console").start()  // get the query object
query.id  // get the unique identifier of the running query, which persists across restarts from checkpoint data
query.runId  // get the unique ID of this run of the query, which is generated at every start/restart
query.name  // get the auto-generated or user-specified name of the query
query.explain()  // print detailed explanations of the query
query.stop()  // stop the query
query.awaitTermination()  // block until the query is terminated, by stop() or by an error
query.exception  // the exception, if the query has been terminated with an error
query.recentProgress  // an array of the most recent progress updates for this query
query.lastProgress  // the most recent progress update of this streaming query
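To make the lifecycle above concrete, here is a minimal sketch that starts a query, inspects it, and stops it. It is an illustration under assumptions rather than the original author's code: the local master, the built-in "rate" test source, the query name rate_to_console, and the helper runFor are all choices made for this example.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

object QueryLifecycleSketch {
  // Hypothetical helper: runs a console query for at most `millis` ms, then stops it.
  def runFor(spark: SparkSession, millis: Long): StreamingQuery = {
    // The built-in "rate" source generates rows on its own, so no external system is needed.
    val df = spark.readStream
      .format("rate")
      .option("rowsPerSecond", "5")
      .load()

    val query = df.writeStream
      .format("console")
      .queryName("rate_to_console") // assumed name, chosen for this sketch
      .start()

    println(s"id=${query.id} runId=${query.runId} name=${query.name}")

    // Bounded wait: returns after `millis` ms even if the query is still running.
    query.awaitTermination(millis)

    // lastProgress is null until the first progress update has been reported.
    Option(query.lastProgress).foreach(p => println(p.prettyJson))

    query.stop()
    // exception is an Option; it is only defined if the query failed with an error.
    query.exception.foreach(e => println(s"query failed: ${e.getMessage}"))
    query
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("query-lifecycle-sketch")
      .getOrCreate()
    runFor(spark, 10000L)
    spark.stop()
  }
}
```

The bounded awaitTermination(millis) overload is used here instead of the blocking awaitTermination() so that the example ends on its own; a long-running job would typically use the blocking form.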
2. StreamingQueryManager
Structured Streaming provides another interface for managing streaming queries: StreamingQueryManager. The user can obtain it through the streams method of the SparkSession object.
As shown below:
val spark: SparkSession = ...
val streamManager = spark.streams
streamManager.active  // get the list of currently active streaming queries
streamManager.get(id)  // get a query object by its unique ID
streamManager.awaitAnyTermination()  // block until any one of them terminates
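As a rough sketch of how the manager might be used with more than one query, the example below starts two queries, lists them, looks one up by ID, and stops them all. The "rate" source, the in-memory sink, the query names rate_a and rate_b, and the helper startRateQuery are assumptions made for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.streaming.StreamingQuery

object QueryManagerSketch {
  // Hypothetical helper: starts a small rate-source query that writes to the in-memory sink.
  // The memory sink requires a query name, which is also the name of the resulting temp table.
  def startRateQuery(spark: SparkSession, name: String): StreamingQuery =
    spark.readStream
      .format("rate")
      .option("rowsPerSecond", "1")
      .load()
      .writeStream
      .format("memory")
      .queryName(name)
      .start()

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("query-manager-sketch")
      .getOrCreate()

    val q1 = startRateQuery(spark, "rate_a")
    startRateQuery(spark, "rate_b")

    val manager = spark.streams

    // List every query that is currently active in this session.
    manager.active.foreach(q => println(s"${q.name} -> ${q.id}"))

    // Look a query up by its ID; get returns null if no active query has that ID.
    val found = manager.get(q1.id)
    println(s"found: ${Option(found).map(_.name)}")

    // Bounded wait: returns after 2 seconds if no query has terminated by then.
    manager.awaitAnyTermination(2000L)

    // Stop whatever is still running before shutting the session down.
    manager.active.foreach(_.stop())
    spark.stop()
  }
}
```

Stopping each active query through the manager, rather than holding on to every StreamingQuery reference, is one convenient way to shut down cleanly when queries are started from different parts of an application.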
Spark Structured Streaming Framework (5): Process Management