SparkContext and RDD

SparkContext.scala implements the SparkContext class and its companion object. SparkContext is the entry point to Spark: it connects to a Spark cluster and is used to create RDDs, accumulators, and broadcast variables.

In the Spark framework the class is loaded only once per JVM. During class loading, the properties, code blocks, and functions defined in the SparkContext class are initialized.

(1) class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient. The primary constructor of SparkContext takes a single parameter of type SparkConf. SparkContext extends Logging and mixes in the ExecutorAllocationClient trait; multiple traits are combined with the with keyword. A trait has no class parameters, and methods invoked through a trait reference are dynamically bound.
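As a minimal sketch of trait mixing with the with keyword and dynamic binding (the Logging and Closeable traits below are simplified stand-ins invented for illustration, not the real Spark Logging or ExecutorAllocationClient):

trait Logging {
  def log(msg: String): Unit = println(s"[log] $msg")
}

trait Closeable {
  def close(): Unit
}

// Multiple traits are mixed in with "with"; the close() call is
// dynamically bound to the implementation supplied by the class.
class Service(name: String) extends Logging with Closeable {
  def close(): Unit = log(s"$name closed")
}

new Service("demo").close()   // prints: [log] demo closed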

(2) private val creationSite: CallSite = Utils.getCallSite()

val startTime = System.currentTimeMillis()

1. Non-private fields: a field declared with val gets only a public getter; a field declared with var gets both a public getter and a public setter (the generated accessors are named creationSite and creationSite_=).

2. Private fields: when a val or var is declared private, the generated getter and setter methods become private as well.
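A small sketch of the accessors the Scala compiler generates; the Conf class and its fields are hypothetical and exist only to illustrate the rule:

class Conf {
  val appName: String = "demo"        // public getter appName, no setter
  var master: String = "local[*]"     // public getter master and setter master_=
  private var secret: String = "xyz"  // accessors generated, but private
}

val c = new Conf
println(c.appName)       // calls the generated getter
c.master = "local[2]"    // calls the generated setter master_=
// c.secret              // would not compile: the accessor is private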

(3) private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)

private[class_name] restricts access to the named scope: class_name must be the enclosing class, an outer class of the current class, or an enclosing package (here the spark package); getter and setter methods are still generated. private[this] is object-private: the member is visible only within the same instance.
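A sketch of the two access-modifier variants, assuming the code sits in a file inside a package named spark (the Registry class is invented for illustration):

package spark

import java.util.concurrent.atomic.AtomicBoolean

class Registry {
  // visible to all code inside the spark package
  private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)

  // object-private: visible only inside this particular instance;
  // another Registry's counter cannot be read, even from this class
  private[this] var counter: Int = 0

  def tick(): Int = { counter += 1; counter }
}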

(4) private def assertNotStopped(): Unit — the return type is Unit, so the method is a procedure; it is private to the class.

(5) def this() = this(new SparkConf()) — an auxiliary constructor of the SparkContext class that delegates to the primary constructor with a default SparkConf.

def this(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]) — an auxiliary constructor must first call another constructor of the class, here this(config).

(6) private[spark] def this(master: String, appName: String) — a constructor of SparkContext that is visible only within the spark package.
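A minimal sketch of a primary constructor with chained auxiliary constructors; the Config and Engine names are invented for illustration:

class Config(val settings: Map[String, String])

class Engine(config: Config) {                  // primary constructor
  def this() = this(new Config(Map.empty))      // auxiliary: supplies a default Config

  def this(master: String, appName: String) =   // auxiliary: must call another
    this(new Config(Map("master" -> master, "app" -> appName)))
}

val e1 = new Engine()
val e2 = new Engine("local[*]", "demo")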

(7) @volatile private var _dagScheduler: DAGScheduler = _

private var _applicationId: String = _

The @volatile annotation tells the compiler that the variable will be accessed by multiple threads. The underscore initializer (= _) gives the field its type's default value (null for reference types, 0 for numeric types) when the instance is constructed.
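A sketch of @volatile together with default-value initialization via the underscore; the Holder class is hypothetical:

class Holder {
  // read and written by several threads; @volatile guarantees visibility
  @volatile private var _status: String = _   // default value: null
  private var _count: Int = _                 // default value: 0

  def update(s: String): Unit = { _status = s; _count += 1 }
  def status: String = _status
  def count: Int = _count
}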

(8) Inside the try {} catch {} block, various conditional statements initialize the remaining properties and use the master URL to create components such as the TaskScheduler.

(9) private[spark] def withScope[U](body: => U): U = RDDOperationScope.withScope[U](this)(body)

Here U is a type parameter (a user-defined class or a Scala built-in type) and body is a by-name parameter that refers to a block of code; this function is used in many places in the SparkContext class.
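A sketch of the by-name-parameter pattern that withScope relies on; the timed helper is made up to show how a wrapper can run code around an arbitrary block:

// body: => U is only evaluated when it is referenced, so the wrapper can
// run setup and teardown code around the wrapped block.
def timed[U](label: String)(body: => U): U = {
  val start = System.nanoTime()
  try body
  finally println(s"$label took ${(System.nanoTime() - start) / 1e6} ms")
}

val result = timed("sum") { (1 to 1000000).sum }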

(10) def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration = hadoopConfiguration): RDD[(K, V)]

Function declaration description: call it as newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs://ip:port/path/to/file")

path: the file to read; conf: the Hadoop configuration; fClass: the InputFormat of the input data; kClass: the key type of the input format; vClass: the value type of the input format.
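A hedged usage sketch, assuming an already created SparkContext named sc and the placeholder HDFS path from above:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Read the file as (byte offset, line) pairs using the new Hadoop API.
val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs://ip:port/path/to/file")

lines.map { case (_, text) => text.toString }.take(5).foreach(println)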

(11) def sequenceFile[K, V]
    (path: String, minPartitions: Int = defaultMinPartitions)
    (implicit km: ClassTag[K], vm: ClassTag[V],
     kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)]
The function has default parameter values and implicit parameters, and it is curried.
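A small sketch of a curried function with a default parameter and an implicit parameter list, in the same shape; all names are invented:

case class Format(name: String)

// First parameter list has a default value; the second list is implicit.
def load(path: String, minPartitions: Int = 2)(implicit fmt: Format): String =
  s"loading $path as ${fmt.name} with $minPartitions partitions"

implicit val defaultFormat: Format = Format("sequence")

load("/tmp/data")        // uses the default minPartitions and the implicit Format
load("/tmp/data", 8)     // overrides minPartitions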

(12) createTaskScheduler: creates the task scheduler from the master URL.

(13) def stop() shuts down the SparkContext. object SparkMasterRegex is used for pattern matching on the master URL. class WritableFactory and object WritableFactory contain implicit factory operations, for example implicit def longWritableFactory: WritableFactory[Long].
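A sketch of regex-based pattern matching in the spirit of SparkMasterRegex, with simplified patterns invented for illustration:

object MasterRegex {
  val LOCAL_N   = """local\[([0-9]+|\*)\]""".r
  val SPARK_URL = """spark://(.+)""".r
}

def describe(master: String): String = master match {
  case "local"                    => "single local thread"
  case MasterRegex.LOCAL_N(n)     => s"local mode with $n threads"
  case MasterRegex.SPARK_URL(url) => s"standalone cluster at $url"
  case other                      => s"unknown master: $other"
}

println(describe("local[4]"))            // local mode with 4 threads
println(describe("spark://host:7077"))   // standalone cluster at host:7077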

RDD is an abstract class: abstract class RDD extends Serializable with Logging
(1) members marked final cannot be overridden in subclasses
(2) a subclass of the abstract class must use the override keyword when it redefines a concrete method of the parent class
The RDD abstract class is extended by concrete RDD classes such as HadoopRDD, which override the parent's methods so that operations such as sorting, map, and reduce work on their own kind of RDD. A minimal sketch of this pattern follows.
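BaseRDD and SimpleRDD below are invented names used only to illustrate the abstract-class pattern, not the real Spark hierarchy:

abstract class BaseRDD[T] extends Serializable {
  // final: subclasses cannot override this member
  final def id: Int = 42

  // abstract: each concrete subclass supplies its own implementation
  def compute(): Seq[T]

  // concrete: redefining it in a subclass requires the override keyword
  def name: String = "base"
}

class SimpleRDD(data: Seq[Int]) extends BaseRDD[Int] {
  def compute(): Seq[Int] = data.map(_ * 2)
  override def name: String = "simple"
}

println(new SimpleRDD(Seq(1, 2, 3)).compute())   // List(2, 4, 6)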

A Map is immutable by default; it cannot grow or shrink.

val person = Map("Spark" -> 6, "Hadoop" -> 12)

A map defined this way does not allow adding or removing elements.

val person = scala.collection.mutable.Map("Spark" -> 6, "Hadoop" -> 12)

This one can add elements, for example:

person += ("File" -> 5)

and can also remove elements, for example:

person -= "File"
