SparkContext and RDD

SparkContext.scala implements the SparkContext class and its companion object. SparkContext is the entry point to Spark: it connects to a Spark cluster and is used to create RDDs, accumulators, and broadcast variables.

In the Spark framework the class is loaded only once per JVM. During class loading, the properties, code blocks, and functions defined in the SparkContext class are initialized.

(1) class SparkContext(config: SparkConf) extends Logging with ExecutorAllocationClient. The primary constructor of SparkContext takes a single parameter of type SparkConf. SparkContext extends Logging and mixes in the ExecutorAllocationClient trait; multiple traits are combined with the with keyword. A trait has no class parameters, and methods invoked through a trait reference are dynamically bound.
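As a minimal sketch of trait mixing with the with keyword and dynamic binding (the Logging and Closeable traits below are simplified stand-ins invented for illustration, not the real Spark Logging or ExecutorAllocationClient):

trait Logging {
  def log(msg: String): Unit = println(s"[log] $msg")
}

trait Closeable {
  def close(): Unit
}

// Multiple traits are mixed in with "with"; the close() call is
// dynamically bound to the implementation supplied by the class.
class Service(name: String) extends Logging with Closeable {
  def close(): Unit = log(s"$name closed")
}

new Service("demo").close()   // prints: [log] demo closed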

(2) private val creationSite: CallSite = Utils.getCallSite()

val startTime = System.currentTimeMillis()

1. Non-private fields: a field declared with val gets only a public getter; a field declared with var gets both a public getter and a public setter (the generated accessors are named creationSite and creationSite_=).

2. Private fields: when a val or var is declared private, the generated getter and setter methods become private as well.
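A small sketch of the accessors the Scala compiler generates; the Conf class and its fields are hypothetical and exist only to illustrate the rule:

class Conf {
  val appName: String = "demo"        // public getter appName, no setter
  var master: String = "local[*]"     // public getter master and setter master_=
  private var secret: String = "xyz"  // accessors generated, but private
}

val c = new Conf
println(c.appName)       // calls the generated getter
c.master = "local[2]"    // calls the generated setter master_=
// c.secret              // would not compile: the accessor is private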

(3) private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)

private[class_name] restricts access to the named scope: class_name must be the enclosing class, an outer class of the current class, or an enclosing package (here the spark package); getter and setter methods are still generated. private[this] is object-private: the member is visible only within the same instance.
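A sketch of the two access-modifier variants, assuming the code sits in a file inside a package named spark (the Registry class is invented for illustration):

package spark

import java.util.concurrent.atomic.AtomicBoolean

class Registry {
  // visible to all code inside the spark package
  private[spark] val stopped: AtomicBoolean = new AtomicBoolean(false)

  // object-private: visible only inside this particular instance;
  // another Registry's counter cannot be read, even from this class
  private[this] var counter: Int = 0

  def tick(): Int = { counter += 1; counter }
}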

(4) private def assertNotStopped(): Unit — the return type is Unit, so the method is a procedure; it is private to the class.

(5) def this() = this(new SparkConf()) — an auxiliary constructor of the SparkContext class that delegates to the primary constructor with a default SparkConf.

def this(config: SparkConf, preferredNodeLocationData: Map[String, Set[SplitInfo]]) — an auxiliary constructor must first call another constructor of the class, here this(config).

(6) private[spark] def this(master: String, appName: String) — a constructor of SparkContext that is visible only within the spark package.
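A minimal sketch of a primary constructor with chained auxiliary constructors; the Config and Engine names are invented for illustration:

class Config(val settings: Map[String, String])

class Engine(config: Config) {                  // primary constructor
  def this() = this(new Config(Map.empty))      // auxiliary: supplies a default Config

  def this(master: String, appName: String) =   // auxiliary: must call another
    this(new Config(Map("master" -> master, "app" -> appName)))
}

val e1 = new Engine()
val e2 = new Engine("local[*]", "demo")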

(7) @volatile private var _dagScheduler: DAGScheduler = _

private var _applicationId: String = _

The @volatile annotation tells the compiler that the variable will be accessed by multiple threads. The underscore initializer (= _) gives the field its type's default value (null for reference types, 0 for numeric types) when the instance is constructed.
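A sketch of @volatile together with default-value initialization via the underscore; the Holder class is hypothetical:

class Holder {
  // read and written by several threads; @volatile guarantees visibility
  @volatile private var _status: String = _   // default value: null
  private var _count: Int = _                 // default value: 0

  def update(s: String): Unit = { _status = s; _count += 1 }
  def status: String = _status
  def count: Int = _count
}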

(8) Inside the try {} catch {} block, various conditional statements initialize the remaining properties and use the master URL to create components such as the TaskScheduler.

(9) private[spark] def withScope[U](body: => U): U = RDDOperationScope.withScope[U](this)(body)

Here U is a type parameter (a user-defined class or a Scala built-in type) and body is a by-name parameter that refers to a block of code; this function is used in many places in the SparkContext class.
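A sketch of the by-name-parameter pattern that withScope relies on; the timed helper is made up to show how a wrapper can run code around an arbitrary block:

// body: => U is only evaluated when it is referenced, so the wrapper can
// run setup and teardown code around the wrapped block.
def timed[U](label: String)(body: => U): U = {
  val start = System.nanoTime()
  try body
  finally println(s"$label took ${(System.nanoTime() - start) / 1e6} ms")
}

val result = timed("sum") { (1 to 1000000).sum }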

(10) def newAPIHadoopFile[K, V, F <: NewInputFormat[K, V]](
    path: String,
    fClass: Class[F],
    kClass: Class[K],
    vClass: Class[V],
    conf: Configuration = hadoopConfiguration): RDD[(K, V)]

Function declaration description: call it as newAPIHadoopFile[LongWritable, Text, TextInputFormat]("hdfs://ip:port/path/to/file")

path: the file to read; conf: the Hadoop configuration; fClass: the InputFormat of the input data; kClass: the key type of the input format; vClass: the value type of the input format.
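A hedged usage sketch, assuming an already created SparkContext named sc and the placeholder HDFS path from above:

import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat

// Read the file as (byte offset, line) pairs using the new Hadoop API.
val lines = sc.newAPIHadoopFile[LongWritable, Text, TextInputFormat](
  "hdfs://ip:port/path/to/file")

lines.map { case (_, text) => text.toString }.take(5).foreach(println)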

(11) def sequenceFile[K, V]
    (path: String, minPartitions: Int = defaultMinPartitions)
    (implicit km: ClassTag[K], vm: ClassTag[V],
     kcf: () => WritableConverter[K], vcf: () => WritableConverter[V]): RDD[(K, V)]
The function has default parameter values and implicit parameters, and it is curried.
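A small sketch of a curried function with a default parameter and an implicit parameter list, in the same shape; all names are invented:

case class Format(name: String)

// First parameter list has a default value; the second list is implicit.
def load(path: String, minPartitions: Int = 2)(implicit fmt: Format): String =
  s"loading $path as ${fmt.name} with $minPartitions partitions"

implicit val defaultFormat: Format = Format("sequence")

load("/tmp/data")        // uses the default minPartitions and the implicit Format
load("/tmp/data", 8)     // overrides minPartitions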

(12) createTaskScheduler: creates the task scheduler from the master URL.

(13) def stop() shuts down the SparkContext. object SparkMasterRegex is used for pattern matching on the master URL. class WritableFactory and object WritableFactory contain implicit factory operations, for example implicit def longWritableFactory: WritableFactory[Long].
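A sketch of regex-based pattern matching in the spirit of SparkMasterRegex, with simplified patterns invented for illustration:

object MasterRegex {
  val LOCAL_N   = """local\[([0-9]+|\*)\]""".r
  val SPARK_URL = """spark://(.+)""".r
}

def describe(master: String): String = master match {
  case "local"                    => "single local thread"
  case MasterRegex.LOCAL_N(n)     => s"local mode with $n threads"
  case MasterRegex.SPARK_URL(url) => s"standalone cluster at $url"
  case other                      => s"unknown master: $other"
}

println(describe("local[4]"))            // local mode with 4 threads
println(describe("spark://host:7077"))   // standalone cluster at host:7077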

RDD is an abstract class: abstract class RDD extends Serializable with Logging
(1) members marked final cannot be overridden in subclasses
(2) a subclass of the abstract class must use the override keyword when it redefines a concrete method of the parent class
The RDD abstract class is extended by concrete RDD classes such as HadoopRDD, which override the parent's methods so that operations such as sorting, map, and reduce work on their own kind of RDD. A minimal sketch of this pattern follows.
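BaseRDD and SimpleRDD below are invented names used only to illustrate the abstract-class pattern, not the real Spark hierarchy:

abstract class BaseRDD[T] extends Serializable {
  // final: subclasses cannot override this member
  final def id: Int = 42

  // abstract: each concrete subclass supplies its own implementation
  def compute(): Seq[T]

  // concrete: redefining it in a subclass requires the override keyword
  def name: String = "base"
}

class SimpleRDD(data: Seq[Int]) extends BaseRDD[Int] {
  def compute(): Seq[Int] = data.map(_ * 2)
  override def name: String = "simple"
}

println(new SimpleRDD(Seq(1, 2, 3)).compute())   // List(2, 4, 6)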

A Map is immutable by default; it cannot grow or shrink.

val person = Map("Spark" -> 6, "Hadoop" -> 12)

A map defined this way does not allow adding or removing elements.

val person = scala.collection.mutable.Map("Spark" -> 6, "Hadoop" -> 12)

This one can add elements, for example:

person += ("File" -> 5)

and can also remove elements, for example:

person -= "File"
