Test of Spark SQL 1.2 and Spark SQL 1.3

Source: Internet
Author: User

Spark SQL 1.2

1. Text Import

Create a table from an RDD and test it with a plain .txt text file.

MASTER=spark://master:7077

./bin/spark-shell
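
The master URL can equivalently be passed on the command line when launching the shell (an alternative to the environment variable above):

./bin/spark-shell --master spark://master:7077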

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@...

scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD

scala> case class Person(name: String, age: Int)
defined class Person

scala> val people = sc.textFile("/user/p.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
people: org.apache.spark.rdd.RDD[Person] = MappedRDD[3] at map at <console>:17

scala> people.registerTempTable("people")

scala> val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 3 AND age <= ...")   // upper bound not preserved in the source transcript
teenagers: org.apache.spark.sql.SchemaRDD = SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 Filter ((age#1 >= 3) && (age#1 <= ...))
  PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36

scala> teenagers.map(t => "Name:" + t(0)).collect().foreach(println)
(INFO/WARN log output omitted)
15/05/05 06:31:19 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:18, took 0.860763 s
Name:kang
Name:wu
Name:liu
Name:zhang

Spark SQL also supports importing data in JSON format, which is saved for a later test.
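
A minimal sketch of that JSON path in Spark 1.2; the file name /user/people.json and its "name" field are hypothetical examples, not taken from the original test:

// Each line of the input file must be a self-contained JSON object
val jsonPeople = sqlContext.jsonFile("/user/people.json")   // hypothetical path
jsonPeople.printSchema()                                    // the schema is inferred from the data
jsonPeople.registerTempTable("json_people")
sqlContext.sql("SELECT name FROM json_people").collect().foreach(println)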

--------------------------------------------- Ornate dividing line ---------------------------------------------

Next, continue the research and test Spark SQL's support for relational databases.

The data source API provides a pluggable mechanism for accessing structured data through Spark SQL. Data sources do more than just convert data and feed it into the Spark platform.

Using a data source is as simple as accessing it via SQL (or your favorite Spark language):

CREATE TEMPORARY TABLE episodes
USING com.databricks.spark.avro
OPTIONS (path "episodes.avro")
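
A hedged sketch of issuing the same statement from the Scala shell, assuming the com.databricks:spark-avro package is on the classpath and episodes.avro is reachable from the driver:

// Register the Avro file as a temporary table, then query it like any other table
sqlContext.sql(
  """CREATE TEMPORARY TABLE episodes
    |USING com.databricks.spark.avro
    |OPTIONS (path "episodes.avro")""".stripMargin)

sqlContext.sql("SELECT * FROM episodes").collect().foreach(println)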

Another advantage of the data source API is that users can manipulate data in all of the languages Spark supports, regardless of how the data source is implemented. For example, a data source implemented in Scala can be used by PySpark users without any extra work by the library developer. In addition, Spark SQL makes it easy to access data from different data sources through a single interface.
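
That single interface is also available programmatically; a minimal sketch using Spark 1.3's generic load, again assuming the spark-avro package is available (the table name episodes_loaded is hypothetical):

// The second argument names the data source implementation to use
val episodes = sqlContext.load("episodes.avro", "com.databricks.spark.avro")
episodes.printSchema()
episodes.registerTempTable("episodes_loaded")
sqlContext.sql("SELECT * FROM episodes_loaded LIMIT 10").collect().foreach(println)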

Spark SQL 1.3
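
In Spark 1.3 the SchemaRDD used above becomes a DataFrame. A minimal sketch of the same text-import test, not the original transcript, assuming the same /user/p.txt input:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._   // brings in .toDF() for RDDs of case classes

case class Person(name: String, age: Int)

// Build a DataFrame from the text file instead of a SchemaRDD
val people = sc.textFile("/user/p.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()

people.registerTempTable("people")

val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 3")
teenagers.map(t => "Name:" + t(0)).collect().foreach(println)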

