Test of Spark SQL 1.2 and Spark SQL 1.3

Source: Internet
Author: User

Spark SQL 1.2

1. Text Import

Create a table from an RDD and test it with a plain .txt text file.

MASTER=spark://master:7077

./bin/spark-shell
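
The master URL can equivalently be passed on the command line when launching the shell (an alternative to the environment variable above):

./bin/spark-shell --master spark://master:7077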

scala> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
sqlContext: org.apache.spark.sql.SQLContext = org.apache.spark.sql.SQLContext@...

scala> import sqlContext.createSchemaRDD
import sqlContext.createSchemaRDD

scala> case class Person(name: String, age: Int)
defined class Person

scala> val people = sc.textFile("/user/p.txt").map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt))
people: org.apache.spark.rdd.RDD[Person] = MappedRDD[3] at map at <console>:17

scala> people.registerTempTable("people")

scala> val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 3 AND age <= ...")   // upper bound not preserved in the source transcript
teenagers: org.apache.spark.sql.SchemaRDD = SchemaRDD[6] at RDD at SchemaRDD.scala:108
== Query Plan ==
== Physical Plan ==
Project [name#0]
 Filter ((age#1 >= 3) && (age#1 <= ...))
  PhysicalRDD [name#0,age#1], MapPartitionsRDD[4] at mapPartitions at ExistingRDD.scala:36

scala> teenagers.map(t => "Name:" + t(0)).collect().foreach(println)
(INFO/WARN log output omitted)
15/05/05 06:31:19 INFO scheduler.DAGScheduler: Job 0 finished: collect at <console>:18, took 0.860763 s
Name:kang
Name:wu
Name:liu
Name:zhang

Spark SQL also supports importing data in JSON format, which is saved for a later test.
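
A minimal sketch of that JSON path in Spark 1.2; the file name /user/people.json and its "name" field are hypothetical examples, not taken from the original test:

// Each line of the input file must be a self-contained JSON object
val jsonPeople = sqlContext.jsonFile("/user/people.json")   // hypothetical path
jsonPeople.printSchema()                                    // the schema is inferred from the data
jsonPeople.registerTempTable("json_people")
sqlContext.sql("SELECT name FROM json_people").collect().foreach(println)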

--------------------------------------------- Ornate dividing line ---------------------------------------------

Next, continue the research and test Spark SQL's support for relational databases.

The data source API provides a pluggable mechanism for accessing structured data through Spark SQL. Data sources do more than just convert data and feed it into the Spark platform.

Using a data source is as simple as accessing it via SQL (or your favorite Spark language):

CREATE TEMPORARY TABLE episodes
USING com.databricks.spark.avro
OPTIONS (path "episodes.avro")
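
A hedged sketch of issuing the same statement from the Scala shell, assuming the com.databricks:spark-avro package is on the classpath and episodes.avro is reachable from the driver:

// Register the Avro file as a temporary table, then query it like any other table
sqlContext.sql(
  """CREATE TEMPORARY TABLE episodes
    |USING com.databricks.spark.avro
    |OPTIONS (path "episodes.avro")""".stripMargin)

sqlContext.sql("SELECT * FROM episodes").collect().foreach(println)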

Another advantage of the data source API is that users can manipulate data in all of the languages Spark supports, regardless of how the data source is implemented. For example, a data source implemented in Scala can be used by PySpark users without any extra work by the library developer. In addition, Spark SQL makes it easy to access data from different data sources through a single interface.
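
That single interface is also available programmatically; a minimal sketch using Spark 1.3's generic load, again assuming the spark-avro package is available (the table name episodes_loaded is hypothetical):

// The second argument names the data source implementation to use
val episodes = sqlContext.load("episodes.avro", "com.databricks.spark.avro")
episodes.printSchema()
episodes.registerTempTable("episodes_loaded")
sqlContext.sql("SELECT * FROM episodes_loaded LIMIT 10").collect().foreach(println)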

Spark SQL 1.3
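
In Spark 1.3 the SchemaRDD used above becomes a DataFrame. A minimal sketch of the same text-import test, not the original transcript, assuming the same /user/p.txt input:

val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._   // brings in .toDF() for RDDs of case classes

case class Person(name: String, age: Int)

// Build a DataFrame from the text file instead of a SchemaRDD
val people = sc.textFile("/user/p.txt")
  .map(_.split(","))
  .map(p => Person(p(0), p(1).trim.toInt))
  .toDF()

people.registerTempTable("people")

val teenagers = sqlContext.sql("SELECT name FROM people WHERE age >= 3")
teenagers.map(t => "Name:" + t(0)).collect().foreach(println)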

