Spark Learning Five: Spark SQL
tags (space delimited): Spark
- Spark Learning Five: Spark SQL
- One, Overview
- Two, Spark's Development History
- Three, Spark SQL vs. Hive
- Four, Spark SQL Architecture
- Five, Spark SQL Access to Hive Data
- Six, Catalyst
- Seven, Thriftserver
- Eight, DataFrame
- Nine, Loading External Data Sources
- The Power of Spark SQL
One, Overview
Two, Spark's Development History
Three, Spark SQL vs. Hive
Four, Spark SQL Architecture
Five, Spark SQL Access to Hive Data
hive-site.xml needs to be copied into Spark's conf directory.
Launch method one:
// launch the application
bin/spark-shell --driver-class-path jars/mysql-connector-java-5.1.27-bin.jar --master local[2]
sqlContext.sql("show databases").show()
sqlContext.sql("use default").show()sqlContext.sql("show tables").show()
Launch method two:
// launch the application
bin/spark-sql --driver-class-path jars/mysql-connector-java-5.1.27-bin.jar --master local[2]
show databases;
// cache
cache table emp;
// uncache
uncache table emp;
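Queries against the cached table are then served from memory; for example, assuming emp has a deptno column as in the join example later in these notes:
select deptno, count(*) from emp group by deptno;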
Six, Catalyst
Seven, Thriftserver
Start the service
sbin/start-thriftserver.sh --master local[2] --driver-class-path jars/mysql-connector-java-5.1.27-bin.jar
Start the Beeline client
bin/beeline
beeline> !connect jdbc:hive2://localhost:10000
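Once connected, SQL statements typed at the beeline prompt are executed by the Thrift server; for example, again assuming the emp table exists:
show databases;
select * from default.emp limit 10;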
Eight, DataFrame
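A DataFrame is a distributed collection of rows with a named schema that can be queried either through SQL or through its own API. A minimal sketch, assuming the emp table from the previous section (ename and sal are illustrative column names):
val emp_df = sqlContext.table("default.emp")
emp_df.printSchema()
// ename and sal are illustrative column names
emp_df.select("ename", "sal").filter(emp_df("sal") > 1000).show()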
Nine, Loading External Data Sources
1. Loading JSON data
val json_df = sqlContext.jsonFile("hdfs://study.com.cn:8020/spark/people.json")
json_df.show()
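The DataFrame loaded from JSON can also be registered as a temporary table and queried with SQL; for example, assuming people.json has name and age fields like the people.txt file used below:
json_df.registerTempTable("people_json")
sqlContext.sql("select name, age from people_json where age > 19").show()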
2. Load Hive Data
sqlContext.table("default").show()
3. Loading Parquet data
val parquet_df = sqlContext.parquetFile("hdfs://study.com.cn:8020/spark/users.parquet")
parquet_df.show()
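Going the other direction, a DataFrame can also be written out as Parquet with the Spark 1.x saveAsParquetFile method; a sketch, assuming write access to the same HDFS directory (the output path is illustrative):
json_df.saveAsParquetFile("hdfs://study.com.cn:8020/spark/people_parquet")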
4. Loading data via JDBC
// read from MySQL via JDBC (variable names are illustrative; the original omits them)
val jdbc_df = sqlContext.jdbc("jdbc:mysql://localhost:3306/db_0306?user=root&password=123456", "my_user")
// equivalent: load with the generic jdbc data source
val jdbc_df2 = sqlContext.load("jdbc", Map("url" -> "jdbc:mysql://localhost:3306/db_0306?user=root&password=123456", "dbtable" -> "my_user"))
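Either call returns a DataFrame backed by the MySQL table, which can then be inspected like any other:
jdbc_df.printSchema()
jdbc_df.show()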
5. Reading a text file
The first way:
case class Person(name: String, age: Int)
val people_rdd = sc.textFile("spark/sql/people.txt")
val rowRdd = people_rdd.map(x => x.split(",")).map(x => Person(x(0), x(1).trim.toInt))
val people_df = rowRdd.toDF()
The second way:
val people_rdd = sc.textFile("spark/sql/people.txt")
import org.apache.spark.sql._
val rowRdd = people_rdd.map(x => x.split(",")).map(x => Row(x(0), x(1).trim.toInt))
import org.apache.spark.sql.types._
val schema = StructType(Array(StructField("name", StringType, true), StructField("age", IntegerType, false)))
val rdd2df = sqlContext.createDataFrame(rowRdd, schema)
Test:
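For example, the two resulting DataFrames can be displayed, and the second one queried through a temporary table (the age filter is just an illustration):
people_df.show()
rdd2df.printSchema()
rdd2df.registerTempTable("people")
sqlContext.sql("select * from people where age > 19").show()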
The Power of Spark SQL
Hive table: emp
MySQL table: dept
Join the two tables above:
val hive_emp_df = sqlContext.table("db_0228.emp")
val mysql_dept_df = sqlContext.jdbc("jdbc:mysql://localhost:3306/db_0306?user=root&password=123456", "tb_dept")
val join_df = hive_emp_df.join(mysql_dept_df, hive_emp_df("deptno") === mysql_dept_df("deptno"))
join_df.show
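The joined result is an ordinary DataFrame, so it can be inspected like any other:
join_df.printSchema()
println(join_df.count())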
Case analysis
SQLLogAnalyzer.scala
package com.ibeifeng.bigdata.spark.app

import org.apache.spark.sql.SQLContext
import org.apache.spark.{SparkConf, SparkContext}

/**
 * Created by Xuanyu on 2016/4/17.
 */
object SQLLogAnalyzer {

  def main(args: Array[String]) {
    // create SparkConf instance
    val sparkConf = new SparkConf().setAppName("SQLLogAnalyzer").setMaster("local[2]")
    // create SparkContext instance
    val sc = new SparkContext(sparkConf)
    // create SQLContext instance
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    // ==============================================================
    // input file
    val logFile = "hdfs://bigdata-senior01.ibeifeng.com:8020/user/beifeng/apache.access.log"

    // create the RDD, filter and parse the log lines, then convert to a DataFrame
    val accesslogs_df = sc.textFile(logFile)
      // filter out invalid log lines
      .filter(ApacheAccessLog.isValidateLogLine)
      // parse each log line
      .map(log => ApacheAccessLog.parseLogLine(log))
      .toDF()
    accesslogs_df.registerTempTable("accesslogs")
    // cache
    accesslogs_df.cache()

    // ==============================================================
    // compute with SQL
    val avgContentSize = sqlContext.sql("SELECT AVG(contentsize) FROM accesslogs").first().get(0)
    val minContentSize = sqlContext.sql("SELECT MIN(contentsize) FROM accesslogs").first().get(0)
    val maxContentSize = sqlContext.sql("SELECT MAX(contentsize) FROM accesslogs").first().get(0)
    // println
    println("Content Size Avg: %s, Min: %s, Max: %s".format(avgContentSize, minContentSize, maxContentSize))

    // compute with the DataFrame API
    // accesslogs_df.unpersist()
    val avg_df = accesslogs_df.agg("contentsize" -> "avg")
    val min_df = accesslogs_df.agg("contentsize" -> "min")
    val max_df = accesslogs_df.agg("contentsize" -> "max")
    // println
    println("=== Content Size Avg: %s, Min: %s, Max: %s".format(avg_df.first().get(0), min_df.first().get(0), max_df.first().get(0)))

    // ==============================================================
    // stop SparkContext
    sc.stop()
  }
}
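The analyzer refers to an ApacheAccessLog helper that is not shown here. A minimal sketch of what such a helper could look like, assuming the standard Apache common log format and that only the contentsize column is needed by the queries above (the regular expression and field names are illustrative):

// Hypothetical helper, not part of the original notes: a case class plus the
// two methods SQLLogAnalyzer expects (isValidateLogLine, parseLogLine).
case class ApacheAccessLog(ipAddress: String, method: String, endpoint: String,
                           responseCode: Int, contentsize: Long)

object ApacheAccessLog {
  // assumed common log format: ip - - [date] "METHOD endpoint protocol" code size
  val PATTERN = """^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) \S+" (\d{3}) (\d+)""".r

  def isValidateLogLine(line: String): Boolean =
    PATTERN.findFirstIn(line).isDefined

  def parseLogLine(line: String): ApacheAccessLog = {
    val m = PATTERN.findFirstMatchIn(line).get
    ApacheAccessLog(m.group(1), m.group(3), m.group(4), m.group(5).toInt, m.group(6).toLong)
  }
}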