Spark SQL: practical application

Last Update: 2017-07-02 Source: Internet

Author: User



Data set: MovieLens 1M dataset, http://grouplens.org/datasets/movielens/

Related Data files:

users.dat---UserID::Gender::Age::Occupation::Zip-code

movies.dat---MovieID::Title::Genres

ratings.dat---UserID::MovieID::Rating::Timestamp

SogouQ.mini
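
Each MovieLens file holds one record per line, with `::` as the field separator. As a minimal sketch of how such a line maps onto a typed record, plain Scala is enough. The `UserRecord` class and the `parseUser` helper are illustrative names, not part of the dataset or of Spark, and the sample line is only an example of the users.dat layout.

```scala
// Hypothetical record type mirroring users.dat:
// UserID::Gender::Age::Occupation::Zip-code
final case class UserRecord(uid: String, gender: String, age: Int,
                            occupation: String, zip: String)

// Parse one line. Returns None when the line does not split into exactly
// five fields or when the age field is not an integer.
def parseUser(line: String): Option[UserRecord] =
  line.split("::") match {
    case Array(uid, gender, age, occ, zip) =>
      scala.util.Try(age.trim.toInt).toOption
        .map(a => UserRecord(uid, gender, a, occ, zip))
    case _ => None
  }

// Example line in the users.dat layout
val sample = parseUser("1::F::1::10::48067")
```

Returning an `Option` instead of throwing lets a Spark job drop malformed lines with `flatMap` rather than fail on the first bad record.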

Complete the following business requirements:

1. Find the 10 movies most watched by young men in the 18-24 age group.

2. Find the 10 highest-rated movies, the 10 users who have watched the most movies, the 10 movies most watched by women, and the 10 movies most watched by men.

3. Using the data set SogouQ2012.mini.tar.gz, rank the sites by number of visits and list the top 10.

The code is as follows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object hw_SparkSQL {
  // Field names follow the original code: xb = gender, v3 = rating score
  case class User(uid: String, xb: String, age: Int, v4: String, v5: String)
  case class Movie(mid: String, name: String, t: String)
  case class Rating(uid: String, mid: String, v3: Double, v4: String)
  case class Brower(v1: String, v2: String, v3: String, v4: String, v5: String, v6: String)

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ReadJson")
      .setMaster("local")
      .set("spark.executor.memory", "50g")
      .set("spark.driver.maxResultSize", "50g")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Implicit conversions, required for toDF()
    import sqlContext.implicits._

    // Local data directory; adjust to your own environment
    val base = "C:\\Users\\BigData\\Desktop\\file\\BigData\\Spark\\3.SparkCore_2"

    val userInfo = sc.textFile(base + "\\data\\data\\users.dat")
      .map(_.split("::"))
      .map(p => User(p(0), p(1), p(2).trim.toInt, p(3), p(4)))
      .toDF()
    userInfo.registerTempTable("user")

    val movieInfo = sc.textFile(base + "\\data\\data\\movies.dat")
      .map(_.split("::"))
      .map(p => Movie(p(0), p(1), p(2)))
      .toDF()
    movieInfo.registerTempTable("movie")

    val ratingsInfo = sc.textFile(base + "\\data\\data\\ratings.dat")
      .map(_.split("::"))
      .map(p => Rating(p(0), p(1), p(2).toDouble, p(3)))
      .toDF()
    ratingsInfo.registerTempTable("rating")

    // The Sogou log is tab-separated with six fields per line
    val browerInfo = sc.textFile(base + "\\SogouQ2012.mini\\SogouQ.mini")
      .map(_.split("\t"))
      .map(p => Brower(p(0), p(1), p(2), p(3), p(4), p(5)))
      .toDF()
    browerInfo.registerTempTable("brower")

    // Top 10 movies most watched by men aged 18-24
    // (MovieLens stores gender as 'M' / 'F')
    val top10_m_18_24 = sqlContext.sql(
      """SELECT x.n AS name, COUNT(*) AS count
        |FROM (SELECT DISTINCT rating.mid AS m, rating.uid AS u, movie.name AS n
        |      FROM rating, user, movie
        |      WHERE user.age >= 18 AND user.age <= 24 AND user.xb = 'M'
        |        AND user.uid = rating.uid AND movie.mid = rating.mid) AS x
        |GROUP BY x.n
        |ORDER BY count DESC""".stripMargin)
    top10_m_18_24.show(10)

    // Top 10 users who have watched the most movies
    val top10_pepole = sqlContext.sql(
      "SELECT uid, COUNT(uid) AS count FROM rating GROUP BY uid ORDER BY count DESC")
    top10_pepole.show(10)

    // Top 10 movies by average score
    val top10m_score = sqlContext.sql(
      "SELECT mid, (SUM(v3) / COUNT(v3)) AS av FROM rating GROUP BY mid ORDER BY av DESC")
    top10m_score.show(10)

    // Top 10 movies most watched by women
    val top10_female = sqlContext.sql(
      """SELECT x.n, COUNT(*) AS c
        |FROM (SELECT DISTINCT rating.mid AS m, rating.uid AS u, movie.name AS n
        |      FROM rating, user, movie
        |      WHERE user.xb = 'F'
        |        AND user.uid = rating.uid AND movie.mid = rating.mid) AS x
        |GROUP BY x.n
        |ORDER BY c DESC""".stripMargin)
    top10_female.show(10)

    // Top 10 movies most watched by men
    val top10_male = sqlContext.sql(
      """SELECT x.n, COUNT(*) AS c
        |FROM (SELECT DISTINCT rating.mid AS m, rating.uid AS u, movie.name AS n
        |      FROM rating, user, movie
        |      WHERE user.xb = 'M'
        |        AND user.uid = rating.uid AND movie.mid = rating.mid) AS x
        |GROUP BY x.n
        |ORDER BY c DESC""".stripMargin)
    top10_male.show(10)

    // Top 10 most visited sites (v6 is the last field of each log line)
    val top10_brower = sqlContext.sql(
      "SELECT v6 AS name, COUNT(*) AS count FROM brower GROUP BY v6 ORDER BY count DESC")
    top10_brower.show(10)
  }
}
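
The "most watched" queries all follow one pattern: join ratings to users and movies, keep distinct (movie, user) pairs, count the pairs per title, and sort in descending order. A minimal sketch of that logic on in-memory collections, without Spark, may make the SQL easier to follow. The sample rows, the simplified case classes, and the `topWatched` helper are all made up for illustration and are not part of the original program.

```scala
final case class User(uid: String, gender: String, age: Int)
final case class Movie(mid: String, name: String)
final case class Rating(uid: String, mid: String, score: Double)

// Made-up sample rows, not taken from the MovieLens files.
val users   = Seq(User("1", "M", 18), User("2", "M", 18), User("3", "F", 25))
val movies  = Seq(Movie("10", "Alpha"), Movie("20", "Beta"))
val ratings = Seq(
  Rating("1", "10", 5.0), Rating("2", "10", 3.0),
  Rating("1", "20", 4.0), Rating("3", "20", 2.0)
)

// Count distinct viewers per title among the users accepted by `keep`,
// then return the n titles with the most viewers.
def topWatched(n: Int)(keep: User => Boolean): Seq[(String, Int)] = {
  val wanted = users.filter(keep).map(_.uid).toSet
  val titles = movies.map(m => m.mid -> m.name).toMap
  ratings
    .filter(r => wanted(r.uid) && titles.contains(r.mid)) // the join conditions
    .map(r => (r.mid, r.uid)).distinct                    // SELECT DISTINCT mid, uid
    .groupBy { case (mid, _) => titles(mid) }             // GROUP BY name
    .map { case (name, pairs) => name -> pairs.size }     // COUNT(*)
    .toSeq
    .sortBy { case (name, c) => (-c, name) }              // ORDER BY count DESC, ties by name
    .take(n)
}

// Requirement 1: men aged 18-24
val youngMen = topWatched(10)(u => u.gender == "M" && u.age >= 18 && u.age <= 24)

// Average score per movie id, the same quantity as SUM(v3) / COUNT(v3)
val avgByMovie: Map[String, Double] =
  ratings.groupBy(_.mid).map { case (mid, rs) => mid -> rs.map(_.score).sum / rs.size }
```

Unlike the SQL, the sketch breaks ties by title so that its output is deterministic; Spark makes no ordering promise for rows with equal counts.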


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
