Spark SQL --- practical application

Source: Internet
Author: User


Data set: http://grouplens.org/datasets/movielens/ (MovieLens 1M dataset)

Related Data files:

users.dat --- UserID::Gender::Age::Occupation::Zip-code

movies.dat --- MovieID::Title::Genres

ratings.dat --- UserID::MovieID::Rating::Timestamp

SogouQ.mini
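Because the MovieLens files are "::"-delimited text with no header, each line has to be split and converted by hand before it can be queried. The following is a minimal sketch of that parsing step in plain Scala (no Spark required). The object name `MovieLensParsing` and the case classes `UserRecord` and `RatingRecord` are hypothetical names chosen for illustration; returning `None` for malformed lines is an assumption about how bad input should be handled, not something the original code does.

```scala
import scala.util.Try

object MovieLensParsing {
  // Hypothetical record types mirroring the field layouts listed above.
  final case class UserRecord(userId: String, gender: String, age: Int,
                              occupation: String, zipCode: String)
  final case class RatingRecord(userId: String, movieId: String,
                                rating: Double, timestamp: Long)

  // Parses one users.dat line; yields None if the field count or the age is invalid.
  def parseUser(line: String): Option[UserRecord] =
    line.split("::") match {
      case Array(id, gender, age, occupation, zip) =>
        Try(UserRecord(id, gender, age.trim.toInt, occupation, zip)).toOption
      case _ => None
    }

  // Parses one ratings.dat line; yields None if any numeric field fails to convert.
  def parseRating(line: String): Option[RatingRecord] =
    line.split("::") match {
      case Array(uid, mid, rating, ts) =>
        Try(RatingRecord(uid, mid, rating.trim.toDouble, ts.trim.toLong)).toOption
      case _ => None
    }
}
```

With an RDD of lines, the same functions could be applied with `flatMap` so that malformed lines are silently dropped rather than failing the job.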

Complete the following business requirements:

1. Find the 10 movies that young men aged 18-24 most like to watch.

2. Find the 10 highest-rated movies, the 10 users who have watched the most movies, the 10 movies most watched by women, and the 10 movies most watched by men.

3. Using the data set SogouQ2012.mini.tar.gz, find the 10 most-visited sites, ranked by number of visits.
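All three requirements reduce to the same pattern: group by a key, count, sort descending, and keep the first 10. A minimal sketch of that pattern on in-memory Scala collections is shown here so the logic can be checked without a Spark cluster. The object `TopN` and the function `topByCount` are hypothetical names, and breaking ties by key is an assumption added to make the output deterministic.

```scala
object TopN {
  // Counts how often each key occurs and returns the n most frequent keys.
  // Ties on the count are broken by the key's natural ordering.
  def topByCount[K: Ordering](keys: Seq[K], n: Int): Seq[(K, Int)] =
    keys
      .groupBy(identity)
      .map { case (key, occurrences) => (key, occurrences.size) }
      .toSeq
      .sortBy { case (key, count) => (-count, key) }
      .take(n)
}
```

For the "most watched" requirements, each (user, movie) pair should be counted once, so duplicates are removed before counting, for example `TopN.topByCount(views.distinct.map(_._2), 10)` for a sequence `views` of (user ID, title) pairs.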

The code is as follows:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object hw_SparkSql {
  // One case class per input file; field order matches the column order in the file.
  case class User(uid: String, xb: String, age: Int, v4: String, v5: String) // xb = gender
  case class Movie(mid: String, name: String, t: String)
  case class Rating(uid: String, mid: String, v3: Double, v4: String)        // v3 = rating
  case class Brower(v1: String, v2: String, v3: String, v4: String, v5: String, v6: String) // v6 = URL

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("ReadJSON")
      .setMaster("local")
      .set("spark.executor.memory", "50g")
      .set("spark.driver.maxResultSize", "50g")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    // Implicit conversions (needed for toDF)
    import sqlContext.implicits._

    // Local data directory used in the original post; adjust to your environment.
    val base = "C:\\Users\\BigData\\Desktop\\file\\BigData\\Spark\\3.SparkCore_2"

    val userInfo = sc.textFile(base + "\\data\\data\\users.dat")
      .map(_.split("::"))
      .map(p => User(p(0), p(1), p(2).trim.toInt, p(3), p(4)))
      .toDF()
    userInfo.registerTempTable("User")

    val movieInfo = sc.textFile(base + "\\data\\data\\movies.dat")
      .map(_.split("::"))
      .map(p => Movie(p(0), p(1), p(2)))
      .toDF()
    movieInfo.registerTempTable("Movie")

    val ratingsInfo = sc.textFile(base + "\\data\\data\\ratings.dat")
      .map(_.split("::"))
      .map(p => Rating(p(0), p(1), p(2).toDouble, p(3)))
      .toDF()
    ratingsInfo.registerTempTable("Rating")

    val browerInfo = sc.textFile(base + "\\SogouQ2012.mini\\SogouQ.mini")
      .map(_.split("\t"))
      .map(p => Brower(p(0), p(1), p(2), p(3), p(4), p(5)))
      .toDF()
    browerInfo.registerTempTable("Brower")

    // The 10 movies most watched by young men aged 18-24.
    // Gender is stored as 'M' / 'F' in users.dat, and string comparison is case-sensitive.
    val top10_m_18_24 = sqlContext.sql(
      """SELECT x.n AS name, COUNT(*) AS count
        |FROM (SELECT DISTINCT Rating.mid AS m, Rating.uid AS u, Movie.name AS n
        |      FROM Rating, User, Movie
        |      WHERE User.age >= 18 AND User.age <= 24 AND User.xb = 'M'
        |        AND User.uid = Rating.uid AND Movie.mid = Rating.mid) AS x
        |GROUP BY x.n ORDER BY count DESC""".stripMargin)
    top10_m_18_24.show(10)

    // The 10 users who have watched the most movies
    val top10_pepole = sqlContext.sql(
      "SELECT uid, COUNT(uid) AS count FROM Rating GROUP BY uid ORDER BY count DESC")
    top10_pepole.show(10)

    // The 10 highest-rated movies (by average rating)
    val top10m_score = sqlContext.sql(
      "SELECT mid, (SUM(v3) / COUNT(v3)) AS av FROM Rating GROUP BY mid ORDER BY av DESC")
    top10m_score.show(10)

    // The 10 movies most watched by women
    val top10_female = sqlContext.sql(
      """SELECT x.n, COUNT(*) AS c
        |FROM (SELECT DISTINCT Rating.mid AS m, Rating.uid AS u, Movie.name AS n
        |      FROM Rating, User, Movie
        |      WHERE User.xb = 'F' AND User.uid = Rating.uid AND Movie.mid = Rating.mid) AS x
        |GROUP BY x.n ORDER BY c DESC""".stripMargin)
    top10_female.show(10)

    // The 10 movies most watched by men
    val top10_male = sqlContext.sql(
      """SELECT x.n, COUNT(*) AS c
        |FROM (SELECT DISTINCT Rating.mid AS m, Rating.uid AS u, Movie.name AS n
        |      FROM Rating, User, Movie
        |      WHERE User.xb = 'M' AND User.uid = Rating.uid AND Movie.mid = Rating.mid) AS x
        |GROUP BY x.n ORDER BY c DESC""".stripMargin)
    top10_male.show(10)

    // The 10 most-visited sites (grouped by the URL column)
    val top10_brower = sqlContext.sql(
      "SELECT v6 AS name, COUNT(*) AS count FROM Brower GROUP BY v6 ORDER BY count DESC")
    top10_brower.show(10)

    sc.stop()
  }
}
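The last query groups by the full URL column (v6), so two pages on the same site are counted as different entries. If "site" is meant as the host name, the URL can be reduced to its host before counting. The sketch below shows one way to do that with `java.net.URI` from the JDK. The object `SiteKey` and the function `host` are hypothetical names, and treating unparseable URLs as `None` is an assumption; URLs in the log that contain characters `URI` rejects would be dropped under this approach.

```scala
import java.net.URI
import scala.util.Try

object SiteKey {
  // Extracts the lower-cased host from a URL string.
  // Returns None when the string cannot be parsed or has no host component.
  def host(url: String): Option[String] =
    Try(new URI(url.trim)).toOption
      .flatMap(uri => Option(uri.getHost))
      .map(_.toLowerCase)
}
```

One possible use is to apply `SiteKey.host` when building the `Brower` rows (for example with `flatMap`) so that the grouped column already holds the host rather than the full URL.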
