Hive Lab Competition

Source: Internet
Author: User

CREATE TABLE User (
UserID STRING,
Sex STRING,
Age INT,
Career INT,
Code STRING
)
ROW FORMAT Delimited
Fields TERMINATED by ': ';

LOAD DATA LOCAL inpath '/home/hadoop03/rating/users.dat ' overwrite into table user;

CREATE TABLE Rating (
UserID STRING,
MovieID STRING,
Rate INT,
Tmpe TIMESTAMP
)
ROW FORMAT Delimited
Fields TERMINATED by ': ';

LOAD DATA LOCAL inpath '/home/hadoop03/rating/ratings.dat ' overwrite into table rating;

CREATE TABLE Movie (
MovieID STRING,
Moviename STRING,
Movietype array<string>
)
ROW FORMAT Delimited
Fields TERMINATED by ' ^ '
COLLECTION ITEMS TERMINATED by ' | ';

LOAD DATA LOCAL inpath '/home/hadoop03/rating/movies.dat ' overwrite into table movie;

Every age the highest movie
Select Aa.age,aa.mymovietype,aa.c from (select Jun.age,jun.mymovietype,jun.c,row_number () over (distribute by age sort B Y c desc) rownum from (select Xiong.age,mymovietype,count (Mymovietype) as C from (select Bear.age,mymovietype from (sele CT Usr.age,mov.movietype from rating rat left join user usr on rat.userid = Usr.userid left join movie mov on Rat.movieid = Mov.movieid where rat.rate=5) bear lateral VIEW explode (bear.movietype) Movietype as Mymovietype) Xiong GROUP by XIONG.A Ge,xiong.mymovietype) June) AA where aa.rownum = 1;


Pmod (DateDiff (From_unixtime (Tmpe, ' yyyy-mm-dd '), ' 2001-01-01 '), 7)
Select Pmod (DateDiff (From_unixtime (Tmpe, ' yyyy-mm-dd '), ' 2014-09-21 '), 7) from rating limit 3;
Select WeekOfYear (From_unixtime (Tmpe, ' yyyy-mm-dd ')), Day (From_unixtime (Tmpe, ' Yyyy-mm-dd ')) from rating limit 3;

Every day of week every movie type highest movie
Select Case Xx.day If 0 then 7 else Xx.day end, Xx.mymovietype, Xx.moviename from (select Jun.day, Jun.mymovietype, June. Moviename,jun.cou, Row_number () over (distribute by Jun.day, jun.mymovietype Sort by jun.cou desc) rownum from (select is Ar.day,bear.mymovietype,bear.moviename,count (Bear.tmpe) as Cou from (SELECT * FROM (select Pmod (DateDiff (from_unixtime (Tmpe, ' yyyy-mm-dd '), ' 2014-09-21 '), 7) as Day,rat.tmpe,mov.movietype,mov.moviename from rating rat join movie mov on RAT.M Ovieid = Mov.movieid where rat.rate=5) Xiong lateral view explode (xiong.movietype) Movietype as Mymovietype) bear Grou P by bear.day,bear.mymovietype,bear.moviename) June) xx where xx.rownum = 1;


Select Movieid,moviename,mymovietype from movie lateral View explode (movietype) Movietype as Mymovietype limit 9;

SELECT DISTINCT (mymovietype) from (Select Day (From_unixtime (Tmpe, ' yyyy-mm-dd ')) as Day,rat.tmpe,mov.movietype, Mov.moviename from rating rat join movie mov on rat.movieid = Mov.movieid where rat.rate=5) Xiong lateral view explode ( Xiong.movietype) Movietype as Mymovietype

=============================================================================================================== ================================================================


CREATE TABLE WiFi (
Phone STRING,
Year STRING,
Month STRING,
Day STRING,
Hour STRING,
Minute STRING,
Second STRING,
TimeZone STRING,
Host STRING,
Facility STRING,
Service STRING,
Mac STRING,
Protocol STRING,
Message STRING
)
ROW FORMAT Delimited
Fields TERMINATED by ', ';

Load data local inpath '/home/hadoop03/rating/wifi_data.txt ' overwrite into table wifi;

CREATE TABLE Wifi2 (
TT BIGINT,
Mac STRING,
Message STRING
)
ROW FORMAT Delimited
Fields TERMINATED by ', ';

Insert Overwrite table Wifi2 Select Unix_timestamp (concat (year, '-', month, '-', Day, ', Hour, ': ', Minute, ': ', second)), Mac,message from WiFi;

1 Select COUNT (Distinct (MAC)) from WiFi;

2 from (SELECT A.mac, A.tt, MIN (B.TT-A.TT) as TT from Wifi2 A, wifi2 B WHERE a.mac = B.mac and A.tt <= b.tt and A. Message like '%association ok% ' and b.message like '%deauthenticated% ' GROUP by A.mac, a.tt) e select count (E.TT);

3 from (SELECT A.mac, A.tt, MIN (B.TT-A.TT) as TT from Wifi2 A, wifi2 B WHERE a.mac = B.mac and A.tt <= b.tt and A. Message like '%association ok% ' and b.message like '%deauthenticated% ' GROUP by A.mac, A.TT) e select AVG (E.TT);

4 not sure.





The competition includes 2 parts, you has to work with your group members to finish 2 tests.

Hive 1:

We have a DVD shop providing the DVD rent service. Based on different age group or flavor, the DVD shop need does some recommendations to the customer according to below movie Rating.

Here are the rate file contain 1,000,209 anonymous ratings of approximately 3,900 movies made by 6,040 users.

For the format file, check the readme included in the ZIP package.

Here is the demand:

1. Rate the highest film type according to different age groups

2. count the movie names of the different movie types with the highest recommended per week

Hive 2:

Background:

The retail customer wanted us to does some, driven by an idea:it must is possible to bring the concepts of Trackin G Users in the online world to retail stores. One of the most important key performance indicators is revenue per square metre. We thought about bringing in some new metrics. From a wider perspective, data are produced by various sensors. With a real store in mind we figured out possible sensors stores could use–customer frequency counters at the doors, the Cashier system, free Wi-Fi access points, video capturing, temperature, background music, smells and many more. While for many of those sensors additional hardware and software was needed, for a few sensors solutions are around, e.g. v Ideo capturing with the face or even eye recognition. We talked about our ideas with executives and consultants from the retail industry and they confirmed We are interest ing to persue.

Solution:

We thought the most interesting sensor data (so doesn ' t require additional hardware/software) could be the WiFi access P Oints. Especially given that many visitors would have WiFi enabled mobile phones. With it ' s log files we should is able to answer at least the following questions for a particular store:

    • How many people visited the store (unique visits)?

    • How many visits do we have the in total?

    • What's the average visit duration?

    • How many people is new vs. returning?

Here is the raw Data sample:

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,mlme,mlme-authenticate.indication ( 98:0c:82:dc:8b:15, Open_system)

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,mlme,mlme-deletekeys.request (98:0c : 82:dc:8b:15)

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,ieee 802.11,authenticated

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,ieee 802.11,association OK (aid 2)

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,ieee 802.11,associated (aid 2)

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,mlme,mlme-associate.indication ( 98:0C:82:DC:8B:15)

1358756939,2013,1,21,9,28,59,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,mlme,mlme-deletekeys.request (98:0c : 82:dc:8b:15)

1358757010,2013,1,21,9,30,10,+01:00,buffalo,hostapd,wlan0,98:0c:82:dc:8b:15,ieee 802.11,deauthenticated

The description of the column is as follows:

iso_8601 year Month day hour minute second timezone host Facility_level service_level mac_address protocol Message


We are interested in "Authenticationok ' and" deauthenticated "messages only. The messages from Therouter is not standardized (as is protocols such as TCP). We Foundthat Those the messages is the closest ones to ourunderstanding of a "login"/"logout" on the router.

NOWWE has the data we need to answer the following questions:

  • Howmany people visited the store (unique visitors)?
    Note:unlike thetraditional Customer Frequency counter at the doors we had macaddresses at the log files that is unique For mobile phones. Supposed people don't change their mobile phones we can recognizeunique visitors and not just visits.

  • Howmany visits did we have?

  • Whatis the average visit duration?

  • Howmany people is new vs. returning?


Conclusion:
















Hive Lab Competition

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.