"Gandalf." Recommend system data completion using hive SQL implementation

Last Update:2014-09-29 Source: Internet

Author: User

Developer on Alibaba Coud: Build your first app with APIs, SDKs, and tutorials on the Alibaba Cloud. Read more ＞

DemandIn the recommended system scenario, if the underlying behavior data is too small, or too sparse, the recommended algorithm is likely to not reach the required number. For example, if you want to recommend 20 item for each item or user, but only 8 by calculation, the remaining 12 will need to be complete. Welcome reprint, please specify Source: http://blog.csdn.net/u010967382/article/details/39674047
StrategyThe specific strategies for data completion are:

completion time: After the mining calculation is finished, the mining results are imported into HBase (The final web system takes data from HBase) before, data completion, completion of data re-import hbase. (There is also an optional time, after receiving the request to achieve completion in the program, but such efficiency is certainly not directly from the HBase reading high, so space-time is a more reasonable strategy);
Implementation technology: The completion process is based on the hive implementation;
Complete data: The test process using the current browse item under the classification of the near period of time to browse the amount of topn;
Test Scenario: This article is only for "look and see" for data completion experiments , other recommended requirements are similar.

Experimental Process 1. First Debug SQL under OracleThe commissioning process involves two sheets: (1) TEST_TOPN:
Each row in the table represents the amount of time an item is visited on a given day.

(2)test_x_and_x:
Each row in the table represents the viewed item and its traffic for each item.Our goal is to complete the table, for each current_item to have 5 look and look at the item. For example, for item 10,001th, you need to get TOP2 from the IT classification to fill the table.
The following SQL has been successfully implemented in Oracle for this purpose:Select * from (select row_number () over (partition&NBSP; by Current_item_category , current_item_id order by source , view_count desc ) no , current_item_id , Current_item_category, andx_item_id, source, View_count from(Select current_item_id, current_item_category, andx_item_id, 1 source, view_count From test_x_and_xUnion Select a.current_item_id, A.current_item_category,b.item_id,2,B.view _count from (select current_item_id,current_item_category from test_x_and_x Group by current_item_id,current_item_category) a, test_topn b where a.current_item_category = b.item_category)) where no<=5
Note: Where the source column is used to identify whether the data is from the original table or TOPN, all TOPN table data is queued after the original table data. 2. Porting SQL statements from Oracle to HiveSuccessful migration of Hive SQL:SELECT * from(select Rank () over (partition by c.current_item_category,c.current_item_id ORDER BY c.source,c.view_count Desc) No, c.current_item_id, C.current_item_category, c.andx_item_id, C.source, C.view_count from(select current_item_id,current_item_category,andx_item_id,1 source,view_countFrom test_x_and_xUnion AllSelect a.current_item_id current_item_id,a.current_item_category current_item_category,b.item_id andx_item_id, 2 Source,b.view_count View_count from(select Current_item_id,current_item_category from test_x_and_xGROUP by Current_item_id,current_item_category) A, test_topn bwhere a.current_item_category = b.item_category) C) d where d.no <= 5;
The execution results are exactly the same as in Oracle:

Some pits are encountered during the transplant and are hereby recorded:

Hive supports only union all and does not support union;
UNION ALL of the two tables, not only the corresponding field data type, field names (can use column aliases) must also be identical;
The result set for each nested subquery must use a table alias!

"Gandalf." Recommend system data completion using hive SQL implementation

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

Get Started for Free

Sales Support

1 on 1 presale consultation

Chat Contact Sales
After-Sales Support

24/7 Technical Support 6 Free Tickets per Quarter Faster Response

Open a Ticket
Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.

Learn More

"Gandalf." Recommend system data completion using hive SQL implementation

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support

"Gandalf." Recommend system data completion using hive SQL implementation

Contact Us

What's Trending

Top 10 Tags

Top 10 Keywords

Trending Topic

A Free Trial That Lets You Build Big!

Sales Support

After-Sales Support