News Personalization recommendation System (Python)-(with source data set)

Last Update:2014-09-25 Source: Internet

Author: User


1. Background
I recently took part in an evaluation task on personalized news recommendation. The task: given a user's browsing records, predict what that user will browse next. I spent a week writing an integrated system that produces recommendations with a single command, but its accuracy is not ideal, so I am posting it here in the hope of getting some advice. The word-segmentation part of the code borrows from the Jieba segmenter. The data set and code are given below.
2. Data set

Each record has five tab-separated fields: the user number, the news number, the browsing time, the headline, and the day of the current month (for example, 3 means the 3rd).
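As a minimal sketch of reading this format, the snippet below splits one tab-separated line into the five fields. The field names, the `parse_record` helper, and the field boundaries of the sample line are assumptions made for illustration; the article only gives the column order.

```python
from collections import namedtuple

# Field names are illustrative; the article only describes the column order.
BrowseRecord = namedtuple(
    "BrowseRecord", ["user_id", "news_id", "timestamp", "title", "day"]
)


def parse_record(line):
    """Split one tab-separated line into its five fields."""
    parts = line.rstrip("\n").split("\t")
    if len(parts) != 5:
        raise ValueError("expected 5 tab-separated fields, got %d" % len(parts))
    user_id, news_id, timestamp, title, day = parts
    # IDs stay as strings; the time and the day of the month become integers.
    return BrowseRecord(user_id, news_id, int(timestamp), title, int(day))


# Sample line; where the ID and time fields begin and end is assumed.
sample = (
    "5738936\t100649879\t1394550848\t"
    "MH370 flight fake passport passenger identification (update)\t11"
)
record = parse_record(sample)
print(record.user_id, record.day)
```

Keeping the IDs as strings avoids accidental arithmetic on them; only the time and the day are converted to integers.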
3. Code section
First, take a look at the demo diagram.

(1) Algorithm description
The algorithm is fairly simple, and an example is the easiest way to explain it; please point out anything that looks wrong. Suppose we have the following record:

5738936    100649879    1394550848    MH370 flight fake passport passenger identification (update)    11

On the 11th, user 5738936 read the news item "MH370 flight fake passport passenger ...". Using Jieba, we extracted the hot words for the 11th, as follows.

The 3,113 anniversary of the loss of the United passports of the passport holders of the invisible passport of Kuala Lumpur

Two of these hot words, "flight" and "passport", appear in that headline. So we recommend to user 5738936 other news from the 11th whose headlines contain "flight" or "passport". We also post-process the recommendation set: for example, news that user 5738936 has already read is excluded, and so is news with very low popularity.
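The matching step described above can be sketched roughly as follows. This is not the author's code: plain whitespace tokenization with word counts stands in for Jieba's keyword extraction, and the names `hot_keywords`, `recommend`, `day_news`, and `views` are hypothetical.

```python
from collections import Counter


def hot_keywords(titles, top_n):
    """Return the top_n most frequent tokens across one day's headlines.

    Whitespace tokenization is a stand-in for Jieba keyword extraction.
    """
    counts = Counter()
    for title in titles:
        counts.update(title.lower().split())
    return {word for word, _ in counts.most_common(top_n)}


def recommend(user_id, last_title, day_news, views, keywords):
    """Recommend same-day news sharing a hot keyword with last_title.

    day_news: dict mapping news_id -> headline for that day
    views:    dict mapping news_id -> set of user_ids who read it
    """
    matched = {w for w in last_title.lower().split() if w in keywords}
    results = []
    for news_id, title in day_news.items():
        if user_id in views.get(news_id, set()):
            continue  # the user has already read this item
        if matched & set(title.lower().split()):
            results.append(news_id)
    return results


# Toy data for one day; only the first headline echoes the article's example.
day_news = {
    "n1": "mh370 flight fake passport passenger",
    "n2": "flight search area expanded",
    "n3": "passport check tightened at airport",
    "n4": "local football results",
}
views = {"n1": {"5738936"}, "n2": {"u2"}, "n3": {"u3"}, "n4": {"u2"}}

keywords = hot_keywords(day_news.values(), top_n=2)
recs = recommend("5738936", day_news["n1"], day_news, views, keywords)
print(sorted(keywords), recs)
```

Filtering out low-popularity news is left out of this sketch; it only covers keyword matching and the exclusion of already-read items.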

(2) How to use
The whole system starts with a single command, which makes it convenient to use. First create a test folder, then create three new folders inside it; note that their names must match those in the diagram. Because news is time-sensitive, each day is computed separately and each day's content is stored in its own file. The files inside these folders are generated automatically. (The GitHub link below provides the complete test folder structure.)
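A small sketch of creating that layout is shown below. The three subfolder names are taken from the paths mentioned in the code flow section (train_lastday_set, train_date_set1, key_words); that these are exactly the three folders in the diagram is an assumption, and `create_test_layout` is a hypothetical helper, not part of the project.

```python
import os
import tempfile

# Subfolder names come from the paths the article mentions; assumed complete.
SUBFOLDERS = ["train_lastday_set", "train_date_set1", "key_words"]


def create_test_layout(root):
    """Create the test folder and its three subfolders if they are missing."""
    created = []
    for name in SUBFOLDERS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        created.append(path)
    return created


# A temporary directory is used here so the example has no side effects.
root = os.path.join(tempfile.mkdtemp(), "test")
paths = create_test_layout(root)
print(sorted(os.listdir(root)))
```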

To run it, first set the path parameter of the test folder in global_param.py. Once everything is set, find the main() function in the Wordsplite_test package and run the program.
Parameters in global_param:
number_jieba: controls the number of keywords extracted
number_day: the number of days to predict, counting from the first day
hot_rate: sets the popularity of the recommended news; the larger the value, the more popular the news
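For orientation, a global_param.py along these lines might look like the sketch below. All values are made up, and `test_path` is a hypothetical name for the path parameter; the article states neither the defaults nor that name.

```python
# global_param.py (illustrative values only; the article gives no defaults)
test_path = "/path/to/test"  # hypothetical name for the test folder path
number_jieba = 20            # how many keywords to extract per day
number_day = 31              # upper bound used by the main() loop
hot_rate = 5                 # popularity setting; larger keeps hotter news

# main() iterates with range(1, number_day), so the days processed are
# 1 .. number_day - 1; the upper bound itself is not included.
days = list(range(1, number_day))
print(days[0], days[-1])
```

If day number_day itself should be processed, the value would need to be set one higher than the last day wanted.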
(3) Code flow
First, let's look at main().

    import get_day_data
    import get_keywords
    import get_keynews
    import delete_repeat
    import get_hot_result
    import global_param

    def main():
        # Process each day separately, from day 1 up to number_day - 1.
        for i in range(1, global_param.number_day):
            get_day_data.Transfordata(i)
            get_day_data.Transfordataset(i)
            get_keywords.get_keywords(i)
            get_keynews.get_keynews(i)
        # Post-process the accumulated results once all days are done.
        delete_repeat.delete_repeat()
        get_hot_result.get_hot_result(global_param.hot_rate)

    main()

1. get_day_data.Transfordata(i) finds each user's last news-browsing record for day i and stores it in the test/train_lastday_set directory.
2. get_day_data.Transfordataset(i) separates the news by day and stores it in the test/train_date_set1 directory.
3. get_keywords.get_keywords(i) calls the Jieba library to pick out the hottest keywords of each day and stores them under test/key_words.
4. get_keynews.get_keynews(i) checks whether the news each user browsed last contains any of that day's hot keywords. If it does, other news from the same day containing those keywords is recommended. The loop runs over global_param.number_day days and generates the test/result.txt file.
5. delete_repeat.delete_repeat() removes duplicates from the result and generates test/result_no_repeat.txt.
6. get_hot_result.get_hot_result(global_param.hot_rate): the result_no_repeat file generated above may recommend too many items to each user, which hurts accuracy. This function limits the quantity, so that each user is recommended only candidates with relatively high popularity. The final result set is test/result_no_repeat_hot.txt.
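Steps 5 and 6 can be sketched in memory as below. This is only an illustration under assumptions: results are modeled as (user, news) pairs rather than files, popularity is taken to be a view count, and hot_rate is treated as a minimum view count. The article does not say how hot_rate is actually applied, and `drop_duplicates` and `keep_hot` are hypothetical names.

```python
from collections import Counter


def drop_duplicates(pairs):
    """Drop repeated (user_id, news_id) pairs, keeping first-seen order."""
    seen = set()
    unique = []
    for pair in pairs:
        if pair not in seen:
            seen.add(pair)
            unique.append(pair)
    return unique


def keep_hot(pairs, view_counts, hot_rate):
    """Keep recommendations whose news item has at least hot_rate views.

    Treating hot_rate as a view-count threshold is an assumption.
    """
    return [(u, n) for u, n in pairs if view_counts.get(n, 0) >= hot_rate]


# Toy recommendation list with one duplicate and one unpopular item.
raw = [("u1", "n2"), ("u1", "n3"), ("u1", "n2"), ("u2", "n4")]
view_counts = Counter({"n2": 12, "n3": 2, "n4": 7})

unique = drop_duplicates(raw)
hot = keep_hot(unique, view_counts, hot_rate=5)
print(unique)
print(hot)
```

Raising hot_rate shrinks each user's candidate list, which matches the stated goal of recommending fewer but more popular items.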

Note: The result.txt file under test must be emptied manually before each run; the other files are generated automatically and need no handling.
Project address: https://github.com/X-Brain/News-Recommend-System (the src folder holds the code; test holds the data and the folder structure).
If you have any suggestions, please leave a comment on the blog or open an issue on GitHub. I hope more people will take part and contribute.

/********************************

* This article from the blog "Bo Li Garvin"

* When reprinting, please cite the source: http://blog.csdn.net/buptgshengod

******************************************/


