2015 Ali Tianchi Big Data Competition: Algorithm Design

Project address: https://github.com/Huangtuzhi/AlibabaRecommand

AlibabaRecommand

Alibaba Mobile Recommendation Algorithm Competition.

Competition Introduction

The contest provides one month of user behavior data from the mobile terminal and asks for a prediction of which items each user will purchase on the following day.

Directory structure
├── license                  # license
└── readme.md                # usage instructions

#
├── create_table.sql         # create the base tables
├── add_table.sql            # tables added later
├── add_index.sql            # build indexes on the tables
├── add_table_31day.sql      # create the tables that store 31 days of data; same structure as above
└── add_index_31day.sql      # build indexes on the tables

# data import
├── datatodb.sql             # import the competition's raw CSV data into the base tables
└── featuretodb.sql          # import feature.txt into the corresponding table

# main
├── __init__.py
├── trainmodel.py
├── obtainpredict.py
└── getfeature31day.py

# data
├── feature.txt              # records that meet a given criterion (user_id,item_id,look,store,cart,buy)
├── data_features.txt        # n-dimensional features of the records in feature.txt
├── data_features.npy        # the same features converted to matrix format (NumPy library)
├── data_labels.txt          # labels of the records in feature.txt (1/0 = purchased/not purchased)
├── data_labels.npy
├── feature_pos.txt          # all positive examples in feature.txt
├── feature_p.npy
├── feature_neg.txt          # all negative examples in feature.txt
├── feature_p.npy
├── trainset.npy             # training set
├── testset.npy              # test set
└── 31day_data_features.txt  # n-dimensional features of all 31 days of data

# results
├── predict_all_pairs.txt    # all predicted (user_id, item_id) pairs
└── filter_pairs.txt         # (user_id, item_id) pairs after filtering with train_item

Principle

The competition provides 31 days of data, and we chose day 30 as the split point. We extract n-dimensional features from the first 30 days of data (each [user_id,item_id] pair yields one feature row) and label each row using the actual data from day 31.
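
The feature extraction step can be sketched as follows. This is a minimal illustration, not the repository's code: it assumes the four features are the per-pair counts listed for feature.txt (look, store, cart, buy), that behavior_type is encoded as 1 to 4 in that order, and that each record carries an integer day index. The function name `extract_features` and the record layout are hypothetical.

```python
from collections import defaultdict

# Assumed encoding of behavior_type (not stated in this document):
# 1 = look, 2 = store, 3 = cart, 4 = buy.
BEHAVIOR_COLUMNS = {1: 0, 2: 1, 3: 2, 4: 3}


def extract_features(records, split_day=30):
    """Count each behavior per (user_id, item_id) pair over days 1..split_day.

    records: iterable of (user_id, item_id, behavior_type, day) tuples,
    where day is an integer in 1..31 (a hypothetical, simplified schema).
    Returns {(user_id, item_id): [look, store, cart, buy]}.
    """
    features = defaultdict(lambda: [0, 0, 0, 0])
    for user_id, item_id, behavior_type, day in records:
        if day <= split_day:  # records after the split day are not used as features
            features[(user_id, item_id)][BEHAVIOR_COLUMNS[behavior_type]] += 1
    return dict(features)


sample = [
    (9909811, 266982489, 1, 3),
    (9909811, 266982489, 1, 12),
    (9909811, 266982489, 3, 29),
    (9909811, 266982489, 4, 31),  # day 31: excluded from the features
    (1000001, 555, 1, 30),
]
features = extract_features(sample)
```

Calling the same function with `split_day=31` would produce the 31-day feature rows used later as the model's prediction input.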

For example: if the [user_id,item_id] pair [9909811,266982489] appears in the first 30 days, and it also appears on day 31 with a behavior_type of purchase, the label for this row is 1; otherwise it is 0.
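
A small sketch of this labeling rule, under the assumption that purchase is encoded as behavior_type 4 (the encoding is not given here). `label_pairs` and the tuple layout are illustrative names only.

```python
def label_pairs(pairs, day31_records, buy_type=4):
    """Label each pair 1 if it was purchased on day 31, otherwise 0.

    pairs: (user_id, item_id) pairs seen in the first 30 days.
    day31_records: (user_id, item_id, behavior_type) tuples from day 31.
    buy_type: assumed code for a purchase in behavior_type.
    """
    purchased = {(u, i) for u, i, b in day31_records if b == buy_type}
    return {pair: int(pair in purchased) for pair in pairs}


pairs = [(9909811, 266982489), (1000001, 555)]
day31 = [
    (9909811, 266982489, 4),  # purchased on day 31 -> label 1
    (1000001, 555, 1),        # only looked at on day 31 -> label 0
]
labels = label_pairs(pairs, day31)
```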

This produces a large set of labeled feature rows. We feed them into logistic regression and obtain a binary classification model; at that point the model is trained.
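
To make the training step concrete, here is a dependency-free sketch of logistic regression fitted by batch gradient descent. It is not the repository's trainmodel.py, which may well rely on a library; the learning rate, epoch count, and toy data are arbitrary assumptions chosen only for illustration.

```python
import math


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit weights w and bias b by batch gradient descent on the log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: p(buy) - label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Predicted probability that the pair with feature row x is purchased."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Toy rows in the assumed [look, store, cart, buy] layout; labels are made up.
X = [[1, 0, 0, 0], [2, 0, 0, 0], [1, 1, 0, 0],
     [3, 0, 1, 0], [2, 1, 1, 0], [4, 0, 2, 0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic_regression(X, y)
```

On real data the positive class is usually far rarer than the negative class, which is presumably why the data directory keeps positive and negative examples in separate files.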

What remains to be predicted is the label described above, which is the model's output. A label of 1 means we predict that the user will buy the item. So what is the model's input? At prediction time, the input is the feature set extracted from all 31 days of data.
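
The prediction step, together with the train_item filtering mentioned for filter_pairs.txt, might look like the sketch below. The weights, the 0.5 threshold, and the interpretation of train_item as the set of item ids eligible for recommendation are all assumptions made for illustration; none of them is taken from the repository.

```python
import math


def predict_pairs(features, w, b, threshold=0.5):
    """Return the (user_id, item_id) pairs predicted as label 1."""
    chosen = []
    for pair, x in features.items():
        z = sum(wj * xj for wj, xj in zip(w, x)) + b
        if 1.0 / (1.0 + math.exp(-z)) >= threshold:
            chosen.append(pair)
    return chosen


def filter_pairs(pairs, train_items):
    """Keep only pairs whose item_id is in the allowed item set."""
    return [(u, i) for u, i in pairs if i in train_items]


# Hypothetical model parameters and 31-day feature rows [look, store, cart, buy].
w, b = [0.2, 0.3, 1.5, -0.5], -2.0
features_31day = {
    (1, 10): [5, 0, 2, 0],
    (2, 20): [1, 0, 0, 0],
    (3, 30): [3, 1, 1, 0],
}
predicted = predict_pairs(features_31day, w, b)
filtered = filter_pairs(predicted, train_items={10, 20})
```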

Days 1~30 -> label for day 31
Days 1~31 -> label for day 32

Since the day-31 labels are known, they can be used to evaluate the trained model. The day-32 labels are the model's output, that is, the final prediction.
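
One way to carry out this evaluation is to compare the set of predicted pairs with the set of pairs actually purchased on day 31. The sketch below uses precision, recall, and F1 as the metrics; that choice is an assumption, since the scoring rule is not described here, and `evaluate_pairs` is a hypothetical helper.

```python
def evaluate_pairs(predicted, actual):
    """Compare predicted (user_id, item_id) pairs with actual purchases.

    Returns (precision, recall, f1); each is 0.0 when it is undefined.
    """
    predicted, actual = set(predicted), set(actual)
    hits = len(predicted & actual)
    precision = hits / len(predicted) if predicted else 0.0
    recall = hits / len(actual) if actual else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


predicted = {(1, 10), (1, 11), (2, 20)}
actual = {(1, 10), (2, 20), (3, 30), (4, 40)}
precision, recall, f1 = evaluate_pairs(predicted, actual)
```

Here two of the three predicted pairs are correct and two of the four actual purchases are found, so precision is 2/3 and recall is 1/2.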

Description

This is only a prediction framework; the feature engineering still needs further improvement.
