Tianchi Competition: Precisely Locating the User's Shop Within a Shopping Mall, a Summary of Our Approach

Source: Internet
Author: User
Tags cos xgboost idf bssid

Contents

1 Problem Statement
2 Data and Evaluation Method
3 Solution
  3.1 Data Partitioning
  3.2 Preprocessing
  3.3 Constructing Candidates
  3.4 Binary Classification Prediction
    Features
      WiFi features
      Distance features
      User and store features
    Algorithm Model
    Model Fusion
Impressions

1 Problem Statement

The goal of this competition is to accurately locate the shop a user is currently in within a mall. The information given includes WiFi signal strength, GPS, base station location, and historical transactions, and it is used to determine the shop where each test-set transaction took place.

Our team was "I go, what" (go out to the right, Dongfeng westerly reading House, Wakup, Guanshan), and we finished 15th.

For details and data, see the official competition website.

2 Data and Evaluation Method

The organizers provide anonymized user transaction records from 2017-07-01 to 2017-08-31 (including WiFi signal strength, GPS, and base station location). The task is to predict the shop in which each transaction from 2017-09-01 to 2017-09-14 took place.

Evaluation method: accuracy = number of samples whose shop is predicted correctly / total number of samples.

3 Solution

The solution has two main parts: first construct a set of candidate shops for each record, then run binary classification over the candidates. If candidate construction is poor, the later prediction is meaningless, so we evaluated candidate construction by coverage, that is, how often the true shop appears in the candidate set. On top of that, accuracy served as the final metric. Separating the two metrics sped up optimization.

3.1 Data Partitioning

Set              Sample range                Feature interval
Training set     [2017-08-25, 2017-08-31]    [2017-07-01, 2017-08-25]
Prediction set   [2017-09-01, 2017-09-14]    [2017-07-01, 2017-08-31]
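
As a rough illustration of the split in the table above, the following Python sketch separates records into a sample range and a feature interval by date. The record layout (dicts with a `time` key) and the helper name `split_by_window` are assumptions made for this example; they are not taken from the competition code.

```python
from datetime import date

# Training windows copied from the table above (both ends inclusive).
TRAIN_SAMPLE = (date(2017, 8, 25), date(2017, 8, 31))
TRAIN_FEATURE = (date(2017, 7, 1), date(2017, 8, 25))

def split_by_window(records, sample_window, feature_window):
    """Return (sample_rows, feature_rows) for the given inclusive date windows."""
    s_lo, s_hi = sample_window
    f_lo, f_hi = feature_window
    samples = [r for r in records if s_lo <= r["time"] <= s_hi]
    features = [r for r in records if f_lo <= r["time"] <= f_hi]
    return samples, features

# Hypothetical toy records; real rows also carry WiFi, GPS, and so on.
records = [
    {"shop_id": "s1", "time": date(2017, 7, 3)},
    {"shop_id": "s2", "time": date(2017, 8, 20)},
    {"shop_id": "s1", "time": date(2017, 8, 28)},
]
train_samples, train_features = split_by_window(records, TRAIN_SAMPLE, TRAIN_FEATURE)
print(len(train_samples), len(train_features))
```

Labels come from the sample range, while all historical statistics (counts, TF-IDF values, average coordinates) are computed only from the feature interval.
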
3.2 Preprocessing

Records whose WiFi signal strength is NULL are deleted outright so that they do not cause interference. BSSIDs that appear fewer than 3 times in the training set are filtered out, which reduces the number of BSSIDs somewhat.

3.3 Constructing Candidates

We built candidate sets in several ways and assessed them by coverage, reaching 97% in the preliminary round and 95% in the second round. The main methods were:

Shops historically associated with the connected WiFi. For a test record that has a connected WiFi, take its BSSID, look up the feature-interval records connected to the same BSSID, and keep the top N shops by record count.
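
A minimal sketch of this lookup, assuming the feature-interval history has already been reduced to `(connected_bssid, shop_id)` pairs. The function names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

def build_bssid_shop_counts(history):
    """Count, per connected BSSID, how many feature-interval records each shop has."""
    counts = defaultdict(Counter)
    for bssid, shop_id in history:
        counts[bssid][shop_id] += 1
    return counts

def top_n_shops(counts, bssid, n=3):
    """Return up to n shops with the most records for this BSSID."""
    if bssid not in counts:
        return []
    return [shop for shop, _ in counts[bssid].most_common(n)]

# Hypothetical history: (connected BSSID, shop of the transaction).
history = [("b1", "s1"), ("b1", "s1"), ("b1", "s2"), ("b2", "s3")]
counts = build_bssid_shop_counts(history)
print(top_n_shops(counts, "b1", n=2))
```

The same counting pattern applies to the other count-based candidates, such as the shops a user has visited most often.
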

TF-IDF. Take the top 3 candidates.

TF-IDF = TF (term frequency) * IDF (inverse document frequency)

In this project, the WiFi entries within one record are ranked by signal strength, and the rank i is mapped to a weight: weight = f(i) = exp((0 - i) * 0.6). Over the feature interval we define shop_tfidf = (sum of weights grouped by shop and BSSID) / ((sum of weights grouped by shop) * (sum of weights grouped by BSSID)). For each sample in the sample range and each shop in its mall, we look up the TF-IDF value of every BSSID in the sample (by joining against the values from the first step), sum them, and use the total as that shop's TF-IDF score. The shops are then ranked by this score and the top N are kept.
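
The weighting and scoring described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the rank i is taken to start at 0 for the strongest signal, each BSSID appears once per record, and the function names and data layout are invented for the example.

```python
import math
from collections import defaultdict

def rank_weights(wifi):
    """Map each BSSID in one record to exp((0 - i) * 0.6), i = rank by strength."""
    ordered = sorted(wifi, key=lambda item: item[1], reverse=True)  # strongest first
    return {bssid: math.exp((0 - i) * 0.6) for i, (bssid, _) in enumerate(ordered)}

def build_shop_tfidf(history):
    """history: list of (shop_id, [(bssid, strength), ...]) from the feature interval."""
    pair_sum = defaultdict(float)   # weight sum per (shop, bssid)
    shop_sum = defaultdict(float)   # weight sum per shop
    bssid_sum = defaultdict(float)  # weight sum per bssid
    for shop_id, wifi in history:
        for bssid, w in rank_weights(wifi).items():
            pair_sum[(shop_id, bssid)] += w
            shop_sum[shop_id] += w
            bssid_sum[bssid] += w
    return {k: v / (shop_sum[k[0]] * bssid_sum[k[1]]) for k, v in pair_sum.items()}

def score_shops(tfidf, shops, wifi):
    """Sum the TF-IDF values of the record's BSSIDs per shop, highest score first."""
    scores = {s: sum(tfidf.get((s, b), 0.0) for b, _ in wifi) for s in shops}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical feature-interval history and one sample record.
history = [
    ("s1", [("b1", -40), ("b2", -70)]),
    ("s2", [("b2", -45), ("b3", -60)]),
]
tfidf = build_shop_tfidf(history)
ranked = score_shops(tfidf, ["s1", "s2"], [("b1", -50), ("b2", -80)])
print(ranked[0][0])
```

Taking `ranked[:n]` then gives the top N candidates for the record.
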

Strongest signal.

For each shop in the feature interval, count how often each BSSID was the strongest WiFi signal. Then, for each record in the sample range, look up the shops associated with its strongest BSSID and keep the top N by count.

The N shops in this mall that the user has visited most often.

The N shops nearest to the record's latitude and longitude, measured against the latitude and longitude of each shop's transactions (that is, using the shop's transaction coordinates in the feature interval instead of the shop's own coordinates).

The N shops nearest to the record's latitude and longitude, measured against the shop's own latitude and longitude.

The N shops nearest by cosine similarity of the WiFi signals. Treating a record's WiFi signal information as a high-dimensional vector, the similarity between WiFi signals can be computed as the cosine similarity of the vectors.
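
A small sketch of cosine similarity over sparse WiFi vectors, stored as `{bssid: value}` dicts so that BSSIDs missing from a vector count as 0. How signal strength is encoded into the values is not stated in this write-up; the numbers below are placeholders.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two sparse vectors given as {bssid: value} dicts."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)

record = {"b1": 3.0, "b2": 4.0}
shop_history = {"b1": 3.0, "b2": 4.0}
print(cosine_similarity(record, shop_history))
```
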

The distance is computed with the formula:

0.1 ** (((lon1 - lon2) ** 2 + (lat1 - lat2) ** 2) ** 0.5 * 100000)
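
Read as a Python expression, the formula above maps the straight-line distance between two coordinate pairs (in degrees, scaled by 100000) to a score in (0, 1] that shrinks as the points move apart. The sketch below is one reading of it; the function name is hypothetical.

```python
def distance_score(lon1, lat1, lon2, lat2):
    """0.1 ** (Euclidean distance in degrees * 100000), following the formula above."""
    dist = ((lon1 - lon2) ** 2 + (lat1 - lat2) ** 2) ** 0.5
    return 0.1 ** (dist * 100000)

same = distance_score(120.0, 30.0, 120.0, 30.0)      # identical coordinates
near = distance_score(120.0, 30.0, 120.00001, 30.0)  # about 1e-5 degrees apart
far = distance_score(120.0, 30.0, 120.0001, 30.0)    # about 1e-4 degrees apart
print(same, near > far)
```

Identical coordinates give a score of 1, and each additional 1e-5 degrees of separation divides the score by 10.
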

We take N to be around 3 or 4.

3.4 Binary Classification Prediction

With the candidate set built in the previous step, this step decides whether a given candidate is the correct shop, that is, binary classification.

Features

WiFi features
- Number of times the record's connected WiFi has been connected at the shop.
- TF-IDF value between the shop and this record (see Constructing Candidates).
- Number of times this record's strongest signal was also the strongest signal in the shop's historical transactions.
- Cosine similarity between this record's WiFi signal strengths and the shop's historical WiFi.
- Whether the record is connected to a WiFi.
- L1 and L2 distances between this record's WiFi signal strength ranking and the shop's WiFi signal strength ranking, treated as two vectors.
- Number of BSSIDs this record's WiFi shares with the shop's historical WiFi.
- wifi_count_sum, and wifi_count_sum / the shop's historical WiFi count.

Distance features
- Distance between this record's latitude and longitude and the average latitude and longitude of the shop's transactions, together with the function mapping of that distance.
- Distance from the shop's own latitude and longitude.

User and store features
- Number of transactions at the shop.
- Number of shop transactions / number of mall transactions = the shop's share within the mall.
- In minutes, the shop's transaction volume within a short time window (the minimum of the average), and its share.
- Weekend features: the shop's transaction volume on weekends and on non-weekends.
- Number of times the user has visited this shop.
- Number of times the user has visited this shop / the user's total count = share among the user's visits.
- Number of times the user has visited this shop / number of times the user has visited this mall = share among the user's visits to this mall.
- The user's consumption in this price range.
- The user's average price minus this record's price.
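
One of the WiFi features above, the L1 and L2 distance between signal strength rankings, could be computed along these lines. How the two rankings are aligned is not described here, so this sketch assumes they are compared only over the BSSIDs present in both; the function names are hypothetical.

```python
def rank_by_strength(wifi):
    """Return {bssid: rank}, with rank 0 for the strongest signal."""
    ordered = sorted(wifi.items(), key=lambda kv: kv[1], reverse=True)
    return {bssid: i for i, (bssid, _) in enumerate(ordered)}

def rank_distances(record_wifi, shop_wifi):
    """L1 and L2 distances between the rank vectors over the shared BSSIDs."""
    r1 = rank_by_strength(record_wifi)
    r2 = rank_by_strength(shop_wifi)
    shared = sorted(set(r1) & set(r2))  # assumption: compare shared BSSIDs only
    diffs = [r1[b] - r2[b] for b in shared]
    l1 = sum(abs(d) for d in diffs)
    l2 = sum(d * d for d in diffs) ** 0.5
    return l1, l2

# Hypothetical strengths: this record versus the shop's historical average.
record_wifi = {"b1": -40, "b2": -60, "b3": -80}
shop_wifi = {"b1": -70, "b2": -50, "b3": -90}
l1, l2 = rank_distances(record_wifi, shop_wifi)
print(l1, l2)
```
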

For the remaining features, refer to the code; they are not listed here.

Algorithm Model

In the preliminary round we used XGBoost and LightGBM, and LightGBM performed better than XGBoost. In the second round we used XGBoost and GBDT (XGBoost > GBDT). GBDT was very expensive to run, and because of the computational limits we later abandoned the blending fusion method.

Model Fusion

With the blending fusion method, the training set is split into two parts. The first part is used to train the base models. The base models' predicted probabilities on the second part are then used to train a second-level model, which predicts the test set. This gave a slight improvement, but it is particularly expensive computationally.
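
The blending procedure described above can be sketched in plain Python. The toy base models and the small logistic regression used as the second-level model are stand-ins chosen for this example only; the write-up names XGBoost, LightGBM, and GBDT as the actual models and does not say which second-level model was used.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_threshold_model(X, y, col):
    """Toy base model: sigmoid of (x[col] - midpoint of the two class means)."""
    pos = [x[col] for x, t in zip(X, y) if t == 1]
    neg = [x[col] for x, t in zip(X, y) if t == 0]
    mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: sigmoid(x[col] - mid)

def fit_logistic(X, y, lr=0.5, epochs=200):
    """Tiny logistic regression trained by batch gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * len(w), 0.0
        for x, t in zip(X, y):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - t
            gw = [g + err * xi for g, xi in zip(gw, x)]
            gb += err
        w = [wi - lr * g / len(X) for wi, g in zip(w, gw)]
        b -= lr * gb / len(X)
    return lambda x: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

def blend(X1, y1, X2, y2, X_test, base_fitters):
    base = [fit(X1, y1) for fit in base_fitters]   # base models trained on part 1
    meta_X2 = [[m(x) for m in base] for x in X2]   # their probabilities on part 2
    meta = fit_logistic(meta_X2, y2)               # second-level model on part 2
    return [meta([m(x) for m in base]) for x in X_test]

# Hypothetical two-feature data, split into the two training parts.
X1, y1 = [[0.0, 0.0], [1.0, 0.2], [4.0, 3.8], [5.0, 5.0]], [0, 0, 1, 1]
X2, y2 = [[0.5, 0.5], [4.5, 4.5], [1.0, 0.0], [5.0, 4.0]], [0, 1, 0, 1]
fitters = [lambda X, y: fit_threshold_model(X, y, 0),
           lambda X, y: fit_threshold_model(X, y, 1)]
probs = blend(X1, y1, X2, y2, [[0.0, 0.0], [5.0, 5.0]], fitters)
print(probs[1] > probs[0])
```

Because the second-level model only ever sees base-model outputs on data the base models were not trained on, every base model has to be trained and scored an extra time, which is where the additional computation comes from.
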

Later we used a weighted fusion of the probability values from multiple models, which gave a small improvement.

Impressions

My teammates (go out to the right, Dongfeng westerly reading House, Wakup) were a great help. They were mainly responsible for the offline part, and I was responsible for the online part. Not using multi-class classification to build features (mainly because of limited computing resources) was a big mistake; it is said to give an improvement of around 2 points. That is a small regret, and we finished 15th. Computing resources were also tight, so many ideas were never carried out.

CSDN Original: http://blog.csdn.net/shine19930820/article/details/79130486

GitHub Code: https://github.com/InsaneLife/Positioning-shops
