How data mining solves problems
This section uses several real cases to show how data mining solves business problems. The "beer and diapers" story in Section 2.1.1 is the most classic case in data mining. Target's use of a pregnancy prediction index to predict whether a female customer is pregnant is another case that data mining scholars have cited frequently in recent years.
Many people ask what exactly data mining can do for an enterprise. We answer this question with the most classic case in data mining: the story of diapers and beer.
Diapers and beer
Walmart operates one of the world's largest data warehouse systems. To understand accurately what customers buy in its stores, Walmart ran market basket analysis on customer purchases, mining association rules that reveal which items customers often buy together. Walmart's huge data warehouse holds the detailed raw transaction data of all its stores, and Walmart applied data mining tools to this data. One surprising and unexpected result was: "The item most often purchased together with diapers is beer!" This result came from data mining technology's analysis of historical data and reflects regularities inherent in the data. But does it match reality? Is it useful knowledge?
To verify the result, Walmart sent market researchers and analysts to investigate. After extensive field research and analysis, they uncovered a pattern of American consumer behavior hidden behind "diapers and beer": in the United States, stopping at the supermarket for baby diapers is a routine errand for some young fathers after work, and 30% to 40% of them also buy beer for themselves. The reason is that American wives often remind their husbands to buy diapers for the children on the way home, and the husbands, having picked up the diapers, also bring back their favorite beer. In other cases, a husband buying beer suddenly remembers his errand and picks up diapers as well. Because diapers and beer were so often bought together, Walmart placed them side by side in all its stores, and sales of both diapers and beer increased.
By conventional thinking, diapers and beer have nothing to do with each other. Without data mining technology to mine and analyze a large volume of transaction data, Walmart could not have found this valuable rule in its data.
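A rule such as "diapers, therefore beer" is commonly measured with support, confidence, and lift. The sketch below computes these three measures for a single rule. The baskets are invented for illustration, and the function name rule_metrics is hypothetical; nothing here reflects Walmart's actual data or tools.

```python
from typing import Dict, List, Set


def rule_metrics(
    transactions: List[Set[str]],
    antecedent: Set[str],
    consequent: Set[str],
) -> Dict[str, float]:
    """Compute support, confidence, and lift for antecedent -> consequent."""
    n = len(transactions)
    # Count baskets containing the antecedent, the consequent, and both.
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_consequent = sum(1 for t in transactions if consequent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)

    support = n_both / n
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    # Lift above 1 means the items appear together more often than
    # they would if they were bought independently.
    lift = confidence / (n_consequent / n) if n_consequent else 0.0
    return {"support": support, "confidence": confidence, "lift": lift}


# Invented toy baskets, for illustration only.
baskets = [
    {"diapers", "beer", "milk"},
    {"diapers", "beer"},
    {"diapers", "bread"},
    {"beer", "chips"},
    {"milk", "bread"},
]

metrics = rule_metrics(baskets, {"diapers"}, {"beer"})
```

In a real analysis, a rule is usually kept only when its support and confidence exceed chosen thresholds, and the thresholds depend on the size and nature of the transaction data.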
Target and the pregnancy prediction index
Another real application of data mining has recently been widely cited in data mining and marketing circles.
A man in the United States stormed into a store near his home belonging to Target, an American retail chain, to protest: "You actually sent my 17-year-old daughter coupons for baby diapers and baby carriages." The store manager immediately apologized to him, but the manager did not know that the mailing was the result of data mining run by the company (see Figure 2-1). A month later, the father came back to apologize, because he had learned that his daughter was indeed pregnant. Target knew about the pregnancy a month before her father did.
Figure 2-1 Target's pregnancy prediction index
Target can "guess" which customers are pregnant by analyzing the purchase records of its female customers. Its analysts mined 25 items from Target's data warehouse that are highly correlated with pregnancy and used them to build the pregnancy prediction index. For example, they discovered that women about four months into pregnancy buy large quantities of unscented lotion. On this basis, after estimating the due date, Target sends discount coupons for items such as maternity clothes and cribs ahead of time to attract these customers.
Without data mining on massive volumes of customer transaction data, Target could not achieve such precise marketing. Target's precision marketing case is analyzed in Chapter 1.
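One simple way to picture an index of this kind is a weighted score over indicator products. The sketch below is an illustration only: the source does not describe how Target actually computes its index, and the item names, the weights, and the function name prediction_index are all invented assumptions.

```python
from typing import Dict, Set

# Hypothetical indicator items and weights, invented for illustration.
# The real index is said to use 25 items; none of their weights are public here.
ILLUSTRATIVE_WEIGHTS: Dict[str, float] = {
    "unscented lotion": 0.40,
    "indicator item B": 0.35,
    "indicator item C": 0.25,
}


def prediction_index(
    purchases: Set[str],
    weights: Dict[str, float] = ILLUSTRATIVE_WEIGHTS,
) -> float:
    """Return a score in [0, 1]: the weighted share of indicator items bought."""
    total = sum(weights.values())
    matched = sum(w for item, w in weights.items() if item in purchases)
    return matched / total


score = prediction_index({"unscented lotion", "indicator item C", "bread"})
```

A customer whose score passes a chosen threshold could then be selected for a mailing. A production model would more likely learn its weights from labeled purchase histories than use fixed hand-picked values.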
E-commerce website traffic analysis
Website traffic analysis means compiling statistics on, and analyzing, the basic traffic data a website collects. The usual technique is Web mining, which analyzes traffic to help us understand how users access the Web. So what are the benefits of knowing users' access patterns?
On the technical architecture side, it helps modify the website structure sensibly and allocate resources appropriately when building back-end server clusters, for example by improving network topology design and performance and by arranging fast, efficient access paths between highly correlated nodes.
It helps enterprises design better websites and arrange web page content.
It helps enterprises improve marketing decisions, such as placing advertisements on the appropriate web pages.
It helps enterprises arrange content according to customers' interests.
It helps enterprises segment their customers and tailor promotion strategies to different customer groups.
When people visit a website, they leave implicit feedback on its content: which links they click, which pages they stay on longest, which search terms they use, how long they browse overall, and so on. All of this information is stored in the website's logs. However, holding a large amount of information about visitors and what they accessed is not the same as making full use of it.
What if we load the data into a data warehouse? With the help of the data warehouse's reporting system (usually called an online analytical processing system, or OLAP), this large body of data can yield information that is directly observable and relatively simple. However, such a system cannot tell the website what patterns the information contains or how to act on them, and it generally cannot analyze complex information. For these more complex or less intuitive problems, we must turn to data mining technology, that is, using machine learning algorithms to find implicit patterns in the database and then reporting the results or acting on them.
To enable e-commerce websites to fully apply data mining technology, we need to collect more comprehensive data. The more comprehensive the collected data, the more accurate the analysis. In practice, the following data can be collected:
Visitor system attributes. For example, the operating system, browser, domain name, and access speed.
Access features. Including dwell time and clicked URLs.
Content features. Including the type of web content, the content category, and the visited URL.
Product features. Including the product number, product category, product color, product price, product profit, product quantity, and promotional price.
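Several of the attributes listed above, such as the browser and the clicked URL, can be read directly from web server log entries. The sketch below parses one log line, assuming the widely used Combined Log Format; the source does not name a log format, and the sample line and the function name parse_log_line are invented for illustration.

```python
import re
from typing import Dict, Optional

# Assumption: logs follow the Combined Log Format
# (host, identity, user, [time], "request", status, size, "referrer", "user agent").
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)


def parse_log_line(line: str) -> Optional[Dict[str, str]]:
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


# Invented sample entry, for illustration only.
sample_line = (
    '203.0.113.7 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /products/42 HTTP/1.1" 200 2326 '
    '"https://example.com/search?q=diapers" "Mozilla/5.0 (X11; Linux x86_64)"'
)

entry = parse_log_line(sample_line)
```

All fields are returned as strings. Product features such as price and profit are not in the server log and would have to be joined in from the site's product database.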
As a visitor uses the website, the data described above gradually accumulates, and the website can use it to compile information about that visitor. This information falls roughly into the following categories:
The visitor's purchase history and ad click history.
The history of hyperlinks the visitor has clicked.
The visitor's total link opportunities (the hyperlinks presented to the visitor).
The visitor's total access time.
All web pages the visitor has browsed.
The profit generated by each visitor session.
The number of visits each month and the time of the last visit.
The visitor's positive or negative opinion of the brand as a whole.
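Several of these per-visitor summaries can be built by a single pass over the accumulated page-view records. The sketch below is a minimal illustration; the PageView record, its field names, and the function build_profiles are hypothetical, and the sample events are invented.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Dict, Iterable


@dataclass
class PageView:
    """Hypothetical record of one page view; field names are assumptions."""
    visitor_id: str
    session_id: str
    url: str
    timestamp: datetime
    dwell_seconds: float
    profit: float = 0.0  # profit attributed to this page view, if any


def build_profiles(events: Iterable[PageView]) -> Dict[str, Dict[str, Any]]:
    """Aggregate page views into one summary dictionary per visitor."""
    profiles: Dict[str, Dict[str, Any]] = {}
    for e in events:
        if e.visitor_id not in profiles:
            profiles[e.visitor_id] = {
                "pages": set(),                          # all pages browsed
                "total_seconds": 0.0,                    # total access time
                "profit_by_session": defaultdict(float),
                "sessions_by_month": defaultdict(set),   # visits each month
                "last_visit": None,
            }
        p = profiles[e.visitor_id]
        p["pages"].add(e.url)
        p["total_seconds"] += e.dwell_seconds
        p["profit_by_session"][e.session_id] += e.profit
        p["sessions_by_month"][e.timestamp.strftime("%Y-%m")].add(e.session_id)
        if p["last_visit"] is None or e.timestamp > p["last_visit"]:
            p["last_visit"] = e.timestamp
    return profiles


# Invented sample events, for illustration only.
events = [
    PageView("v1", "s1", "/home", datetime(2024, 1, 5, 9, 0), 30.0),
    PageView("v1", "s1", "/products/42", datetime(2024, 1, 5, 9, 1), 90.0, profit=12.5),
    PageView("v1", "s2", "/home", datetime(2024, 2, 1, 20, 0), 15.0),
]

profiles = build_profiles(events)
```

The number of visits in a month is the size of the corresponding set in sessions_by_month. Purchase history, ad clicks, and brand opinions would come from other data sources and are left out of this sketch.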
This article is excerpted from New Internet: Big Data Mining
Tan Lei
Published by Electronic Industry Publishing House