Thinking in BigData (10): Big Data - Data Mining Technology (1)

Source: Internet
Author: User

When it comes to big data, there is a lot of talk, some of it noise and some of it useful, but most of it is still far from actual implementation. In earlier posts we discussed moving the processing done in traditional data mining onto a data platform to save time and resources. The question remains: where should we start, whether or not we have that much data? That is what the following posts discuss, and it is also the core of big data: data mining. From beginning to end, we cannot get away from data mining. In fact, we have been doing a form of data mining since university without thinking of it as data mining. What we care about is how to find what we need through data mining, what to watch out for in the process, and how to get started.

For beginners, it is necessary to organize the concepts first. If you are already an expert, skip these posts. Writing a summary is also a way of learning, and the material can be sorted out chapter by chapter. Along the way we will discuss data mining from the perspective of implementing concrete projects, which involves many concepts, algorithms, business transformations, processes, modeling techniques, and so on.

Let's list the topics to be discussed:

1. What is Data Mining and why is data mining required?

2. What are the applications of data mining in marketing and CRM?

3. Data Mining Process

4. Statistics you should understand

5. Data description and prediction: profiling and predictive modeling

6. Classic Data Mining Technology

7. Various Algorithms

8. Data Warehouse, OLAP, analysis sandbox and Data Mining

9. Case studies


What is data mining? Knowledge discovery, business intelligence, predictive analytics, predictive modeling: all of these can be grouped under one definition. Data mining is a business process that explores large amounts of data to discover meaningful patterns and rules.

Discovering patterns and rules is itself a business process, one that serves the business. Our job is to make the business run more easily, or to directly help the customer improve it, by finding meaningful patterns and rules in large amounts of data. With data this plentiful, acquiring data is no longer an obstacle but an advantage. Many techniques now perform better on large datasets than on small ones: you can use data to generate intelligence, or let computers do what they do best, which is to pose and solve problems. Patterns and rules here mean those that benefit the business. Discovering a pattern might mean, for example, targeting a retention campaign at the customers most likely to leave. It can also mean optimizing the resources spent on acquiring customers, weighing the short-term gain in customer numbers against the medium- and long-term gain in customer value.

In all of this, the most important question is how to use data mining techniques to maintain relationships with customers. This is customer relationship management (CRM).

The focus here is on data mining applications in marketing and customer relationship management. Examples include recommending products for cross-selling and up-selling, predicting a customer's future tier, modeling customer lifetime value, segmenting customers by behavior, choosing the best landing page for website visitors, identifying suitable candidates for marketing campaigns, and predicting which customers are at risk of dropping a software package, a service, or a medication.

Two key techniques are survival analysis and statistical algorithms, supplemented by text mining and principal component analysis.

A well-managed store naturally forms a learning relationship with its customers. Over time it learns more and more about them and uses that knowledge to serve them better. The result is loyal customers and a profitable store.

However, large companies with hundreds of thousands or millions of customers cannot expect a close personal relationship with each one. Facing this dilemma, they must learn to make full use of the large amount of information they hold, which includes data generated by almost every customer interaction. This is the job of analysis techniques that convert customer data into customer knowledge.

Data mining is a business process that interacts with other business processes. It starts with data and, through analysis, triggers or inspires actions. Those actions in turn create more data, which calls for more data mining.

Therefore, companies that want to make full use of data to improve their business should not treat data mining as a side detail. On the contrary, business strategy must include: 1. Collecting data. 2. Analyzing the data for long-term benefit. 3. Acting on the analysis results.

CRM (customer relationship management). In every industry, forward-looking companies aim to understand each customer and to use that understanding to make it easier for customers to do business with them. They also need to learn the value of each customer, so that they know which customers are worth investing in and working to keep, and which can be allowed to leave. Turning a product-centric company into a customer-centric one takes more than data mining. Suppose the data mining result recommends offering a customer a small piece of jewelry rather than a small gadget. If the manager's bonus depends on quarterly gadget sales rather than jewelry sales (even if the latter are more profitable or bring more long-term profit), the data mining result will be ignored and will never turn into a decision.

We must learn from what has been recorded.

Why do we need to learn:

· Data is being generated and updated continuously

· Data is being stored in data warehouses. A data warehouse gathers data from many different sources into a common format with consistent key and field definitions. Operational business systems are designed to deliver results quickly to end users and impose their own requirements on data formats and fields. A data warehouse, by contrast, is built to support decision-making, and it simplifies the work of data miners.

· Affordability

· Strong interest in Customer Relationship Management

· Commercial data mining software has been developed

Skills of data mining personnel:

· Skill with numbers is required

· Ability to use Excel spreadsheets. Excel's processing capability is now quite powerful, and since Office 365 came out this trend should not be underestimated.

· Attitude: do not be afraid to handle large volumes of data and complex processes in order to get results. Working with large datasets, data warehouses, and analysis sandboxes is critical to successful data mining. Data mining must not only produce technical results; those results must also be used to help people (or, increasingly, automated processes) make better-informed decisions. Producing technical results is only the first step. The real purpose is to understand the real need through the results, turn results into information, information into action, and action into value.

The virtuous cycle of data mining focuses on business results, not just on the use of advanced techniques:

· Identify business opportunities

· Use data mining to turn data into actionable information

· Act on the information

· Measure the results

The key to successful data mining is to integrate it into business processes and to foster communication between the data miners and the business users who consume the results. First of all, be aware that many people do not pay attention to the actual business need and end up solving a problem that does not help the business.

In an ever-changing world, progress does not consist in change itself but depends on retaining what has been learned. When change is absolute, nothing remains to be improved and no direction is set for improvement; when experience is not retained, infancy is perpetual. Those who do not learn from the past are doomed to repeat the same mistakes.

When discussing data mining opportunities with business people, keep the focus on the business rather than on technology and algorithms. Let the technical experts focus on technology and the business experts focus on the business.

Telecom customer churn:

A key factor is overage calls. New customers use more than their plan allows in the first month, but they only find out how their usage relates to the plan's cost when the first month's bill arrives. By then they have already run up a large bill for the second month as well, which makes them unhappy. Unfortunately, customer service must wait just as long, until the billing cycle ends, to detect the overuse, which leaves no time to respond proactively. The root cause is reporting delay. If the analysis could give clear predictions or suggestions before the end of the first month, the situation would improve greatly. Tactics between competing operators may also play a role, but they are not considered here.

Solution to the above problem: the newly formed data mining group had the resources and had identified and investigated the appropriate data sources. Using some fairly simple programs, the team was able to identify these customers as soon as they first exceeded their plan. With this information, the customer center could contact at-risk customers and move them to an appropriate billing plan before the first bill was issued.
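The "fairly simple programs" are not described in the original, so the following is only a minimal sketch of the idea: scan new customers part-way through their first billing cycle and flag those who have already exceeded their plan. The field names, the 30-day cycle, and the linear projection of usage to the end of the cycle are all assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class NewCustomer:
    # All fields are hypothetical; a real billing system will differ.
    customer_id: str
    plan_minutes: int    # minutes included in the billing plan
    minutes_used: int    # minutes used so far in the first cycle
    days_elapsed: int    # days since the first cycle started
    cycle_days: int = 30

def projected_usage(c: NewCustomer) -> float:
    """Linearly project usage to the end of the billing cycle (an assumption)."""
    if c.days_elapsed <= 0:
        return 0.0
    return c.minutes_used * c.cycle_days / c.days_elapsed

def at_risk(customers):
    """Return IDs of customers already over, or projected to go over, their plan."""
    flagged = []
    for c in customers:
        already_over = c.minutes_used > c.plan_minutes
        heading_over = projected_usage(c) > c.plan_minutes
        if already_over or heading_over:
            flagged.append(c.customer_id)
    return flagged

customers = [
    NewCustomer("A", plan_minutes=300, minutes_used=280, days_elapsed=10),
    NewCustomer("B", plan_minutes=300, minutes_used=90, days_elapsed=15),
    NewCustomer("C", plan_minutes=100, minutes_used=120, days_elapsed=20),
]
print(at_risk(customers))  # ['A', 'C']
```

Running such a check daily rather than at the end of the billing cycle is what removes the reporting delay: the customer center gets the list while there is still time to offer a better plan.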

The question is simple: why does a model that works well in the lab fail in practice? One problem is that the model overfits the model set by memorizing the data. The result is a model that is very successful in the lab and very disappointing in the real world. The goal of modeling is not to produce the best model for the model set. The goal of data mining is to deal with problems in the real world and thereby bring about change. The stability you need means that the model works well not only on the model set but also on unseen data.

There are four major causes of instability:

1. Getting things wrong: the specific requirements are not understood, so the result conflicts with the actual process.

2. Overfitting: the model memorizes the model set instead of recognizing more general patterns. People are very good at perceiving patterns, so a pattern may seem to stand out in the data (its surface appearance) without holding in reality (its actual meaning).

3. Sample bias: the data used to build the model does not accurately reflect the real world. This can happen when the model set is not drawn by random sampling from the original data. For example, data from one region differs from data from another, so a model built on one region's data cannot simply be applied to a different region.

4. The future may differ from the past: the model is built on historical data but is applied to a different period. The underlying assumption is that past events can guide future events, but that assumption does not always hold.
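Overfitting, the second cause above, can be illustrated with a small sketch. The data is synthetic and invented for the example (customers whose usage exceeds 50 tend to churn, with 20% of the labels flipped as noise); neither model comes from the original text. One model memorizes the model set row by row, the other learns a single threshold. Comparing both on the model set and on unseen data shows the gap between lab performance and real-world performance.

```python
import random

def make_rows(n, rng):
    """Synthetic (usage, churned) rows: usage above 50 means churn, with 20% label noise."""
    rows = []
    for _ in range(n):
        usage = rng.uniform(0, 100)
        churned = usage > 50
        if rng.random() < 0.2:
            churned = not churned
        rows.append((usage, churned))
    return rows

def accuracy(predict, rows):
    """Fraction of rows where the prediction matches the recorded outcome."""
    return sum(predict(x) == y for x, y in rows) / len(rows)

rng = random.Random(0)            # fixed seed so the run is repeatable
model_set = make_rows(500, rng)   # data the models are built on
unseen = make_rows(500, rng)      # data the models have never seen

# Model 1: memorizes every row of the model set; for a usage value it has
# not seen, it falls back to the majority outcome in the model set.
memory = dict(model_set)
majority = 2 * sum(y for _, y in model_set) >= len(model_set)

def memorizer(usage):
    return memory.get(usage, majority)

# Model 2: learns one general pattern, the usage threshold (in steps of 10)
# that best separates churners from non-churners in the model set.
best_t = max(range(0, 101, 10),
             key=lambda t: accuracy(lambda x: x > t, model_set))

def threshold_model(usage):
    return usage > best_t

for name, model in [("memorizer", memorizer), ("threshold", threshold_model)]:
    print(name, accuracy(model, model_set), accuracy(model, unseen))
```

The memorizer scores perfectly on the model set because it has stored the noise along with the signal, and that is exactly why it does poorly on unseen rows. Holding data back and measuring on it is the standard way to detect this before deployment.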

Time frame:

Each variable in the model set has an associated time frame, which describes the period the variable comes from. One consequence is that data goes stale: data from a period far enough in the past stops being usable.

Both the input variables and the target variable have time frames. When the time frame of the input variables is strictly earlier than that of the target variable, any model built on that model set is a prediction model. When the input variables and the target come from the same time frame, the result is a profiling model.
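The distinction can be written down as a small check. This is a sketch only: representing a time frame as a (start, end) pair of dates is an assumption, and treating partially overlapping frames as profiling is a conservative choice made for the example, since the text only covers the two clean cases.

```python
from datetime import date

def model_type(input_frame, target_frame):
    """Classify a model set by its time frames.

    Each frame is a (start, end) pair of dates, both inclusive.
    """
    input_start, input_end = input_frame
    target_start, target_end = target_frame
    if input_end < target_start:
        # Inputs end strictly before the target period begins.
        return "prediction"
    # Same or overlapping frames: inputs may already reflect the outcome.
    return "profiling"

inputs = (date(2023, 1, 1), date(2023, 6, 30))
print(model_type(inputs, (date(2023, 7, 1), date(2023, 9, 30))))  # prediction
print(model_type(inputs, inputs))                                 # profiling
```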

Prediction Model:

Many data mining problems can be framed as prediction problems: based on past responses, who will respond? Based on past cancellation records, who is at risk of leaving? The best way to approach such problems is to restrict the input variables to the period strictly before the target behavior occurs.

For example, a retailer with a website plans to run a campaign in March. The goal is to take the data collected up to a cutoff date (September 1 in this example) and build a model that determines which customers will join the campaign and what marketing measures should be adopted. What data should be used to create the model? The model should be built and scored on data from comparable time periods. Turn the calendar back one year, to September 1 of the previous year, and use the customer data as of that date as the inputs; then take the marketing data through the end of last year as the target. This ensures that no "future" information leaks into the inputs and distorts the model's ability to estimate the target.
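Turning back the calendar can be sketched as follows. The event layout, the event kinds ("purchase" and "response"), and the year 2023 are hypothetical; the point is only that inputs are taken strictly before the cutoff and the target is taken from the window after it.

```python
from datetime import date

def build_model_set(events, cutoff, target_end):
    """Build (customer_id, purchases_before_cutoff, responded) rows.

    events: iterable of (customer_id, event_date, kind) tuples, where kind
    is "purchase" or "response" (hypothetical event types).
    Inputs use only events strictly before `cutoff`; the target uses only
    responses from `cutoff` through `target_end`, inclusive.
    """
    purchases = {}
    responded = set()
    for customer_id, day, kind in events:
        if kind == "purchase" and day < cutoff:
            purchases[customer_id] = purchases.get(customer_id, 0) + 1
        elif kind == "response" and cutoff <= day <= target_end:
            responded.add(customer_id)
    # Only customers known before the cutoff get a row in the model set.
    return [(c, n, c in responded) for c, n in sorted(purchases.items())]

events = [
    ("c1", date(2023, 5, 10), "purchase"),
    ("c1", date(2023, 8, 2), "purchase"),
    ("c1", date(2023, 11, 20), "response"),
    ("c2", date(2023, 7, 4), "purchase"),
    ("c2", date(2023, 10, 1), "purchase"),   # after the cutoff: not an input
    ("c3", date(2023, 9, 15), "purchase"),   # unknown before the cutoff: no row
]
rows = build_model_set(events, cutoff=date(2023, 9, 1), target_end=date(2023, 12, 31))
print(rows)  # [('c1', 2, True), ('c2', 1, False)]
```

When the model is later scored for the real campaign, the same function can be reused with the current year's cutoff, so the inputs at scoring time cover the same kind of period as the inputs at training time.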

The challenge in prediction is the amount of work needed to create the model set. Turning back the calendar is easy to describe but hard to implement in a customer-centric, normalized data warehouse. The payoff is more stable results: such models can identify the causes of important customer behaviors.

Profiling model:

Profiling is usually associated with demographic variables, such as geographic location, gender, and age. Profiling models can discover relationships among things that hold at the same time, but they cannot establish cause and effect. For this reason, profiling models often use the customer's demographic information as the input and the customer's behavior as the target, because in that case cause and effect are more intuitive to determine.

The data mining methodology consists of the following steps:

· Translate the business problem into a data mining problem

· Select appropriate data

· Get to know the data

· Create a model set

· Fix problems with the data

· Transform the data to bring information to the surface

· Build the model

· Assess the model

· Deploy the model

· Assess the results

· Start again

(A guided overview of the data mining process)

Next, we will go through these steps one by one to complete a guided, end-to-end basic data mining process.

See Data Mining Technology

