A friend of mine wants to crawl and mine Sina Weibo posts, in particular for sentiment analysis, which will support his later experiments. I believe text mining and analysis will become more valuable in the future. To give a simple example, everyone on the subway refreshes their circle of friends and their feeds every day, and most of those messages are text. How do we mine these raw messages and then prepare for precision marketing based on them? Done well, this should produce significant results in future marketing.
Raw data
This content can be captured with crawler technology. Use a clustering algorithm to find all Weibo posts on the same topic, and use them as the raw data. There are also comments in the user's circle of friends, link messages generated by the user, and so on. All of these can be stored in our database as raw data.
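To make the "cluster posts by topic" idea concrete, here is a minimal sketch. It is only an illustration under my own assumptions: the posts are already fetched, tokens are split on whitespace (real Chinese Weibo text would first need a word segmenter), and the clustering is a simple greedy single-pass method with a made-up similarity threshold of 0.3, not any particular production algorithm.

```python
import math
from collections import Counter

def vectorize(text):
    """Turn a post into a bag-of-words count vector (whitespace tokens)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def cluster_posts(posts, threshold=0.3):
    """Greedy single-pass clustering.

    Each post joins the most similar existing cluster if the similarity
    reaches the threshold; otherwise it starts a new cluster.
    """
    clusters = []  # each cluster: {"centroid": Counter, "members": [post, ...]}
    for post in posts:
        vec = vectorize(post)
        best, best_sim = None, threshold
        for cluster in clusters:
            sim = cosine(vec, cluster["centroid"])
            if sim >= best_sim:
                best, best_sim = cluster, sim
        if best is None:
            clusters.append({"centroid": Counter(vec), "members": [post]})
        else:
            best["centroid"].update(vec)   # accumulate word counts
            best["members"].append(post)
    return [c["members"] for c in clusters]

# Made-up sample posts, for illustration only.
posts = [
    "new phone launch today",
    "the phone launch was great",
    "subway delayed again this morning",
    "morning subway delayed by an hour",
]
groups = cluster_posts(posts)
for group in groups:
    print(group)
```

The threshold controls how coarse the topics are: a lower value merges more posts into one topic, a higher value splits them apart.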
Set goals (business understanding)
This step must be tied closely to business understanding. First, what do we want to use the raw data for? For example, we might use sentiment analysis to understand how different users view the same event. To do that, we need to find the keywords in their Weibo posts, and then use a suitable search algorithm to determine how all users evaluate the event. From those evaluations, we identify the customers with secondary commercial value and set them as the target customers.
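The goal described above (keywords, then sentiment, then target customers) can be sketched roughly as follows. Everything here is an assumption for illustration: the tiny word lists are hypothetical stand-ins for a real sentiment dictionary, the posts are invented, and "target customer" is simply defined as a user whose total score reaches `min_score`.

```python
from collections import Counter

# Hypothetical toy lexicons; a real project would use a curated dictionary.
POSITIVE = {"great", "love", "good", "amazing"}
NEGATIVE = {"bad", "terrible", "hate", "slow"}
STOPWORDS = {"the", "a", "is", "it", "this", "was", "i", "and", "so"}

def tokenize(text):
    """Lowercase whitespace tokens with surrounding punctuation stripped."""
    return [w.strip(".,!?").lower() for w in text.split()]

def top_keywords(posts, n=3):
    """Most frequent non-stopword tokens across all posts."""
    counts = Counter(
        w for post in posts for w in tokenize(post) if w and w not in STOPWORDS
    )
    return [word for word, _ in counts.most_common(n)]

def sentiment_score(text):
    """Positive-word count minus negative-word count."""
    words = tokenize(text)
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)

def target_customers(posts_by_user, min_score=1):
    """Users whose total sentiment about the event is at least min_score."""
    return sorted(
        user
        for user, posts in posts_by_user.items()
        if sum(sentiment_score(p) for p in posts) >= min_score
    )

# Invented sample data about one event (a phone launch).
posts_by_user = {
    "user_a": ["I love this phone, it is amazing!"],
    "user_b": ["The phone is slow and the battery is bad."],
    "user_c": ["Good phone.", "Great camera!"],
}
all_posts = [p for posts in posts_by_user.values() for p in posts]
print(top_keywords(all_posts, 1))
print(target_customers(posts_by_user))
```

A word-counting score like this ignores negation and sarcasm, so it is only a starting point for deciding what the business goal should measure.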
Data Understanding
What does the captured Weibo content contain, and how many links does it include? What symbols are used to mark those links? Text, image categories, and comments are also useful for reference. What special symbols appear in it? All of this information is useful to us, so how can we use it? For example, we may need to mine keywords from the text to drive related marketing activities, or to determine the emotional tendency of the bloggers. What can we do with each kind of data? The better we understand our data, the better we can capture the data we want.
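As a small illustration of "understanding the data", the sketch below pulls links, @mentions, and #topic# tags out of a post and leaves the plain text behind. The regular expressions are my own rough approximations (Weibo wraps topics in a pair of # signs), and the sample post and URL are made up.

```python
import re

URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@([\w\-]+)")
TOPIC_RE = re.compile(r"#([^#\s][^#]*)#")  # Weibo-style #topic# with two # signs

def describe_post(text):
    """Split a post into links, mentions, topics, and the remaining plain text."""
    urls = URL_RE.findall(text)
    body = URL_RE.sub(" ", text)  # drop URLs first so "#" in a URL is not a topic
    mentions = MENTION_RE.findall(body)
    topics = TOPIC_RE.findall(body)
    plain = TOPIC_RE.sub(" ", MENTION_RE.sub(" ", body))
    return {
        "urls": urls,
        "mentions": mentions,
        "topics": topics,
        "text": " ".join(plain.split()),  # collapse leftover whitespace
    }

# Made-up sample post.
sample = "#New phone# launch was great @alice see http://example.com/abc123"
info = describe_post(sample)
print(info)
```

Counting how often each field is non-empty across the whole crawl gives a quick picture of what the data actually contains before any modelling starts.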
Model creation
Some people say that this part is what data mining is really about. If you build a good model, the data passes through it and the content you need comes out automatically. This is also the most difficult part of data mining.
For example, we can use a decision tree algorithm to create a model for our Weibo data; the customers who post the words the tree selects are then our target customers. Alternatively, we can build a model with a neural network algorithm to find the relevant decision items. Data mining uses many complex methods, and I still haven't figured out the core idea of some of the algorithms, but that does not stop me from using them for mining. In addition, few of today's mining targets actually involve petabytes of data, and many enterprises are still at the minicomputer stage. So I sometimes joke that if the data volume is small, Excel is the better tool, then an Access database, then an Oracle database, and so on.
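Coming back to the decision tree idea, here is a minimal sketch of its core step: choosing the keyword that best separates target customers from everyone else by information gain. To keep it short, it builds only a one-level tree (a decision stump) rather than a full tree, and the keyword sets and labels are invented for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def split_labels(rows, labels, word):
    """Labels of rows that contain the word, and labels of rows that do not."""
    present = [y for x, y in zip(rows, labels) if word in x]
    absent = [y for x, y in zip(rows, labels) if word not in x]
    return present, absent

def information_gain(rows, labels, word):
    """Entropy reduction from splitting on presence of the word."""
    present, absent = split_labels(rows, labels, word)
    n = len(labels)
    remainder = (len(present) / n) * entropy(present) + (len(absent) / n) * entropy(absent)
    return entropy(labels) - remainder

def majority(labels, default):
    """Most common label, or the default when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def fit_stump(rows, labels):
    """One-level decision tree: split on the keyword with the highest gain."""
    vocabulary = sorted(set().union(*rows))
    best = max(vocabulary, key=lambda w: information_gain(rows, labels, w))
    present, absent = split_labels(rows, labels, best)
    overall = majority(labels, None)
    return {
        "word": best,
        "if_present": majority(present, overall),
        "if_absent": majority(absent, overall),
    }

def predict(stump, words):
    return stump["if_present"] if stump["word"] in words else stump["if_absent"]

# Invented training data: keywords per user, and whether the user was a target.
rows = [
    {"phone", "buy"},
    {"phone", "price", "buy"},
    {"subway", "delay"},
    {"phone", "broken"},
    {"weather"},
]
labels = [True, True, False, False, False]
stump = fit_stump(rows, labels)
print(stump)
```

A full decision tree repeats the same gain calculation on each branch until the labels are pure or a depth limit is reached.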
Creating a model is a big job. However, once the model is created, it often stays unchanged for 3 to 5 years. The credit scoring system behind today's credit cards is one example.
Model Evaluation
This part is about optimization. After the model is set up, you have to run it against the business and test how well it performs. Sometimes, after half a year of mining, you finally find the target customers, only to discover that someone else has already won them over by other means. What should you do then? This is why we need to evaluate the model.
First, take part of the full data set, usually 40%, for initial training. You can also start with a smaller amount. Then measure how long it takes to process the data, and check whether switching to another algorithm shortens the mining time. Generally, the data in this step is split 4:3:3, that is, 40% for training, 30% for testing, and the remaining 30% for validation. With this, we can evaluate the quality of the model and whether it can deliver the value expected of it.
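The 40/30/30 split described above can be sketched like this. The function name and the fixed random seed are my own choices for illustration. The records are shuffled first so that each part is a random sample.

```python
import random

def split_40_30_30(records, seed=0):
    """Shuffle the records and split them 40% / 30% / 30%.

    Returns (training, testing, validation). Integer arithmetic is used for
    the cut points; any remainder from rounding goes to the validation part.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n = len(shuffled)
    train_end = n * 40 // 100
    test_end = train_end + n * 30 // 100
    return shuffled[:train_end], shuffled[train_end:test_end], shuffled[test_end:]

# Stand-in for 100 crawled posts.
train, test, validation = split_40_30_30(range(100))
print(len(train), len(test), len(validation))
```

Keeping the seed fixed means the same posts land in the same part on every run, which makes it easier to compare one algorithm against another.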
Release model
This is the last step: tune the algorithms against all of the Weibo data to achieve the best mining results.
In the steps above, model evaluation and business understanding complement each other, because these two are the most closely tied to the data. Data understanding and business understanding constrain each other. In many cases we have big data but cannot find the target to mine, and then we need to keep revising both our business understanding and our data understanding. The model itself, however, is not the most important part of the whole closed loop. Sometimes a customer's simple requirement does not need to be implemented with complicated technology. Simplicity may mean victory.
Finally, a word on tools for text mining. For a small amount of text, we can copy and paste the data into Word. If there is more text, we can use Excel. If there is more still, we can use U1. Beyond that, we can use SAS and R. I have never used any other software.
Well, that is the summary for now. See you next week!