Three Kinds of Understanding in Big Data

Source: Internet
Author: User

The concept of big data has been talked about for many years. When big data comes up, most of us think only of personalization and precision. What is less well known is that IBM and many major Internet companies raised this topic long ago. But it only began to show real commercial value in 2011 and after.

There are two reasons for this:

The first is that in the past two years, with the spread of social media, the amount of data has exploded. Every day we share text, pictures and videos on the Internet, and the volume exceeds that of any earlier period in history. The information is also more and more valuable. Years ago, the Internet was little more than pages of information on small sites and large sites, and that information was not as rich in value as it is now. Today a great deal of information reflects your motivations or your interest profile. For example, my son is three years old; I often post his photos and write about how it feels to raise a child. As a result, this exposes the direction of my interests, which is very helpful for our data applications. By contrast, most of what you saw on the old web was assorted web pages and home pages, and there was no way to do unified big data analysis on them.

The second point, which is more important and which I feel more deeply about, is openness. When personalization came up in the past, the best-known big data case everyone had heard of was Wal-Mart's diapers; that was the prototype of big data. If we applied data like that to a website, the first thing people thought of was privacy: by working out my hobbies, you have violated my privacy. But the typical characteristic of microblogging and of social media such as Twitter and Facebook is openness. The biggest difference between QQ and Weibo is that QQ's relationship chain, that is, who your friends are, is unknown to us; that is your privacy. On Weibo, whom you listen to, whom you follow, and every microblog you post can be seen by everyone. By using the product you have in effect signed an agreement that says you want to be open. So it cannot be said that working with data on this basis is an invasion of privacy; that argument does not hold up in theory. Each person's characteristics are visible even without technology: by reading down through your microblogs with the naked eye, we can roughly tell your field or your interests. This provides a very important precondition for large-scale data mining applications. The privacy issue was settled when the product was designed.

Here are a few simple figures about Weibo. The microblogs and pictures posted are now counted in the tens of billions. Besides the microblogs themselves there is another important thing inside: the social network, that is, your personal relationships. In QQ there is no way to analyze this, because it is private. The good thing about microblogging is that we can use your social relationships with confidence, because they are open. In a product such as Tencent Weibo, the number of social relationships is on the order of 30 billion. There are also propagation paths: when you send a message, the paths along which it spreads number in the trillions. This is truly big data.

It is hard to imagine that a product like Weibo needs such a large system behind it. We have nearly a thousand servers that are not used to run the product or provide services, only to do offline computation, working out all kinds of formulas and results. Compared with servers of a few years ago, their performance and scale are very different: each has several terabytes of storage, so we are computing over several thousand terabytes. A seemingly simple thing, working out who may be your friends and which people you are interested in, takes nearly a thousand servers. At the time of the earthquake, we quickly analyzed what was happening and its geographical distribution, which was quite interesting. All of these are applications of big data.

There are two main types of data you can use for big data. The first, which is very important within Weibo, is what you publish, whom you listen to and whom you follow; all of this is public. The second is behavioral data, typically browsing behavior such as video viewing. This part is also used in applications, so it must be handled with caution.
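The "who may be your friends" computation mentioned above can be illustrated with a very small sketch. This is not Tencent's actual method, which the talk does not describe; it assumes a simple friend-of-friend heuristic on a public follow graph, and the function name `suggest_follows` and the sample accounts are hypothetical.

```python
from collections import Counter


def suggest_follows(follows, user, top_n=3):
    """Rank accounts the user does not follow yet.

    Assumed heuristic: a candidate's score is the number of the user's
    followees who themselves follow that candidate.
    `follows` maps each account to the set of accounts it follows.
    """
    already = follows.get(user, set())
    scores = Counter()
    for followee in already:
        for candidate in follows.get(followee, set()):
            if candidate != user and candidate not in already:
                scores[candidate] += 1
    # Highest score first; ties broken by name so the result is deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top_n]


# Hypothetical public follow graph.
follows = {
    "alice": {"bob", "carol"},
    "bob": {"dave", "erin"},
    "carol": {"dave", "alice"},
    "dave": set(),
    "erin": {"alice"},
}

suggestions = suggest_follows(follows, "alice")
print(suggestions)
```

On a graph with tens of billions of edges the same counting would have to be distributed across many machines, which is consistent with the offline computing cluster described above.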

What can big data do? We have explored big data in many places, and it all comes down to three things:

First, understanding information. Every picture, every news item, every advertisement you post is information. Understanding this information is an important area of big data.

Second, understanding users: each person's basic characteristics, your latent characteristics, each user's habits on the Internet and so on. All of this is understanding users.

Third, understanding relationships. Relationships are our core: the relationship between one piece of information and another, between one microblog and another, between one advertisement and another. The relationship between a microblog and a video is fairly obvious when we look with the naked eye.

For example, over the past two days there have been articles saying that North Korea seized one of our ships, and there are microblogs about roughly the same matter. A person can see at a glance that they are about the same event, and can see the causal relationship between them, but it is very difficult for a machine to see this. Then there is the relationship between user and user: whom you are willing to listen to, who your friends are, what your areas of interest are. You are a musician, or you are a food lover; another user is also a food lover, so you are willing to listen to him. That is understanding the relationship between users. There is also the relationship between users and information: which type of microblog you are interested in, what type of information you are interested in and, when it comes to commercialization, what type of advertising or merchandise you are interested in. That is all that understanding the relationship between users and information amounts to.

Big data sounds impressive, but it mainly does three things: understanding users, understanding information and understanding relationships. If we have to mention something beyond these three, it is trends. A trend is also a variant of a relationship, only a slightly more distant one: sentiment analysis, for example, or the public opinion monitoring done by government departments, which watches large-scale data and analyzes people's behavior. In the United States, Hollywood has in the past two years also used Facebook and Twitter data to predict the box office of forthcoming movies. That is trend analysis too, except that the trend is predicted ahead of time. The core is still these three things.

Why should we talk about these?

Because these are fully reflected in our new version. What does the new version do? Its core is to improve reading efficiency. Microblogging is by nature a very fragmented form. I do not mean that the time we spend online is fragmented; I mean that the information is fragmented. A microblog is only a hundred-odd characters, the people you listen to and follow are quite random, and the information you see comes in bits and pieces. When you see something on Weibo, it is basically impossible to see the whole picture. A few people post a few words, such as "Fatty Kim the Third has gone too far, seizing our fishermen." Can you tell what happened? You want to know what happened. Experienced users can run a search and work out the whole story. A novice has no idea what it is about. News portals are very important at this point: I go to a news portal and can roughly see what happened. This has to do with the nature of the product.

It also has to do with what we have all seen in the past two years: more and more microblogs, information explosion, information overload. I now listen to more than a thousand people. If I do not look for ten minutes, more than a hundred unread posts pile up. Among those hundred or several hundred posts there is a lot that I am interested in and miss. They include a large amount of marketing, jokes and the like, and the updates from my ex-girlfriend, which I am very interested in, get buried among them. My ex-girlfriend is not very active; she posts perhaps one microblog every week or two, so her posts are very hard to catch. This is a characteristic of the Weibo product itself.

So we use several functions to improve the efficiency of getting information. The first is to classify microblogs by content. Our channels classify by content and extract high-quality content. The micro-hot spot also classifies by content: for the Lushan earthquake, even if others mentioned it only in a few words, Lushan earthquake content appears in your top bar, and that is classified content. The second is to classify information by people. Our new version introduces something called the micro-circle, which is smart grouping. Those of us who are more active may listen to five hundred or eight hundred people. Do you have the patience to put them into groups one by one? It is very hard, and most people will not do it. So we group them intelligently: you do not have to do anything, and the grouping is done for you. For my ex-girlfriend I can have an ex-girlfriend group; when I want to see her news, I click that group and see those people's updates.
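The talk does not say how the smart grouping is computed. As one possible illustration only, the sketch below assumes that accounts you follow belong in the same group when they are linked to each other by mutual follows, and finds those groups as connected components with a union-find structure. The function name `smart_groups` and the sample names are hypothetical.

```python
def smart_groups(followees, mutual_pairs):
    """Group followed accounts into connected components.

    Assumption for illustration: two followees belong together if a chain
    of mutual-follow pairs links them. Pairs that mention an account
    outside `followees` are ignored.
    """
    parent = {name: name for name in followees}

    def find(name):
        # Walk up to the root, shortening the path as we go.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in mutual_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    groups = {}
    for name in followees:
        groups.setdefault(find(name), []).append(name)
    # Sort members and groups so the output order is stable.
    return sorted(sorted(members) for members in groups.values())


followees = ["ann", "ben", "cat", "dan", "eve"]
mutual_pairs = [("ann", "ben"), ("ben", "cat"), ("dan", "eve"), ("eve", "zed")]

groups = smart_groups(followees, mutual_pairs)
print(groups)
```

A production system would presumably also use interests and interaction history, but even this simple version shows why the public relationship data matters: the grouping needs no input from the user.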

As an example of understanding users, take Kai-Fu Lee, whom we all know. He has posted so many microblogs that we can analyze his interest points technically. Do these points involve privacy? If you went to his page and read through it, you could reach the same conclusions yourself. The only question is whether an application can derive these same points through analysis.

Finally, to summarize what big data is doing.

Big data has grown in the past two years in the same way as many technologies that have appeared in history. It is a capability and a technology, and it is only a tool. It is just that in the past two years, as data size expanded, many new products and commercial applications based on big data were born, so it became a hot topic. In fact, it is only a tool. Our conclusion is that using big data to meet users' needs and provide the best service is the ultimate goal. Big data is only a tool, and it is not as impressive as it sounds.

So far, most applications of big data are still on the commercialization side: how to do precise advertising, how to do product recommendations and so on. Those who really dare to build it into a product are still relatively few, because all of these smart big data features share one characteristic: they are not 100% accurate. It is not like an ordinary product function, where a button placed in a given position cannot go wrong. Big data can only reach a certain accuracy, and whether you dare to put it into a product depends on how far you can raise that accuracy. Take our micro-circle or micro-hot spot: if smart grouping of a user's contacts is only 50% or 60% accurate, the result is counterproductive. When you cluster hot topics and match the information to your home page, are you 80% accurate or over 90% accurate? That is why I started by talking about big data. In fact, this is also the first time we have applied big data technology and Tencent's unique advantage in data size to the product on a large scale.
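The accuracy question above can be made concrete with a small check. This is a minimal sketch, assuming accuracy is measured as the share of accounts whose automatic group matches a hand-labeled one; the 0.9 threshold, the function names and the sample labels are illustrative assumptions, not figures from the product.

```python
def grouping_accuracy(predicted, expected):
    """Fraction of labeled accounts whose predicted group matches the label."""
    if not expected:
        raise ValueError("need at least one labeled account")
    correct = sum(
        1 for account, group in expected.items()
        if predicted.get(account) == group
    )
    return correct / len(expected)


def ready_to_ship(accuracy, threshold=0.9):
    """Assumed release rule: ship only at or above the chosen threshold."""
    return accuracy >= threshold


# Hypothetical hand labels versus the automatic grouping.
expected = {"a": "colleagues", "b": "colleagues", "c": "family",
            "d": "classmates", "e": "family"}
predicted = {"a": "colleagues", "b": "family", "c": "family",
             "d": "classmates", "e": "family"}

accuracy = grouping_accuracy(predicted, expected)
print(accuracy, ready_to_ship(accuracy))
```

With four of five accounts grouped correctly the accuracy is 80%, which under the assumed threshold would not yet be good enough to release.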
