Baidu and the Drug Administration recently reached a strategic cooperation agreement, under which Baidu will use the Drug Administration's drug data to offer people drug-related queries. The price Baidu paid for the data was not disclosed. There is no free lunch: although the Drug Administration works for the benefit of the people, this batch of data was clearly not handed over for nothing. That means the time has come for search engines to pay for data. Today I would like to share some views on the relationship between search and data. Note that big data is too far away from us, and this article is not about big data.
Earlier, 360 and Jike entered a strategic cooperation to jointly operate food safety and exposure columns, under which 360 will share Jike's Drug Administration data. Before that, the 360 search engine obtained microblog search results through a partner search service, and earlier still Google bought Twitter data to provide Twitter search results.
Google's motto is "Don't be evil," and what it sets out to do is "integrate global information, so that everyone can access and benefit from it" and "accelerate the flow of information." Baidu's motto is "simple and reliable," and what it sets out to do is "give people the most convenient access to information and let them find what they seek." The wording differs, but search engines are essentially the same: they help people find the information they want. With the wave of social networking and the mobile Internet, data on the Internet has exploded. How to deal with this exploding data is both a challenge and an opportunity for search engines.
The specific analysis is as follows:
One, a big data net darker than the dark net
Aggregating all the information on the Web has always been the dream of every aspiring search engine, but it is a task that is impossible to accomplish.
In 1994, Dr. Jill Ellsworth put forward the concept of the "dark net": resources that are stored in web databases, cannot be reached through hyperlinks, and are therefore not part of the surface web that standard search engines index. The scale of the dark net is far beyond our imagination. According to scientists, less than 1% of human information has made it onto the Web, and among web pages the ratio of what search engines can crawl to what they cannot is about 1:500.
Pages go uncrawled for two kinds of reasons. One is unintentional: the site does not follow web standards, is not search-engine friendly, and so on. The other is deliberate blocking by the site itself, as when Taobao, Youku and other sites block Baidu's crawler. Search engines have made a lot of effort to solve both kinds of problems, including optimizing crawler technology, promoting legitimate SEO, and running programs such as Baidu's Aladdin.
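Deliberate blocking of a crawler is commonly expressed in a site's robots.txt file. The sketch below illustrates the idea with Python's standard urllib.robotparser module. The robots.txt text, the site URL, and the "OtherBot" crawler name are hypothetical and are not taken from Taobao, Youku, or any real site; "Baiduspider" is used only as an example user-agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for an imaginary vertical site (an assumption for
# illustration, not the actual file of any real site).
ROBOTS_TXT = """\
User-agent: Baiduspider
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


url = "https://shop.example.com/item/123"  # hypothetical URL
baidu_allowed = can_crawl("Baiduspider", url)
other_allowed = can_crawl("OtherBot", url)  # "OtherBot" is a made-up crawler name
print("Baiduspider allowed:", baidu_allowed)
print("OtherBot allowed:", other_allowed)
```

A well-behaved crawler checks these rules before fetching a page, so a single Disallow rule is enough to keep one search engine out while leaving the site open to everyone else.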
Baidu's Aladdin program provides an interface through which third-party websites actively submit their structured data, so that users see this information directly in the search results. Baidu expects Aladdin's genie of the lamp to "illuminate" the dark net. Similar programs include Google's OneBox and 360's OneBox. But while the problem of the dark net remains unresolved, a darker net has arrived.
1, more and more web data is being privatized.
The content of e-commerce sites, BBS forums, Zhihu Q&A, Hudong Baike (Interactive Encyclopedia), Douban Movie and the like belongs to this kind. Once a vertical website reaches a certain scale, it has the leverage to bargain with search engines: it can block their crawlers and "privatize" its own data. The search function a vertical website offers can deliver a better experience through personalized search and its own unique mining ability. Some even rise to become vertical search engines, such as Zhihu search. Another kind of vertical search engine integrates structured data from other vertical sites to provide search services, such as Qunar and Etao.
The author believes that as the Web develops, vertical search is one direction in which search engines will subdivide, and that it will pose a threat to traditional search engines. The relationship resembles the one between a mobile browser and a native app, where browser and app each take half of the traffic. If we regard a traditional search engine such as Baidu as the browser, then the vertical search engine is the app. Vertical search engines are growing just as apps are. Their respective core advantages are personalization on one side and uniformity on the other.
Once web data is privatized, the ratio of crawlable to uncrawlable web information, "about 1:500," changes. What follows affects the other figure, the 1% in "less than 1% of information is on the Web."
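To make the two figures concrete, here is a minimal arithmetic sketch. It takes the quoted numbers at face value (at most 1% of information on the Web, crawlable to uncrawlable about 1:500) and multiplies them; the function name and the assumption that the two figures can simply be multiplied are mine, not the source's.

```python
def searchable_share(web_share: float, crawlable: float, uncrawlable: float) -> float:
    """Upper bound on the fraction of all information a crawler can reach.

    web_share   -- fraction of human information that is on the Web
    crawlable   -- crawlable part of the crawlable:uncrawlable ratio
    uncrawlable -- uncrawlable part of that ratio
    """
    crawl_fraction = crawlable / (crawlable + uncrawlable)
    return web_share * crawl_fraction


# Figures quoted in the text: less than 1% on the Web, ratio about 1:500.
share = searchable_share(web_share=0.01, crawlable=1, uncrawlable=500)
print(f"Searchable share of all information: at most {share:.4%}")
```

Under these assumptions a crawler reaches at most roughly 0.002% of all information. Privatization of web data shrinks the first factor's companion ratio, and data that never reaches the Web shrinks the 1% itself.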
2, the huge growth of data that is not on the Web.
After more than 10 years of development, the PC Internet has accumulated a lot of data, and on the mobile Internet, apps, cloud applications, social networking and the Internet of Things have made data explode. To search engines, this data is almost invisible.
Manually collated data:
The Drug Administration's data is one example. Such data is concentrated in the hands of government departments, institutions and companies. They hold authoritative data on matters of people's livelihood that the public cares about, and for the time being they have not opened this data through websites. Similar data is held by transport departments, environmental protection departments, tourism bureaus, health bureaus, education bureaus and other bodies in areas of public concern. After more than 10 years of informatization, this data must have reached a considerable order of magnitude.
In addition, the barcode data of "Wochacha" (I Check) can also be placed in this category. In its early days, the Wochacha team sent hundreds of people to collect product barcode data in shopping malls across the country. Even after Wochacha reaches a certain scale, users will not actively add barcode data to it.
Socially generated data:
Social networking here does not mean only microblogs or Renren. QQ chat is social too. So is email. So is Huxiu (Tiger Sniff). Even SMS is social. We might as well call this "dark social." These social processes produce a great deal of information, especially through sharing. Some social networking sites' data is on the Web, but closed. This part of the data is growing at a huge rate, and search engines are powerless before it. Facebook searches its own data through Graph Search, Weibo searches its own, and Renren searches its own. But who searches the "dark social" data?
App-generated data:
Sogou's Wang Xiaochuan once put forward the argument that "the Web is dead." The mobile Internet is no longer a network interconnected by hyperlinks across the Web. Apps link to each other through interfaces, and users of different apps link to each other through QQ friend relationships, WeChat circles, microblog follows, mobile phone numbers and other means. Traditional search engines are built on hyperlinks. The real question is how a search engine can search the data of an app such as Papa.
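The point that hyperlink-based discovery cannot see app data can be sketched with a toy crawl. The following is a minimal illustration, not any real search engine's crawler: the link graph, the example.com-style page names and the "app://" identifiers are all hypothetical. The crawler starts from a seed page and follows hyperlinks breadth-first; anything no hyperlink points to is never discovered.

```python
from collections import deque

# Hypothetical toy link graph. Web pages link to one another through
# hyperlinks; the app items link only among themselves, and no web page
# points at them.
LINKS = {
    "portal.example/home": ["news.example/a", "blog.example/b"],
    "news.example/a": ["blog.example/b"],
    "blog.example/b": ["portal.example/home"],
    "app://voice-note/1": ["app://voice-note/2"],
    "app://voice-note/2": [],
}


def crawl(seeds, links):
    """Breadth-first discovery: return every node reachable from the seeds."""
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


reachable = crawl(["portal.example/home"], LINKS)
invisible = sorted(set(LINKS) - reachable)
print("Discovered:", sorted(reachable))
print("Never discovered:", invisible)
```

In this toy graph the two app items are never discovered, however long the crawl runs, because discovery depends entirely on inbound hyperlinks.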
Data generated by personal cloud applications:
Personal cloud applications mainly solve the problem of synchronizing across multiple screens. They lead more users to keep their data in the cloud, then download and use it on different devices after signing in to their account. Besides syncing highly private data such as address books and bookmarks, this kind of application also covers large bodies of text, as in Evernote and NetEase Cloud Reading. There will be more and more personal cloud applications. A few years from now, I think it is not impossible that Office will provide cloud synchronization. Search engines can do nothing with this data.
Data generated by the Internet of Things:
Internet of Things applications such as the Internet of Vehicles, video surveillance, electronic meter reading and hydrological monitoring are generating a lot of data every moment. The industry has not exploded yet, and when it does, its applications will not be limited to these. The Internet links web pages, the mobile Internet links people, and the Internet of Things links everything. The number of mobile phone users in China has now exceeded 1.1 billion, so people are basically all connected. Compared with 1.1 billion, however, the number of "users" on the Internet of Things will be of an astonishing magnitude. These "users" will also produce a large amount of data. Will humans search this data in the future, in what form, and what will the search results look like?
Two, how big data flows
Baidu's Aladdin program once had the magic of absorbing structured data. Much structured data, such as weather forecasts and book information, was actively connected to Baidu's Box Computing in order to obtain traffic and users from Baidu. Vertical sites also used SEO to raise their Baidu rankings. Now the situation is reversing. Structured data no longer flows actively to Baidu. Vertical sites tend to privatize this data, or to open it only to certain search engines.
One search engine was created by ambitious Google engineers and was originally intended for social search, at a time when Facebook's Graph Search was not yet known. But it has since turned to providing search technology services for companies such as Sina and Jike. It never made progress on its own social search because, in the final analysis, moving from search into social is a dream: without users there is no social network, and without a social network there is none of the data that social search depends on. The social data it needed was on the microblogs. So it went to the microblogs.
In more than 10 years of search, Baidu has made many efforts to get users to log in, but it still has not formed its own account system. Google's painstakingly built Google+ cannot shake Facebook's position in social networking. Bing is a similar example. In an interview in October 2012, Shen Xiangyang said Bing's strategy was social search, entity search (mobile search) and maps. Now the main direction of Bing China has become English-language search.
1, data is moving away from search engines
Who holds the big data to be searched? Vertical websites are privatizing their data; social networking sites are inherently private; cloud application providers are privatizing users' private data; app data is private because it is not on the Web; and some data is in the hands of governments, organizations and ordinary businesses.
Data once flowed actively to search engines. Now structured data, especially valuable structured data, is slowly moving away from search engines into private domains. This will produce a snowball effect: where there is data, more and more data will gather, and where there is no data, obtaining it will cost more than sending a spider to crawl.
2, will search engines degenerate, or change position?
The next problem for the traditional general-purpose search engine to solve is not "accelerating the flow of information," because a lot of information is out of its reach. This also underscores the future significance of data-gathering applications such as Google+ and Gmail. Perhaps in the future, traditional web search engines such as Baidu will be degraded to "vertical web search engines," because web data is only a small part of all network data. Here again I borrow Wang Xiaochuan's words: "The Web is dead."
Of course, there is another possibility: search engines can still obtain enough data, by paying for it. That changes their position in the ecosystem. After more than 10 years of searching free data, search engines will have to pay more for data. The Drug Administration is just the beginning.
Three, the value of big data to search
Humans have reached the point where they cannot live without information. As data explodes, by Darwin's theory of evolution the human capacity to absorb, filter and process information should evolve too. People's demand for information does not degenerate; it only grows thirstier. The problem search engines need to solve is no longer helping people find results in a vast amount of information, but finding the only one among a mass of results. Finding the correct answer quickly matters more than finding more answers.
1, the value of structured data to search.
Compared with web page data, structured data better satisfies the point above: finding the only answer. Web page analysis works by matching text. Analysis of structured data supports both active submission by content providers and personalized, precise analysis by the search engine. Both approaches raise costs for content providers or search engines, but the reward is that users quickly get the one accurate answer.
2, big data mining is the opportunity for search engines.
The job is no longer just to accelerate the flow of information, and the access and presentation of structured data described in point 1 is too simple a goal. So what should a search engine do? Help humans do what the human brain cannot: data mining, that is, extracting value from massive data. People say that big data is a gold mine. But when it comes to digging the gold, people can find neither the method nor the tools.
After more than 10 years of development, search engines have accumulated rich experience in text analysis, relationship discovery, knowledge graph construction and semantic understanding of users. These are the basic technologies that big data mining depends on. Let us call this the mining engine. Combining mining with traditional search, and answering users' active or passive search needs through mining, gives what we can call a "recommendation engine."
Douban and some e-commerce sites have already been exploring in this direction. Because Douban made "recommendation" one of its core functions from the start, it already has some mature results. Setting aside Douban's UGC model, its search plus recommendation model deserves attention. Douban focuses on cultural products, and it has quietly launched "find something like," where you can comment on, share and recommend any "thing" at all. For now it is a low-key experimental product, but I think it may be Douban's future. That future is very far off, because Douban is very "slow."
To sum up: if big data is gold, then the vertical sites, social networking sites, apps, cloud application providers, Internet of Things owners, government organizations and businesses that hold it are all owners of gold mines. They can refine nuggets from the gold themselves, or sell the gold to search engines or big data mining companies to dig. And while search engines pay for the gold, they must transform themselves from pipelines that accelerate information flow into gold diggers.