In-Depth Analysis of Search Engines



What changed in 2004 for search, the second-largest application on the Internet? At the beginning of the year, Google launched regional search; in mid-year, desktop search from Zhongsou and Google grew more and more popular, and Sogou announced the launch of a third-generation search engine; at the end of the year, Microsoft officially made "Internet search and mining" one of its main research directions...





A battle is imminent, the arrow already on the bowstring, and its target is undoubtedly 2005, the year in which third-generation search engines launch in full.





Hyperlink analysis, a passer-by in history





A foreign third-party agency ran a blind evaluation of search engine sites: it removed each site's logo so that users scored the search results without preconceptions. Unsurprisingly, Google still ranked first; unexpectedly, it was only 1% ahead of the runner-up, a margin almost imperceptible to users. Has PageRank, the technology that made Google's name, had its day?





PageRank is a hyperlink analysis technique: it judges the importance of a Web page's content from the hyperlinks between pages. It proved useful at a time when existing technology could not adequately understand site content, but its theoretical basis is not solid, because it embodies the idea that "whoever has the loudest voice represents the truth": whoever has the most links and traffic ranks first. For articles on SARS, for example, Sina's online articles will rank ahead of the Chinese Medical Association's website. Such examples show that hyperlink analysis is only a reference technique and cannot reveal the content itself.
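The link-based scoring described above can be sketched in a few lines. This is a minimal illustration of the textbook PageRank iteration, not Google's production system; the damping factor of 0.85, the iteration count and the toy page names are all assumptions made for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Score pages from link structure alone.

    links maps each page to the list of pages it links to.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its current score evenly among its out-links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # A page with no out-links spreads its score over all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank


# Hypothetical toy web: a popular portal is linked from many pages,
# while a specialist medical site is linked only from the portal.
toy_web = {
    "portal": ["medical"],
    "blog_a": ["portal"],
    "blog_b": ["portal"],
    "blog_c": ["portal"],
    "medical": ["portal"],
}
scores = pagerank(toy_web)
```

In this toy graph the portal outranks the medical site purely because more pages point to it, whatever the quality of its content, which is the weakness the article describes.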





If a search engine cannot understand content, it cannot personalize results. Over the past few years, search sites have all used similar hyperlink analysis techniques to rank results; each opens its competitors' sites, checks their rankings, and then fine-tunes its own parameters. As a result, the search results of different sites have become more and more alike. Chen Yu, president of Zhongsou, said: "If your results differ from everyone else's, you may be doing well; if they are the same as everyone else's, you certainly are not. The hyperlink analysis technology used in second-generation search can no longer substantially improve search quality."





Second-generation search is leaving the stage of history, and new ideas and technologies are beginning to emerge. Although the names vary, third-generation search is likely to become the mainstream trend in 2005. Chen Yu even predicts: "Search companies that do not have third-generation search technology by the second half of 2005 may be eliminated."





Third-generation search, a revolution of return





Since search engines were born in the early 1990s, countless companies have invented a wide variety of search technologies, but only two were epoch-making. The first was Web search based on manually compiled directories; it opened the era of Internet search and was a groundbreaking revolution. The second was large-scale Web search based on hyperlink analysis; it refined the precision of search results from the site to the individual page, filled users' search experience with surprises, and became a pioneering revolution.





Third-generation search is approaching, though there is no agreed definition of the concept. What is certain is that search engines are undergoing profound changes on many fronts: search technology will become more intelligent, searchable resources broader, search methods more convenient, specialized search richer, and receiving terminals will extend to mobile devices. What is coming is therefore not an incremental improvement but a revolution, a "revolution of return" that brings search back to the content itself and closer to daily life and ordinary users, and thereby opens up a larger market for the search industry.





Microsoft, Zhongsou and Sogou are now building artificial intelligence into search ranking in order to deliver personalized results. If a user cares about movies, a search for "green tea" will put results about the film ahead of pages about the drink. Intelligence also enables regional search: the Internet crosses regions, but content and services are local. If a search for "Sichuan Restaurant" returned every Sichuan restaurant in the world, most of the results would be garbage.
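The "green tea" example can be illustrated with a small re-ranking sketch. It assumes each result already carries a category and a base relevance score, and that the user's interests are known as per-category weights; the function name, the boost value and the sample data are hypothetical and are not taken from any of the engines mentioned.

```python
def personalize(results, interests, boost=0.5):
    """Re-rank (title, category, base_score) tuples for one user.

    interests maps a category to a weight between 0 and 1; categories
    the user has shown no interest in keep their base score.
    """
    def score(result):
        _title, category, base = result
        return base * (1.0 + boost * interests.get(category, 0.0))

    return sorted(results, key=score, reverse=True)


results = [
    ("Green tea: brewing guide", "drinks", 0.90),
    ("Green Tea (film) review", "movies", 0.80),
    ("Green tea skincare", "cosmetics", 0.60),
]

movie_fan = {"movies": 1.0}
ranked_for_fan = personalize(results, movie_fan)
ranked_default = personalize(results, {})
```

For the movie fan the film review moves to the top, while a user with no recorded interests still sees the drinks page first.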





The future Internet will also connect every kind of network resource. "People need to find the information they most need in the shortest possible time; that is the essence of search," Chen Yu said. "So in future search it will be hard to tell where content comes from: with desktop search, users can find content on the Internet, on the local machine and on the local area network, and even on any computer connected to their own." With peer-to-peer search, you can find shared content on a group of friends' machines, whether they are in Tianjin or Shanghai.





Latecomers come out on top





Microsoft, Zhongsou and Sogou, the companies that shout "third-generation search" most explicitly and most loudly, have not been among the world's top players over the past four years, but that has not dented their morale. They believe search is a technology-driven, rapidly evolving industry that consumes both capital and brainpower heavily. So when a reporter asked Microsoft's dean Shen Xiangyang what he made of Google's high-priced IPO, Dr. Shen replied: "This shows that a doctorate in computer science is still of some use, and that technical people can still make some money." Internet search and mining, Microsoft's fifth major research direction, was launched after Dr. Shen Xiangyang was promoted to dean. Microsoft Research in the United States and the UK has also been doing extensive work in this field. In fact, Microsoft CEO Steve Ballmer has announced that Microsoft's search technology will surpass Google's within five years.





"In each new round of technical competition, some fall and some succeed. That is why so many people are betting on the search engine industry." Although Zhongsou is a latecomer to the field, Chen Yu firmly believes that, as a young company, it will be more creative. Sohu's Sogou is also very young, and Sohu hopes it will strengthen the company's overall technical strength and brand. Looking back over the past decade or so, search companies have turned over quickly: Google was not the first company to enter search, yet it defeated AltaVista and Inktomi to become the king of second-generation search. LookSmart, a search company, was dropped by Microsoft's MSN site last October, and its market value fell 52% that same day.





A search engine is not a product that can muddle through on concept alone; it can be measured by many hard indicators, such as Web coverage, relevance ranking accuracy, update speed and richness of features. These indicators show whether a search company's technology is strong enough and, most importantly, users are fully capable of telling good search technology from bad.





In 2005, which search company will rise overnight, and which will come tumbling down? The Internet has recorded, and will continue to record, the history of search engines. Over the coming year, let us watch the Internet to see who becomes the "new king" of third-generation search.





How Microsoft wins





Microsoft will launch a new MSN Search early next year. The beta version is already online, offering creative features including regional search and query search. But there is always a gap between ideal and reality: MSN's regional search results still lag somewhat behind those of Google and Yahoo. In terms of theory, however, Microsoft has prepared thoroughly. This year, a number of Microsoft research papers on search were accepted by well-known academic conferences, including 7 at ACM SIGIR, the most authoritative academic conference on information retrieval, more than 10% of all papers accepted by the conference. From Microsoft's rigorous and systematic search research, we can see six improvements to existing search technology.





Page blocks, a smaller unit of search





Today a Web page serves several purposes: besides presenting its main content, it also displays secondary information such as channel links and advertisements. Although this information differs in importance to the user, earlier search engines treated it all identically. Search results should be more accurate if the search engine can tell which part of a page is body text and which is advertising or navigation. Microsoft has done research that divides a Web page into blocks and uses the block as the smallest unit of search. Moving from searching pages to searching page blocks, Microsoft found that search performance can improve by 15% to 25%.




Page blocks are partitioned automatically, because the computer has learned how to recognize page blocks and their importance. The machine learning process is roughly as follows: collect Web pages with a variety of layouts, manually annotate their page blocks and each block's importance, and provide these training data to the computer. By examining the attributes of each block, including its position, length, word count and whether it contains pictures, the computer gradually learns the rules for dividing a page into blocks.
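The learning loop described above (hand-labeled blocks in, a rule for new blocks out) can be sketched with a very simple learner. The article does not say which algorithm Microsoft used; the nearest-centroid classifier below is a stand-in chosen for brevity, and the four features, their values and the labels "main" and "secondary" are illustrative assumptions.

```python
def train_centroids(samples):
    """Learn one average feature vector per label from labeled blocks.

    samples is a list of (features, label) pairs. Features are scaled
    by their largest training value so no single attribute dominates.
    """
    dims = len(samples[0][0])
    scale = [max(abs(f[i]) for f, _ in samples) or 1.0 for i in range(dims)]
    sums, counts = {}, {}
    for features, label in samples:
        acc = sums.setdefault(label, [0.0] * dims)
        for i, value in enumerate(features):
            acc[i] += value / scale[i]
        counts[label] = counts.get(label, 0) + 1
    centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
    return centroids, scale


def classify(features, centroids, scale):
    """Assign a block to the label whose centroid is closest."""
    scaled = [v / s for v, s in zip(features, scale)]

    def distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(scaled, centroid))

    return min(centroids, key=lambda lab: distance(centroids[lab]))


# Hypothetical hand-annotated blocks:
# (vertical position 0..1, width as fraction of page, word count, has image)
training = [
    ((0.40, 0.70, 600, 0), "main"),
    ((0.50, 0.65, 450, 1), "main"),
    ((0.45, 0.70, 800, 0), "main"),
    ((0.05, 1.00, 20, 1), "secondary"),   # banner advertisement
    ((0.50, 0.15, 40, 0), "secondary"),   # navigation sidebar
    ((0.95, 1.00, 30, 0), "secondary"),   # footer links
]
centroids, scale = train_centroids(training)

article_body = classify((0.50, 0.70, 500, 0), centroids, scale)
top_banner = classify((0.02, 1.00, 15, 1), centroids, scale)
```

A search engine could then index only the blocks labeled "main", or weight them more heavily, which is the idea of using the block rather than the page as the unit of search.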





Finding 99 times more new information





Previously, search covered only the static data on the surface of the network and could not tap the underlying data held in databases, which is estimated to account for 99% of all information on the Internet.





Today's search reaches only 1% of the Internet's content, because current crawling technology cannot get into databases. Doing so faces three challenges: how to send requests to a database and crawl the data, how to organize the crawled data, and how to integrate the information and present it.





For example, to search a shopping site, the engine must first find the way to obtain the product information, then identify which piece of information is the price and which is the model, and finally organize the information and return it to the user through a friendly interface. "It's like hunting for treasure in a black box, trying to pull the data out bit by bit," said Dr. Mavilin, head of Microsoft's Internet search and data mining group, "or like a game of Minesweeper: with the right method, the whole map opens up at once."
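The middle step, working out which piece of text is the price and which is the model, can be sketched with pattern matching. The two patterns below are assumptions made for illustration (a dollar or USD amount, and a short run of capital letters followed by digits); real product pages vary far more widely, and the article does not describe how Microsoft's system identifies these fields.

```python
import re

# Illustrative patterns only: a price such as "$299.99" or "USD 299",
# and a model code such as "A80" or "T42".
PRICE = re.compile(r"(?:\$|USD\s*)(\d+(?:\.\d{1,2})?)")
MODEL = re.compile(r"\b([A-Z]{1,4}-?\d{2,5}[A-Z]?)\b")


def extract_product(text):
    """Pull a model code and a price out of a product description."""
    price = PRICE.search(text)
    model = MODEL.search(text)
    return {
        "model": model.group(1) if model else None,
        "price": float(price.group(1)) if price else None,
    }


camera = extract_product("Digital camera A80, price: $299.99, in stock")
notebook = extract_product("ThinkPad T42 notebook, now $1299.00")
unknown = extract_product("call for details")
```

Once each listing is reduced to the same fields, results from different shopping sites can be organized and presented side by side.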





Labeling everyone





When we try to learn about someone through a search engine, we usually have to read the contents of many links to form a general picture. With clustering technology, the high-frequency words associated with a person are identified, and a large number of search results can be grouped under the corresponding categories.




The person most frequently searched for by the search group is "Mavilin", whose main label is "Internet search and data mining". A search for "Yao Ming" brings up terms such as basketball star, Houston Rockets and Yao Ming anthem, and the resulting categories are quite interesting.
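Grouping results under high-frequency words can be sketched as follows. This is a deliberately crude stand-in for real clustering: it counts how many result snippets each word appears in, keeps the most common words as labels, and files each snippet under the labels it contains. The stop-word list, the threshold of two snippets and the sample snippets are assumptions for the example.

```python
from collections import Counter, defaultdict

STOPWORDS = {"the", "a", "of", "and", "in", "for", "to", "is", "with", "on"}


def group_by_frequent_terms(snippets, query, top_k=2):
    """Group result snippets under the words they most often share."""
    query_terms = set(query.lower().split())
    tokenized = []
    for snippet in snippets:
        words = {w.strip(".,!?\"'").lower() for w in snippet.split()}
        tokenized.append(words - STOPWORDS - query_terms - {""})

    # Count the number of snippets each word occurs in.
    doc_freq = Counter(w for words in tokenized for w in words)
    labels = [w for w, count in doc_freq.most_common() if count >= 2][:top_k]

    groups = defaultdict(list)
    for snippet, words in zip(snippets, tokenized):
        for label in labels:
            if label in words:
                groups[label].append(snippet)
    return dict(groups)


snippets = [
    "Yao Ming leads the Rockets to a win",
    "Houston Rockets center Yao Ming injured",
    "Rockets sign Yao Ming extension",
    "Yao Ming, basketball star from Shanghai",
    "Basketball camp opened by Yao Ming",
]
groups = group_by_frequent_terms(snippets, "Yao Ming")
```

Here the five snippets fall into a "rockets" group and a "basketball" group, giving the reader a labeled overview instead of a flat list of links.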





From documents to knowledge




What is the difference between documents and knowledge? Think of it this way: a big exam is approaching, and you borrow the textbook of the class's study monitor and find it marked with underlines, wavy lines and highlighter. The marked passages are the "knowledge points" the teacher emphasized. Because you often sleep in class, your own pages are clean: mere "documents". When you open your textbook, take out a pen and copy the study monitor's marks, you complete the process of distilling knowledge from documents.




In the future, this process will be done by the search engine: when a user searches for a person or an object, the results will likely include an introduction to that person or object directly. That is very good news for PhD students and journalists, who spend much of their time writing.





Who is the most influential person





Microsoft's search will move from relevance search toward intelligent search, combining Web search with other services. For example, it could list the papers a person has published; the people who have published the most papers in a field are then undoubtedly among its most influential. Mavilin joked that one could decide whether to grant someone tenure based on the search results.
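Ranking people by paper count in a field is a simple tally once publication records are available. The sketch below assumes such records exist as dictionaries with "authors" and "field" keys; the record layout and the author names are hypothetical.

```python
from collections import Counter


def most_prolific(papers, field, top_n=3):
    """Return (author, paper_count) pairs for one field, most papers first."""
    counts = Counter(
        author
        for paper in papers
        if paper["field"] == field
        for author in paper["authors"]
    )
    return counts.most_common(top_n)


papers = [
    {"authors": ["Author A", "Author B"], "field": "information retrieval"},
    {"authors": ["Author A"], "field": "information retrieval"},
    {"authors": ["Author A", "Author C"], "field": "information retrieval"},
    {"authors": ["Author B"], "field": "information retrieval"},
    {"authors": ["Author C"], "field": "computer vision"},
]
ranking = most_prolific(papers, "information retrieval")
```

Paper count is of course only one rough signal of influence, which is presumably why the tenure remark was made as a joke.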





A search engine in every hand





Microsoft used Windows to get more people using computers, and it wants to get more people using search engines by moving onto mobile terminals. Mobile users far outnumber computer users, they use their devices more frequently, and the market is even bigger. So Microsoft has made mobile search its next major focus. The mobile search interface will be specially restructured to fit the width of a phone screen, so that users do not have to scroll sideways and only need to page up and down.





Zhongsou: the latecomer as "forerunner"





Zhongsou is a latecomer to the search field, yet its president and CEO Chen Yu is a pioneer. Chen Yu spent 10 years on automatic retrieval and 5 years on artificial intelligence retrieval, so it was natural for him to think of bringing artificial intelligence into search ranking. In his view, intelligence and desktop search represent the future of search, and the Zhongsou he leads is an active advocate and firm practitioner of this idea. Chen Yu can talk at length on many subjects, but one question used to be hard to answer: "If you say this technology is so good, why doesn't Google do it?" Now, however, Google too has launched news search and desktop search, after Zhongsou did.





Internet Weekly: Amid this great change in search technology, why do you think intelligence represents the future direction, and how is that intelligence embodied?





Chen Yu: An example we often use at Zhongsou is "Cheetah", which produces categories such as automobiles, sports and athletics, entertainment, biology and extreme sports. Only intelligent technology can achieve such an elegant result; second-generation relevance technology cannot. Intelligent technology determines the possible categories from the relationship between the keywords and the content, merging results automatically according to content. This goes beyond automatic classification and is close to automatic clustering. In automatic classification, the categories are drawn up by hand in advance, and keywords are then sorted into them.





Only intelligent search can deliver personalized results, and only by moving onto the desktop can search become more personalized. Network Pig, launched by Zhongsou, is the first desktop search software. It has its own registration number, so it is able to personalize: based on the user's settings and usage, it folds the user's behavior and habits into the search results.





Desktop search is now getting a great deal of attention, and both Google and Microsoft are pushing the concept. On the Internet, China is likely to respond very quickly. When I used to say that desktop search represents the future, some people argued with me; once Google did desktop search, everyone said that was how it should be.





Internet Weekly: With Google now in its heyday, where are the opportunities for Zhongsou?





Chen Yu: If you can see many flaws in today's search results, then other companies have huge opportunities. All future search will move closer to users' needs, which is why Zhongsou offers hotel search, news customization and MP3 search. Google's success rate for MP3 search is very low. In many of the things Zhongsou does, Google has become the follower. When we started doing news search, many people criticized us; now, lacking news search is a major flaw in a search engine.





Internet Weekly: Google has led many trends, and its page has been rated the best search interface. But you don't seem to agree with Google's minimalist style?





Chen Yu: Google's home page was once the best interface, because users' connections were very slow then and a simple page was an advantage. In the broadband era, Google's interface is outdated. Some people think "input box + keyword" is what search means; I think it is just one way of searching. This kind of repetitive work should be left to the machine or, jokingly, to the pig. Using news customization, I tell Network Pig to send me all the news about Google. Everyone in our marketing department uses Network Pig; how else would they know what our competitors are doing?





Customization is only a small application of search, but it is a revolution in how we think about searching. Zhongsou's MP3 search works like KTV: although it keeps the traditional input box, as Google does, users can simply pick songs. Click "Ah du", for example, and the search runs without the user typing a single character. We hope ordinary users will not see the search engine as a complex tool, or even know that it is a search engine, while search technology does run in the background. So we are going to show what search is in a new way: we are about to launch Network Pig 3.0, which will give users a very powerful search experience.





Internet Weekly: Once search moves onto the desktop, will it also bring new business opportunities?





Chen Yu: Of course. Zhongsou's desktop addressing is now selling very well, and network real names have become obsolete.





There are four ways to search. The first is search within a portal; the second is a search site that serves as a portal. As things stand, the latter does better than the former, but neither is a good way to search. The third is search from the browser address bar, which is the approach of CNNIC and 3721, or direct search from a toolbar without visiting a site.





But is this the easiest way to search? I keep thinking about one question: what is the relationship between the browser and search, and why should I have to open a browser before searching? If the search can be completed on the desktop, all those preliminary steps are superfluous. So I am proposing a fourth way of searching: desktop search. The user need not visit a website or even open a browser, and can search anywhere at any time; for example, typing "Lenovo" goes straight to the Lenovo company's website.





Sogou: "The countryside surrounds the city"





For Sohu, third-generation search means not only a return to understanding content but also a return to its former main business. It will not be easy for Sogou to attract the same attention in the next generation of search as Sohu did in the first, but it has its own plan. Wang Xiaoquan, director of Sohu's research and development center, said: "We will use a wealth of specialized searches to attract users, a strategy of encircling the cities from the countryside, to increase user stickiness." Sogou's specialized searches already online include news and picture search as well as shopping search. Soon, Sogou will launch new search features in a variety of other areas.





In its thinking about search, Sogou closely resembles Zhongsou: it too stresses the user's personal experience in third-generation search, as well as interaction between the search engine and the user. According to statistics, users enter fewer than 2 keywords per search on average, and 80% of ordinary users never use the search engine's feature for adding terms to refine a search. Sogou therefore wants to use features such as category prompts and topic prompts to guide users to the information they need, so that the search back end can better understand their needs. Especially when the keywords are ambiguous, as with "green tea", which may mean the film, the cosmetic or the beverage, the user needs to interact with the search engine to reach a shared understanding.





The most distinctive of Sogou's specialized searches is shopping search: it can not only list product models for a given brand but also return the brand name for a given product model. As a result, search results expand from a single tree into a 360-degree network structure, and the user's thinking broadens with it.
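Looking up in both directions, from brand to models and from model back to brand, amounts to keeping two indexes over the same catalog. The sketch below is one simple way to do that; the class name and the brand and model names are hypothetical and say nothing about how Sogou actually stores its data.

```python
from collections import defaultdict


class ProductIndex:
    """Catalog that can be queried by brand or by model."""

    def __init__(self):
        self._models_by_brand = defaultdict(set)
        self._brand_by_model = {}

    def add(self, brand, model):
        # Record the pair in both directions so either lookup is direct.
        self._models_by_brand[brand].add(model)
        self._brand_by_model[model] = brand

    def models(self, brand):
        """All known models of a brand, in alphabetical order."""
        return sorted(self._models_by_brand.get(brand, ()))

    def brand(self, model):
        """The brand that makes a model, or None if unknown."""
        return self._brand_by_model.get(model)


index = ProductIndex()
index.add("BrandA", "A-100")
index.add("BrandA", "A-200")
index.add("BrandB", "B-10")

brand_a_models = index.models("BrandA")
maker_of_b10 = index.brand("B-10")
```

From any model a user can jump to its brand and from there to sibling models, which is what turns a one-way tree into the network of results the article describes.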





This August, after little more than six months of preparation, Sogou went online swiftly. This efficiency comes from its young team, which brings together a large number of PhDs and graduate students, much like Google's team. This also confirms Dr. Shen Xiangyang's remark that a doctorate in computer science is of some use.
