The History of Search Engine Development

Source: Internet
Author: User

If any invention can be said to have saved the Internet, it is the search engine. Without it, the more information the Internet accumulated, the faster it would have collapsed: the harder it would become for people to find what they need, and the worse their experience would be. What did search look like in its earliest form? How many times has the search experience changed? What will search engines become? It is worth summarizing the history of search engine development to trace how it unfolded.

In fact, the need to search, that is, to find what one wants among many things (mainly information), has always existed. Before information technology developed, however, no information was digitized, and search could only take the form of paper directories, indexes, and phone books. After wide area networks emerged, the need to search remained, but technology did not keep pace with the network's rapid growth, so the earliest form of Internet search was the URL book. Much like a phone book or the Yellow Pages, it recorded the URLs of many well-known websites in a printed book; sizes varied, and professional editions could be considerably larger. The author, then an ordinary Internet user, bought one about the size of a thin Xinhua Dictionary, organized into sections according to the content of the sites it listed.

The online version soon caught up with the paper one. In 1994, Jerry Yang created Yahoo and began collecting the URLs of all kinds of websites by hand, sorting them into categories according to fixed rules. Internet users only had to remember Yahoo's own address and could then reach other sites by browsing its categories, which immediately made paper URL books redundant. Part of the Internet industry calls Yahoo's approach, presenting search as a manually collected and classified directory of sites, the first generation of search engines. Some Internet experts argue that Yahoo's practice cannot strictly be called a search engine and should instead count as the earliest form of URL navigation. The author tends to count it as one of the forms in which search has been implemented, and would even include URL navigation in that category.

But Yahoo, after all, had simply moved the paper directory onto the Internet. Searching by eye, together with the fact that different people disagree about how a given site should be classified, limited the efficiency of this kind of search. Automatic search based on keywords was therefore also built into search engines. This was not difficult to achieve, because keyword-based full-text search technology had appeared as early as the 1950s, shortly after computers were invented. Chinese full-text search technology was developed as part of the 748 Project; it was basically complete by the late 1980s and came into wide use from the 1990s onward.
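
Keyword-based full-text search of this kind is commonly built on an inverted index, which maps each word to the documents that contain it, so a query only has to look up its keywords instead of scanning every document. The following is a minimal sketch, assuming whitespace tokenization and AND semantics for multi-word queries; the function names and sample documents are hypothetical, and real engines (especially for Chinese, which first needs word segmentation) are far more elaborate.

```python
from collections import defaultdict


def build_inverted_index(docs):
    """Map each lowercase token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the IDs of documents containing every query keyword (AND semantics)."""
    terms = query.lower().split()
    if not terms:
        return set()
    # Copy the first posting set so the index itself is never modified.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


docs = {
    1: "Archie indexes FTP file names",
    2: "Yahoo directory of web sites",
    3: "web crawler downloads web pages",
}
index = build_inverted_index(docs)
print(sorted(search(index, "web")))          # [2, 3]
print(sorted(search(index, "web crawler")))  # [3]
```

Intersecting the per-keyword document sets is what makes multi-keyword queries cheap: each extra keyword can only shrink the candidate set.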

The remaining problem with first-generation search engines was that URLs were still collected by hand, which was inefficient, error-prone, and incomplete. The Internet therefore urgently needed an alternative to manual URL collection. When it comes to replacing people, robots naturally come to mind, so second-generation search engines relied on robots: programs that travel across the Internet automatically, collecting pages as they go. Such a robot is now known as a search crawler or search engine spider. In fact, this kind of technology appeared earlier than Yahoo, and even earlier than the World Wide Web itself.

In 1990, Alan Emtage, a student at McGill University in Montreal, invented Archie. Although the World Wide Web did not yet exist, file transfers over the network were already quite frequent, and because large numbers of files were scattered across many separate FTP hosts, finding a particular file was inconvenient. Alan Emtage therefore developed a system for looking up files by name, and the result was Archie. Archie worked very much like today's search engines: it relied on scripts to collect file information from across the network automatically and indexed it so that users could query it. Encouraged by users' enthusiastic response, the System Computing Services group at the University of Nevada in the United States developed another very similar search tool in 1993, and by then such tools could retrieve web pages as well as index files.

Today's mainstream search engines, such as Google, Bing, and Baidu, all use search crawlers instead of people to crawl and download pages. The crawler performs a full crawl of the Internet every certain number of days (for Google, for example, 28 days), downloads all the pages it finds to the search engine's own servers, and then waits for people to enter keywords to make search requests.
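
At its core, the crawl-and-download loop described above is a traversal of the link graph. The sketch below shows a breadth-first crawler under a loud assumption: to stay runnable offline, the hypothetical `WEB` dictionary stands in for the network, and the `fetch_links` callable stands in for downloading a page and extracting its links. Politeness rules, robots.txt handling, page storage, and recrawl scheduling are all omitted.

```python
from collections import deque

# Hypothetical toy web: each URL maps to the URLs its page links to.
WEB = {
    "a.example": ["b.example", "c.example"],
    "b.example": ["c.example", "d.example"],
    "c.example": ["a.example"],
    "d.example": [],
}


def crawl(seeds, fetch_links, max_pages=100):
    """Breadth-first crawl from the seed URLs; returns URLs in the order visited."""
    seen = set(seeds)          # URLs already queued, so none is fetched twice
    frontier = deque(seeds)    # URLs waiting to be fetched
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)    # a real crawler would download and store the page here
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited


print(crawl(["a.example"], lambda url: WEB.get(url, [])))
# ['a.example', 'b.example', 'c.example', 'd.example']
```

The `seen` set is what keeps the crawler from looping forever on cycles such as a → c → a; the `max_pages` cap stands in for the resource limits that make a full crawl of the real web take weeks.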

Crawling web pages with robots was far more efficient than collecting them by hand, and combined with keyword search, this new generation of search engines ought to have appeared before directory search and URL navigation. The problem, however, was that the Internet held so much information that the pages gathered by crawlers could hardly be ranked in any useful order. Users could only search by keyword and then sift through the results themselves to find what they wanted, an experience worse than simply browsing a directory.

The solution to this problem gave rise to the strongest company in today's search field, and one of the greatest companies in the world: Google. In the late 1990s, just as Yahoo's success was showing people how large the demand for search was, Larry Page and Sergey Brin, then PhD students at Stanford University, developed the PageRank algorithm, which measures how important a page in the search engine's index is relative to other pages. The algorithm can be understood as a kind of voting. Its core is counting how many other pages link to each page: the more pages that link to a result, and the greater the weight of those linking pages, the more important the result is considered. Google used this method to solve the problem of ranking search results, replacing the directory-based classification that Yahoo had pioneered with a combination of search crawlers and PageRank. Some in the industry call this generation, represented by Google, the second generation of search engines; others consider it the first search engine in the true sense. The author leans toward the former view.
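
The voting idea can be made concrete with the textbook form of PageRank, in which a page's score is PR(p) = (1 - d)/N + d * sum of PR(q)/L(q) over the pages q that link to p, where N is the number of pages, L(q) is the number of links out of q, and d is the damping factor. The sketch below computes it by power iteration. The choice d = 0.85 is the value commonly cited from the original PageRank paper, pages with no outgoing links are assumed to spread their score evenly over all pages (one common convention), and the link graph is hypothetical; Google's production ranking combines many more signals than this.

```python
def pagerank(links, damping=0.85, iterations=100, tol=1e-10):
    """Compute PageRank scores by power iteration.

    `links` maps each page to the list of pages it links to. Pages with no
    outgoing links (dangling pages) spread their score evenly over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Total score currently held by pages that link nowhere.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        # Every page starts with the random-jump share plus its cut of the dangling score.
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        # Each page splits its current score evenly among the pages it links to (its "votes").
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
        change = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if change < tol:
            break
    return rank


links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks, key=ranks.get, reverse=True))  # ['C', 'A', 'B', 'D']
```

Page C, which receives links from three of the four pages, ends up with the highest score, and A ranks second because its single "vote" comes from the heavyweight C. Page D, which nothing links to, keeps only the baseline (1 - d)/N.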

China's search engine history essentially began directly with the second generation, in 1999. Baidu, Zhongsou, and other search engine vendors founded around that time combined search crawlers with ranking algorithms from the outset (3721 also provided URL navigation services, appearing at almost the same time as Baidu and Zhongsou). Unlike Google and Yahoo, however, Baidu and Zhongsou at first mainly provided back-end search technology to portal sites rather than running search sites of their own. Only after Google and Yahoo entered China at the beginning of this century did Baidu, Zhongsou, and later Soso, Sogou, and then 360 launch their own search engine sites.

The history might seem to end here, but the most recent point mentioned above is already ten years in the past, and search engines have not stood still during this decade.

The crawling and ranking algorithms described above only solve general web search. At present, every search crawler in the world needs a relatively long time (more than 20 days) to crawl the whole web. For pages that update slowly, this speed is reasonable, but for news, which changes far more quickly, the approach is too slow. Some in the domestic industry believed that as search technology and network speeds improved, this problem would resolve itself, but so far general web search still cannot handle news, and people find the news they want through specialized news search technology instead.

The first domestic company to provide news search technology to portals was Zhongsou, in 2003. It restricted the search crawler, which normally crawls the entire web, to several hundred selected news source sites. This drastically shrank the seemingly endless Internet and cut the time for a full crawl from several days to a few minutes or even tens of seconds. When the set of news sources changes, a source simply needs to be added to or removed from the list. The technique is somewhat similar to RSS reading, once very popular, but RSS has declined because it requires information sources to publish in the RSS format; Google's RSS reader, Google Reader, officially shut down in the summer of 2013. News search also ranks results somewhat differently, giving more weight to publication time, relevance, the publishing outlet, and similar factors.
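
The two ideas in this paragraph, a fixed list of news sources and a ranking that weighs time, relevance, and the publishing outlet, can be sketched as follows. Everything specific here is a hypothetical illustration rather than Zhongsou's actual method: the source names and trust weights, the 6-hour half-life for freshness, the crude title-matching relevance score, and the choice to multiply the three signals together.

```python
from dataclasses import dataclass


@dataclass
class Article:
    title: str
    source: str
    age_hours: float  # hours since publication


# Hypothetical whitelist of news sources with editorial trust weights.
# Adding or dropping a source is just an edit to this dictionary.
NEWS_SOURCES = {"news-a.example": 1.0, "news-b.example": 0.8}


def relevance(article, query):
    """Fraction of query terms that appear in the title (a deliberately crude signal)."""
    terms = query.lower().split()
    if not terms:
        return 0.0
    words = set(article.title.lower().split())
    return sum(term in words for term in terms) / len(terms)


def news_score(article, query, half_life_hours=6.0):
    """Multiply relevance, source weight and an exponential freshness decay."""
    freshness = 0.5 ** (article.age_hours / half_life_hours)
    return relevance(article, query) * NEWS_SOURCES[article.source] * freshness


def rank_news(articles, query):
    """Keep only articles from whitelisted sources, then sort by score, best first."""
    candidates = [a for a in articles if a.source in NEWS_SOURCES]
    return sorted(candidates, key=lambda a: news_score(a, query), reverse=True)


articles = [
    Article("search engine market news", "news-a.example", age_hours=12),
    Article("search engine market news", "news-b.example", age_hours=1),
    Article("search engine market news", "blog.example", age_hours=0),
]
for a in rank_news(articles, "search engine"):
    print(a.source, round(news_score(a, "search engine"), 3))
```

In this example the one-hour-old story from news-b.example outranks the twelve-hour-old one from news-a.example despite its lower source weight, and the blog post is never considered because its site is not on the list.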

Like news search, there are specialized search techniques for other particular categories of information, such as image search, video search, and price search. In addition, because the Internet holds so much information, it is difficult for general search to be professional, accurate, and timely across all of it, so vertical search engines dedicated to particular industries or fields have emerged. Most of them work on principles similar to news search: they narrow the range over which the search crawler operates and then adjust the ranking rules accordingly.

Zhongsou's contribution to domestic, and even global, search technology lies in its first attempt at a more advanced form of search: the personal portal. In 2004 it released a personal information portal browser, abbreviated PIG and also known as "Internet Pig".

The personal portal is called a more advanced form of search because earlier search engines were passive: they waited for people to enter keywords and make search requests. The way to make search active, providing services to users proactively, was the personal portal. If search always waits for users to enter keywords, it can never escape the role of a tool, differing from directories and phone books only in form and efficiency. Moreover, proactively serving users earns more attention and usage, and with them more advertising revenue. So the choice between active and passive is not merely a question of service.

As the name implies, a portal is a kind of "supermarket" that strives to give Internet users as much information as possible and to satisfy most of their needs online. Adding "personal" in front extends the main appeal from comprehensiveness to accuracy as well. It seems that only search, which uses keywords to look across the whole Internet, can deliver information that is both comprehensive and accurate. Zhongsou's approach was to let users subscribe to search keywords and freely combine them into a home page; the search results for every subscribed keyword were then presented to users as soon as they opened the browser.

Google later launched its own personal home page product, iGoogle, which was more feature-rich (adding weather, stocks, and so on). But personal portal products have not been as successful as traditional search engines, at least on the desktop Internet. Neither Internet Pig nor iGoogle achieved the results their vendors hoped for, and iGoogle was shut down in late 2013, following Google Reader. Other attempts to provide search proactively include Yahoo's, which likewise let users subscribe to search keywords and then automatically e-mailed them the updated results every day.
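
The keyword-subscription mechanism behind these personal portals, and behind Yahoo's daily e-mailed updates, can be sketched as a small class. This is a hypothetical illustration, not how Internet Pig, iGoogle, or Yahoo actually worked: the class and method names are invented, and `search_fn` is an assumed stand-in for a real search backend that returns result URLs for a keyword.

```python
from collections import defaultdict


class SubscriptionPortal:
    """Toy personal portal: users subscribe to keywords and receive fresh results."""

    def __init__(self, search_fn):
        self.search_fn = search_fn              # callable: keyword -> list of result URLs
        self.subscriptions = defaultdict(list)  # user -> subscribed keywords, in order
        self.delivered = defaultdict(set)       # user -> results already pushed

    def subscribe(self, user, keyword):
        if keyword not in self.subscriptions[user]:
            self.subscriptions[user].append(keyword)

    def home_page(self, user):
        """One section per subscribed keyword, filled with its current results."""
        return {kw: self.search_fn(kw) for kw in self.subscriptions[user]}

    def pending_updates(self, user):
        """Results not yet pushed to this user, e.g. for a daily e-mail or mobile push."""
        updates = {}
        for kw in self.subscriptions[user]:
            unseen = [r for r in self.search_fn(kw) if r not in self.delivered[user]]
            if unseen:
                updates[kw] = unseen
                self.delivered[user].update(unseen)
        return updates


# Hypothetical stand-in for a real search backend.
RESULTS = {"pagerank": ["u1", "u2"], "archie": ["u3"]}
portal = SubscriptionPortal(lambda kw: list(RESULTS.get(kw, [])))
portal.subscribe("alice", "pagerank")
portal.subscribe("alice", "archie")
print(portal.home_page("alice"))
```

The home page is the "pull" side of the design, assembled whenever the user opens it, while `pending_updates` is the "push" side: it hands over only results the user has not seen yet, which is what a daily e-mail digest needs.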

Among China's domestic search innovations, Baidu's pay-per-click (PPC) mechanism must also be mentioned. Businesses eager to promote themselves pay the search engine vendor according to the number of clicks their listings receive. Their promotional listings appear among the search results, and the order is determined by how much each business pays per click (the higher the payment, the higher the position). Despite criticism from the industry, this mechanism solved the problem of how search engine vendors could earn a living, freeing them from depending on back-end services for other websites. At the same time, the handsome profits of the early movers attracted more players into the search engine market, promoting the development of both the technology and the market.
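
The ranking rule described here, with sponsored listings ordered purely by the per-click bid and the advertiser charged each time a listing is clicked, can be sketched as follows. Advertiser names, keywords, and amounts are hypothetical (amounts are kept in whole cents to avoid floating-point rounding), and the slot limit is an assumed parameter; modern ad systems generally also weigh ad quality and relevance rather than the bid alone.

```python
from dataclasses import dataclass


@dataclass
class SponsoredListing:
    advertiser: str
    keyword: str
    bid_cents: int  # what the advertiser offers to pay for each click


def sponsored_results(listings, query, slots=3):
    """Listings bought on this keyword, highest per-click bid first."""
    matching = [item for item in listings if item.keyword == query]
    return sorted(matching, key=lambda item: item.bid_cents, reverse=True)[:slots]


def record_click(listing, balances):
    """Charge the advertiser its bid when a user clicks the listing."""
    balances[listing.advertiser] -= listing.bid_cents


listings = [
    SponsoredListing("shop-a", "flowers", 120),
    SponsoredListing("shop-b", "flowers", 300),
    SponsoredListing("shop-c", "books", 500),
    SponsoredListing("shop-d", "flowers", 80),
]
top = sponsored_results(listings, "flowers", slots=2)
print([item.advertiser for item in top])  # ['shop-b', 'shop-a']
balances = {"shop-b": 1000}
record_click(top[0], balances)
print(balances)  # {'shop-b': 700}
```

Because the vendor is paid only when a listing is clicked, its revenue grows with search traffic itself, which is why the mechanism let search companies stop relying on back-end services for portals.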

However, all of these attempts, whether in search category, presentation, or profit model, are built on the second generation of search engines. Although this generation's search crawlers solved the problem of collecting a huge volume of results and satisfied the need for comprehensiveness, ranking by keywords and PageRank alone cannot be fully accurate. In both English and Chinese, it is common for the same keyword to have several meanings, and no matter how good the ranking method is, it cannot place every result that each user actually needs on the first few pages. The result a particular user is looking for may appear at position one hundred, one thousand, or ten thousand, because there is simply too much information on the Internet, and much of it is duplicated.

Attempts at the next generation of search engines have already begun. In 2011, Zhongsou released a third-generation search engine platform, becoming the first to raise the banner of third-generation search. Zhongsou's reason for calling it the third generation was this: unlike the first generation, which collected search results purely by hand, and the second, which relied entirely on search crawlers, its search engine combined people and machines. Search crawlers continued to collect web pages, solving the problem of result volume, while people classified and organized the search results, solving the problem of accuracy. As noted earlier, doing this by hand at Internet scale is an impossible task; Zhongsou's solution was to let every Internet user take part in the process by opening up the entire search system. Anyone who disagrees with a search result or has a different idea can propose a change, unlike Baidu, whose users can only accept the results they are given. Zhongsou also changed how results are presented: each meaning of a keyword gets its own multi-frame page resembling a portal (as opposed to the directory-style structure of other search engines), so different meanings of the same keyword are presented as entirely different topic pages.

Since then, a large number of "third-generation search engines" have followed suit. Whatever their merits, however, unlike Zhongsou, the way they collect and present search results differs in no significant way from existing second-generation search engines, so their claim to be "third-generation" is groundless.

In 2012, Google announced the Knowledge Graph, which is similar to Zhongsou's approach and also highly extensible, displaying information related to the keyword in the margin of the results page. Baidu made similar adjustments in early 2013, but both achieved this through technology alone, without adding human labor. Google's more important attempt at a new generation of search also includes moving search onto dedicated hardware, Google Glass. Although it is not yet certain that this will succeed, the direction is clear: search will move closer to people's daily lives, and it is likely that neither the input of requests nor the presentation of results will be limited to text, or even to a two-dimensional world.

For the general public, however, the more practical attempts at the moment are innovations in mobile search. Zhongsou again, in addition to migrating third-generation search to mobile devices, revived the personal portal. At the end of 2013, Zhongsou released Souyue, a mobile personal portal. Besides search and news, it added site navigation, an app store, third-party reviews, lifestyle services, and other major functions that search might provide on mobile devices. Like the earlier personal portal, Souyue accepts users' subscriptions and proactively updates search results, and it goes further by using the mobile Internet to push updates to users.
