Technical issues with search engines (to be continued)

Source: Internet
Author: User

Technical mysteries of search engines

Search engines: finding a needle in the world's largest haystack

 

Search engines have become an essential tool for everyone who uses the Internet. But why are search engines so important? How did the technology develop? What are its basic goals? What are the core problems? What is the basic technical architecture? This article analyzes and explains these questions in as much detail as possible.

 

Search engines provide a remarkable service. Every large search engine company operates an international network of data centers containing thousands of servers and advanced network devices. However, without smart algorithms to organize and retrieve the information we request, all that hardware would be useless.

The two main tasks of a search engine are matching and ranking.

An online search has two phases: matching and ranking. The first phase (matching) may produce thousands or even millions of matching results. In the second phase (ranking), these results must be sorted by relevance. A search engine picks the best results out of a large number of hits. A good search engine not only picks the best results but also displays them in the most useful order: the best-matching pages come first, followed by the next best, and so on.
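The two phases can be illustrated with a minimal sketch. It assumes a tiny in-memory corpus (the `PAGES` dictionary and all function names are hypothetical) and uses a raw count of query-word occurrences as a crude stand-in for relevance; real engines use far richer signals.

```python
from collections import defaultdict

# Hypothetical toy corpus: page id -> page text.
PAGES = {
    "p1": "search engines rank pages",
    "p2": "search engines match pages and search engines rank pages",
    "p3": "cooking recipes for busy people",
}

def build_index(pages):
    """Build an inverted index: each word maps to the set of page ids containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return index

def match(index, query):
    """Phase 1 (matching): return the ids of pages that contain every query word."""
    words = query.lower().split()
    if not words:
        return set()
    hits = set(index.get(words[0], set()))
    for word in words[1:]:
        hits &= index.get(word, set())
    return hits

def rank(pages, hits, query):
    """Phase 2 (ranking): order hits by how often the query words occur in each page."""
    words = query.lower().split()

    def score(page_id):
        tokens = pages[page_id].lower().split()
        return sum(tokens.count(word) for word in words)

    # Highest score first; ties are broken by page id so the order is stable.
    return sorted(hits, key=lambda page_id: (-score(page_id), page_id))

def search(pages, query):
    """Run both phases and return page ids, best match first."""
    index = build_index(pages)
    return rank(pages, match(index, query), query)

print(search(PAGES, "search engines"))
```

Matching is kept deliberately strict here (every query word must appear), so the ranking phase only has to order pages that already qualify.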

The following describes how commercial search engine companies developed. 1995 was an important starting point for them. The background is that the number of websites on the Internet had exceeded 1 million for the first time, and ordinary users could no longer find the information they wanted quickly by browsing manually. Many early search engine companies appeared around this time, including well-known names such as Yahoo, Infoseek, Fast Search, AltaVista, and Excite.

When Yahoo was first founded, it relied on a manually edited navigation directory that sorted important websites into categories. This met the need of the time, when people simply wanted to find important websites, and Yahoo quickly became the most famous search and portal website.

The development of search engine technology can be roughly divided into several generations: directory classification, text search, link analysis, and user analysis.

Search engine goals: more comprehensive, faster, and more accurate.

Three core issues of search engines:

1. What does the user really need? According to survey data, the average query entered by a user is only 2.7 words long, so the search engine must infer the user's real need from a very short request. This is the first and most important issue for search engines: understanding the user's real search intent.

2. What information is really relevant to the user's need? From a data perspective, search is essentially a matching process: finding content that matches the user's need within massive amounts of data. Determining the relevance of content to the user's query keywords has always been a core research topic in information retrieval.

3. What information can the user trust? Whether information is trustworthy is another important measure. Anyone can publish information on the Internet, nothing vets whether the content is credible, and some information is published maliciously. The results for a single query may contain conflicting answers, and in that case the credibility of the information becomes a prominent problem.
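For the second issue, relevance between content and query keywords, one classic information retrieval approach is TF-IDF weighting. The sketch below is only an illustration under assumptions: the article does not say which relevance model any engine uses, and the `DOCS` corpus and function names are hypothetical. The idea is that a query word counts for more when it is frequent in a document but rare across the whole collection.

```python
import math
from collections import Counter

# Hypothetical toy corpus: document id -> text.
DOCS = {
    "d1": "the history of the search engine",
    "d2": "the engine of the car",
    "d3": "the search for the best recipe",
}

def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()

def idf(docs):
    """Inverse document frequency: words found in fewer documents get a higher weight."""
    total = len(docs)
    doc_freq = Counter()
    for text in docs.values():
        doc_freq.update(set(tokenize(text)))
    return {word: math.log(total / count) for word, count in doc_freq.items()}

def tfidf_score(query, text, idf_weights):
    """Sum, over the query words, of term frequency in the document times IDF."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(
        (counts[word] / len(tokens)) * idf_weights.get(word, 0.0)
        for word in tokenize(query)
    )

def rank_by_relevance(query, docs):
    """Return ids of documents with a positive score, most relevant first."""
    weights = idf(docs)
    scored = [(tfidf_score(query, text, weights), doc_id) for doc_id, text in docs.items()]
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [doc_id for score, doc_id in scored if score > 0]

print(rank_by_relevance("the search engine", DOCS))
```

In this toy corpus the word "the" appears in every document, so its weight is zero and it contributes nothing to the ranking, while "search" and "engine" each distinguish a subset of the documents.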
