Chinese Search Engine Technology Unveiled: System Architecture (2)

Source: Internet
Author: User

Source: e800.com.cn


Search Engine Technology and Classification

The basic technology behind search engines is full-text retrieval, which has been studied abroad since the 1960s. Full-text retrieval generally means retrieval over the full text of documents and covers the storage, organization, presentation, querying, and access of information. Its core is the indexing and retrieval of text, and it has mostly been used by enterprises and institutions. As information on the Internet grew, search engines developed on the basis of full-text retrieval technology and came into wide use. A search engine is nevertheless different from full-text retrieval in the general sense. The main differences are as follows:

1. Data Volume

A traditional full-text retrieval system is designed for an enterprise's own data or data related to the enterprise. Its index database is generally on the order of gigabytes, and even a large one holds only a few million records. Internet web search, by contrast, has to process billions of web pages, so search engines rely on server clusters and distributed computing.
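As a rough illustration of why clusters help with this volume, the sketch below spreads pages across several shards by hashing the URL, sends a query to every shard, and merges the partial results. It is a minimal single-process sketch, not how any particular engine works: the names `Shard`, `Cluster`, `shard_for`, and `NUM_SHARDS` are hypothetical, each shard stands in for a separate server, and a real cluster would query the shards in parallel over the network.

```python
import zlib

NUM_SHARDS = 4  # hypothetical cluster size, chosen only for the example


def shard_for(url, num_shards=NUM_SHARDS):
    """Map a URL to a shard number with a stable hash (CRC32, for illustration)."""
    return zlib.crc32(url.encode("utf-8")) % num_shards


class Shard:
    """Stands in for one server that stores a slice of the pages."""

    def __init__(self):
        self.pages = {}  # url -> page text

    def add(self, url, text):
        self.pages[url] = text

    def search(self, word):
        """Return the URLs on this shard whose text contains the word."""
        word = word.lower()
        return {url for url, text in self.pages.items()
                if word in text.lower().split()}


class Cluster:
    """Routes each page to one shard and fans queries out to all shards."""

    def __init__(self, num_shards=NUM_SHARDS):
        self.shards = [Shard() for _ in range(num_shards)]

    def add(self, url, text):
        self.shards[shard_for(url, len(self.shards))].add(url, text)

    def search(self, word):
        hits = set()
        for shard in self.shards:  # scatter the query, then gather the results
            hits |= shard.search(word)
        return sorted(hits)


cluster = Cluster()
cluster.add("http://example.com/a", "Search engine basics")
cluster.add("http://example.com/b", "Directory of websites")
cluster.add("http://example.com/c", "Meta search engine overview")
print(cluster.search("engine"))
```

Because each page lives on exactly one shard, adding servers reduces the amount of data each one has to scan, at the cost of having to query all of them and merge the answers.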

2. Content Relevance

With so much information, precision and ranking become particularly important. Google and other search engines use web link analysis and take the number of links pointing to a page as a basis for judging its importance. The data sources in full-text retrieval, however, are only weakly interlinked, so links cannot serve as a basis for importance, and results can only be ranked by content relevance.
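The two ranking approaches contrasted above can be sketched as follows. This is a simplified illustration under stated assumptions: importance is the raw count of distinct in-links (production link analysis is considerably more elaborate), content relevance is plain term frequency, and the function names and sample data are hypothetical.

```python
from collections import Counter


def inlink_counts(links):
    """links maps each page to the pages it links to; count distinct in-links."""
    counts = Counter({page: 0 for page in links})
    for source, targets in links.items():
        for target in set(targets):
            if target != source:  # ignore self-links
                counts[target] += 1
    return counts


def term_frequency(text, query):
    """Share of the words in text that are query terms."""
    words = text.lower().split()
    if not words:
        return 0.0
    return sum(words.count(term) for term in query.lower().split()) / len(words)


def rank_by_links(pages, links, query):
    """Web style: matching pages ordered by in-link count, then by relevance."""
    counts = inlink_counts(links)
    matches = [p for p in pages if term_frequency(pages[p], query) > 0]
    return sorted(matches,
                  key=lambda p: (-counts[p], -term_frequency(pages[p], query), p))


def rank_by_content(docs, query):
    """Enterprise style: matching documents ordered by content relevance only."""
    matches = [d for d in docs if term_frequency(docs[d], query) > 0]
    return sorted(matches, key=lambda d: (-term_frequency(docs[d], query), d))


pages = {
    "a": "search search engine",
    "b": "search engine basics",
    "c": "cooking recipes",
}
links = {"a": ["b"], "b": [], "c": ["a", "b"]}

print(rank_by_links(pages, links, "search"))
print(rank_by_content(pages, "search"))
```

In the sample data, page "b" has more in-links while page "a" mentions the query term more often, so the two functions order the same matches differently.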

3. Security

The data sources of Internet search engines are public information on the Internet, and apart from the body text, other information is not very important. The data sources of enterprise full-text retrieval, however, are internal enterprise information subject to level and permission restrictions, and the ways in which it may be queried are more strictly controlled. The data is therefore generally stored centrally and securely in a data warehouse to ensure data security and manageability.
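One way to picture the level and permission restrictions is a search that filters hits by the caller's clearance before returning them. The sketch below is only an assumption about how such a check might look: the `Document` class, the `min_level` field, and the integer levels are hypothetical and do not come from any specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    doc_id: str
    text: str
    min_level: int  # hypothetical clearance level required to see the document


def search_with_permissions(docs, query, user_level):
    """Return IDs of documents that contain the query word and that the
    user's level allows them to see."""
    term = query.lower()
    return [
        doc.doc_id
        for doc in docs
        if term in doc.text.lower().split() and user_level >= doc.min_level
    ]


docs = [
    Document("handbook", "employee handbook and budget basics", min_level=1),
    Document("forecast", "confidential budget forecast", min_level=3),
]

print(search_with_permissions(docs, "budget", user_level=1))
print(search_with_permissions(docs, "budget", user_level=3))
```

Filtering inside the search function, rather than in the user interface, keeps restricted documents from ever leaving the central store for users who may not read them.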

4. Personalization and Intelligence

Search engines serve Internet visitors. Because of the volume of data and the number of users, it is difficult to apply computation-intensive intelligent techniques such as natural language processing, knowledge retrieval, and knowledge mining, and this is the direction in which search engine technology is currently working. Full-text retrieval, by contrast, deals with a small amount of data, well-defined search requirements, and a small number of users, so it can go further in intelligence and personalization.

Beyond these differences from full-text retrieval, search engines, shaped by the characteristics of Internet information, fall into three types:

Full-text search engine: the full-text search engine is the search engine in the true sense. Well-known examples abroad include Google (http://www.google.com), Yahoo (http://search.yahoo.com), and AlltheWeb (http://www.alltheweb.com); well-known examples in China include Baidu (http://www.Baidu.com) and Zhongsou (http://www.zhongsou.com). They build a database from information extracted from websites across the Internet (mainly web page text), retrieve the records that match the user's query conditions, and return the results to the user in a certain order. This is the search engine as it is usually understood.

Directory search engine: although a directory index has a search function, it is not a search engine in the strict sense. It is only a list of website links organized into a directory of categories. Users do not need to run keyword queries; they can find the information they want simply by browsing the category directory. Well-known directory indexes abroad include Yahoo (http://www.yahoo.com), the Open Directory Project (DMOZ) (http://www.dmoz.com/), and LookSmart (http://www.looksmart.com). In China, the search services of Sohu (http://www.sohu.com), Sina (http://www.sina.com), and Netease (http://www.163.com) offer this kind of function as well.

Meta-search engine: on receiving a user's query, a meta-search engine runs the search on several other engines and returns the results to the user. Well-known meta-search engines include Dogpile (http://www.dogpile.com) and Vivisimo (http://www.vivisimo.com); representative meta-search engines in China include the Search Star search engine (http://www.soseen.com/) and ur search (http://www.yok.com). As for how results are arranged, some meta-search engines list the results directly by source engine, such as Dogpile, while others re-rank and merge them by their own rules, such as Vivisimo.
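The two ways of arranging meta-search results can be sketched as follows. The engines here are stand-in functions that return fixed lists, and the merge rule (summing the reciprocal of each URL's rank across engines) is a hypothetical custom rule picked for illustration; it is not the rule used by Dogpile, Vivisimo, or any other named service.

```python
from collections import defaultdict


def query_engines(engines, query):
    """engines maps an engine name to a function that takes a query and
    returns an ordered list of URLs."""
    return {name: search(query) for name, search in engines.items()}


def grouped_by_source(results):
    """Arrange results engine by engine, keeping each engine's own order."""
    return [(name, url) for name, urls in results.items() for url in urls]


def merged_and_reranked(results):
    """Merge all result lists into one, scoring each URL by the sum of
    1/rank over the engines that returned it (a hypothetical rule)."""
    scores = defaultdict(float)
    for urls in results.values():
        for rank, url in enumerate(urls, start=1):
            scores[url] += 1.0 / rank
    return sorted(scores, key=lambda url: (-scores[url], url))


# Stand-in engines with canned answers; a real meta-search engine would
# call remote search services here.
engines = {
    "engine_a": lambda query: ["http://x.example", "http://y.example"],
    "engine_b": lambda query: ["http://y.example", "http://z.example"],
}

results = query_engines(engines, "search engine")
print(grouped_by_source(results))
print(merged_and_reranked(results))
```

Grouping by source keeps duplicates and shows where each hit came from, while the merged list removes duplicates and rewards URLs that several engines agree on.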

Other search engines, such as Sina (http://search.sina.com.cn), Netease (http://search.163.com), and A9 (http://www.A9.com), call other full-text search engines or are built as secondary development on top of those engines' search results.
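The full-text search engine workflow described in this section (build a database from web page text, retrieve the records that match the query, return them in a certain order) is commonly implemented with an inverted index. The sketch below is a minimal version under simplifying assumptions: text is split on whitespace (Chinese text would first need word segmentation), a page matches only if it contains every query term, and results are ordered by how often the terms occur. All function names and sample pages are hypothetical.

```python
from collections import defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a simplification)."""
    return text.lower().split()


def build_index(pages):
    """pages maps a URL to its text. Returns term -> {url: occurrence count}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for term in tokenize(text):
            index[term][url] = index[term].get(url, 0) + 1
    return index


def search(index, query):
    """Return URLs containing every query term, most occurrences first."""
    terms = tokenize(query)
    if not terms:
        return []
    candidates = set(index.get(terms[0], {}))
    for term in terms[1:]:
        candidates &= set(index.get(term, {}))

    def score(url):
        return sum(index[term][url] for term in terms)

    return sorted(candidates, key=lambda url: (-score(url), url))


pages = {
    "http://example.com/1": "search engine search",
    "http://example.com/2": "search directory",
    "http://example.com/3": "engine room",
}
index = build_index(pages)
print(search(index, "search"))
print(search(index, "search engine"))
```

The index is built once, ahead of time, so answering a query only requires looking up the query terms instead of scanning every page.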
