Research on Chinese Search Engines: Search Engine Technology

Source: Internet
Author: User
Search engines are now used more and more widely and have become an essential tool for Internet users.

In China, the widely used search engines include Baidu, Google, Peking University's Tianwang, and Sogou, along with a number of specialized search engines, such as the music search at http://www.1234567.com and http://www.pagou.com, built by the founder of Xici Hutong. All of these are quite good, which shows that the search engine market is still very large. In particular, Baidu's successful IPO was a great encouragement to the industry.


Today's mainstream search engines all follow the same model. The user enters some keywords or a sentence; in either case, the search engine first performs word segmentation on the input, which improves the accuracy of the results. This is what distinguishes it from an ordinary database search, which simply matches with LIKE '%keyword%'. The search engine then looks up the information relevant to the user's input in a massive index, and each result includes a summary of the page.
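To make the contrast with a LIKE query concrete, here is a minimal sketch of both approaches in Python. It is an illustration, not how any particular engine works: the function names are hypothetical, and word segmentation is stubbed out as whitespace splitting, whereas real Chinese text would go through a dictionary-based segmenter first.

```python
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    # Stand-in for a real word segmenter (assumption): lowercase and split
    # on whitespace. Chinese text would need a proper segmenter here.
    return text.lower().split()


def like_search(docs: dict[int, str], keyword: str) -> list[int]:
    # Equivalent of SQL `WHERE content LIKE '%keyword%'`: a substring scan
    # over every document, with no notion of word boundaries.
    return [doc_id for doc_id, text in docs.items() if keyword.lower() in text.lower()]


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    # Inverted index: each term maps to the set of document ids containing it.
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def index_search(index: dict[str, set[int]], query: str) -> list[int]:
    # Segment the query first, then intersect the posting sets of its terms.
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


docs = {
    1: "search engine technology",
    2: "the engineer wrote a spider",
    3: "chinese search engine",
}
index = build_index(docs)
print(like_search(docs, "engine"))           # → [1, 2, 3]  ("engineer" also matches)
print(index_search(index, "search engine"))  # → [1, 3]
```

The substring scan must touch every document and matches "engine" inside "engineer", while the index lookup only reads the posting sets for the segmented query terms.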



The technologies involved in a Chinese search engine include web spiders, Chinese word segmentation, indexing, web page summary extraction, web page similarity, and information classification.


1. Web Spider

Web spiders are programs that crawl information across the vast Internet. They are usually multithreaded and crawl the web day and night, but they must also avoid crawling any single site too quickly, which would overload the information provider's server.
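One simple way for a multithreaded spider to avoid hammering a single site is a shared per-host limiter that every thread consults before fetching. The sketch below is an assumption-laden illustration: the class name `PolitenessLimiter`, the 0.2-second delay, and the `fake_fetch` stand-in (which records a timestamp instead of doing real network I/O) are all hypothetical.

```python
import threading
import time
from urllib.parse import urlsplit


class PolitenessLimiter:
    """Enforces a minimum delay between requests to the same host.

    Shared by all crawler threads; min_delay is an illustrative value.
    """

    def __init__(self, min_delay: float = 1.0) -> None:
        self.min_delay = min_delay
        self._next_allowed: dict[str, float] = {}  # host -> earliest next request time
        self._lock = threading.Lock()

    def wait(self, url: str) -> None:
        host = urlsplit(url).hostname or ""
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed.get(host, now))
            # Reserve this slot so the next request to the host waits min_delay longer.
            self._next_allowed[host] = slot + self.min_delay
        # Sleep outside the lock so threads fetching other hosts are not blocked.
        time.sleep(max(0.0, slot - now))


def fake_fetch(limiter: PolitenessLimiter, url: str, log: list) -> None:
    # Hypothetical stand-in for a real HTTP request: just record when it "ran".
    limiter.wait(url)
    log.append((urlsplit(url).hostname, time.monotonic()))


limiter = PolitenessLimiter(min_delay=0.2)
log: list = []
urls = ["http://a.example/1", "http://a.example/2", "http://b.example/1", "http://a.example/3"]
threads = [threading.Thread(target=fake_fetch, args=(limiter, u, log)) for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The three requests to `a.example` end up spaced about 0.2 seconds apart, while the request to `b.example` proceeds immediately, so throttling one site does not slow down the whole crawl.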


The basic principle of a web spider is as follows: start crawling from a seed page (the Yahoo Chinese directory or the DMOZ Chinese directory is a good choice), fetch the page content and record a summary, then extract all the links on the page. The spider then crawls those links in turn, and the process continues indefinitely. These are only the basic principles; real-world applications are far more complex. You can try writing a spider yourself. I once wrote one in PHP (PHP does not support multithreading, which is a drawback).
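The crawl loop described above is essentially a breadth-first traversal of the link graph. The following is a minimal single-threaded sketch in Python, not the author's PHP spider: `fetch` is a hypothetical hook (here backed by an in-memory dictionary of made-up pages so the example runs without network access), and storing or summarizing each page is left as a comment.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(seed: str, fetch, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl starting from `seed`.

    `fetch(url)` returns the page HTML or None (hypothetical hook).
    """
    queue = deque([seed])
    seen = {seed}
    order: list[str] = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        order.append(url)  # a real spider would store and summarize the page here
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so duplicates are detected.
            absolute, _ = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return order


# Made-up pages standing in for the web (assumption for a runnable demo).
PAGES = {
    "http://dir.example/": '<a href="/news">News</a> <a href="http://site.example/">Site</a>',
    "http://dir.example/news": '<a href="/">Home</a> <a href="/news#top">Top</a>',
    "http://site.example/": '<a href="page2.html">Next</a>',
    "http://site.example/page2.html": "<p>No links here</p>",
}

order = crawl("http://dir.example/", PAGES.get)
print(order)
```

The `seen` set is what keeps the spider from looping forever on pages that link back to each other, and `max_pages` bounds the otherwise endless crawl. A production spider would replace `fetch` with real HTTP requests run across several threads.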


2. Chinese Word Segmentation

Chinese word segmentation has always been the key problem for Chinese search engines. Chinese differs from English: in English, words are separated by spaces, whereas a Chinese sentence usually consists of several words with no delimiter between them. People can easily read the meaning of such a sentence, but it is difficult for a computer to understand.


As far as I know, current Chinese word segmentation methods (reportedly there are also methods that need no dictionary) almost all rely on their own Chinese dictionary: text is segmented by matching it against the dictionary, so the quality of the segmentation depends heavily on the dictionary. See my previous article, which describes a Chinese word segmentation method written in PHP.
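As an illustration of dictionary matching, here is a sketch of forward maximum matching (FMM), one common dictionary-based technique. It is not necessarily the method from the author's PHP article, and the tiny dictionaries below are made up for the example. At each position the algorithm tries the longest dictionary word first and shrinks the window until it finds a match; an unmatched character becomes a single-character token.

```python
def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Forward maximum matching (FMM) word segmentation."""
    # The longest dictionary entry bounds how far ahead we try to match.
    max_len = max((len(w) for w in dictionary), default=1)
    words: list[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing longer matches.
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary (assumption, for illustration only).
DICTIONARY = {"中文", "搜索", "引擎", "搜索引擎", "技术"}

print(forward_max_match("中文搜索引擎技术", DICTIONARY))  # → ['中文', '搜索引擎', '技术']

# Greedy matching can go wrong on ambiguous text: with 研究生 in the
# dictionary, 研究生命起源 splits as 研究生 / 命 / 起源 instead of 研究 / 生命 / 起源.
print(forward_max_match("研究生命起源", {"研究", "研究生", "生命", "起源"}))
```

The second example shows why segmentation quality depends so much on the dictionary and the matching strategy: adding or removing a single entry changes the output.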
