Is this the year of the search engine? Search engine algorithms are a hot topic

Source: Internet
Author: User

People and companies all around us have gone crazy over search engines; it seems that everyone is betting on them. What puzzles me is that research institutes studied search algorithms thoroughly years ago, yet the results seem not to have been commercialized until now, which shows how quickly patented technology depreciates.

I used to work on full-text retrieval of multimedia material. Since everyone is reading up on the subject, I am also reviewing what I once knew. I have to lament that promising research papers are hard to find on public websites: the treasures are still hidden in research institutes, while people in enterprises can only do peripheral chores, for example working on boxes and so on.

Indexing as used by search engines: crawlers are released to fetch pages automatically every day, the pages are stored in a library by category, and an index is then built over them to speed up retrieval.

Keywords:
Hilltop algorithm
PageRank
University of Toronto Department of Computer Science
Connection switching interference
Recognize similar links
Ignore meta tags
Automatic Chinese Word Segmentation
Expert documents
URL hash
Web hyperlink analysis algorithm

Another keyword: depreciation

Main technologies
A search engine consists of four parts: the searcher (the crawler that collects pages), the indexer, the retriever (the searcher that answers queries), and the user interface.

1. Searcher (crawler)
The searcher roams the Internet to discover and collect information. It is usually a computer program that runs day and night, collecting as much new information as possible, as quickly as possible. Because information on the Internet changes quickly, it must also regularly refresh the information it has already collected, to avoid dead and invalid links. There are currently two strategies for collecting information:

The first starts from a set of seed URLs and follows the hyperlinks in those pages to discover information on the Internet iteratively, in breadth-first, depth-first, or heuristic order. The seed URLs can be arbitrary, but they are often popular websites that contain many links (such as Yahoo!).

The second divides the web space by domain name, IP address, or country-code domain, and makes each searcher responsible for exhaustively searching one subspace.
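The first strategy can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `LINKS` table is a made-up in-memory link graph that stands in for the web, and `fetch_links` is a hypothetical placeholder for downloading a page and extracting its hyperlinks.

```python
from collections import deque

# Hypothetical in-memory link graph standing in for the web:
# each URL maps to the URLs it links to.
LINKS = {
    "http://a.example/": ["http://b.example/", "http://c.example/"],
    "http://b.example/": ["http://d.example/", "http://a.example/"],
    "http://c.example/": ["http://d.example/"],
    "http://d.example/": [],
}

def fetch_links(url):
    """Stand-in for downloading a page and extracting its hyperlinks."""
    return LINKS.get(url, [])

def crawl_breadth_first(seed_urls, max_pages=100):
    """Visit pages level by level, starting from the seed URL set."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)          # URLs already queued, to avoid revisiting
    visited = []
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()   # taking from the left end gives breadth-first order
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited

order = crawl_breadth_first(["http://a.example/"])
print(order)
```

Taking URLs from the right end of the frontier (`frontier.pop()`) would turn the same loop into a depth-first style crawl, and replacing the queue with a priority queue would give a heuristic order.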

The searcher collects diverse types of information, including HTML, XML, newsgroup articles, FTP files, word processing documents, and multimedia information.

The implementation of searchers often uses distributed and parallel computing technologies to speed up information discovery and updating. Commercial search engines can discover millions of webpages every day.
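One simple way to divide the web space among several searchers by domain name is to hash the host name of each URL. The sketch below assumes that scheme; the function names and the choice of CRC32 as the hash are illustrative, not something the original text prescribes.

```python
import zlib
from urllib.parse import urlsplit

def assign_worker(url, num_workers):
    """Map a URL to a worker index using a stable hash of its host name."""
    host = urlsplit(url).hostname or ""
    return zlib.crc32(host.encode("utf-8")) % num_workers

def partition(urls, num_workers):
    """Group URLs so that all pages of one host go to the same worker."""
    buckets = [[] for _ in range(num_workers)]
    for url in urls:
        buckets[assign_worker(url, num_workers)].append(url)
    return buckets

urls = [
    "http://a.example/index.html",
    "http://a.example/about.html",
    "http://b.example/",
    "http://c.example/news",
]
buckets = partition(urls, num_workers=3)
for worker, assigned in enumerate(buckets):
    print(worker, assigned)
```

Keeping a whole host on one worker means each searcher can handle revisit schedules for its own subspace without coordinating with the others.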

2. Indexer

The indexer interprets the information collected by the searcher, extracts index items that represent each document, and generates the index table of the document library.

There are two types of index items: objective index items and content index items. Objective index items are independent of the semantic content of the document, such as the author name, URL, update time, encoding, length, and link popularity. Content index items reflect the content of a document, such as keywords and their weights, phrases, and words. Content index items can be divided into single index items and multi-index items (or phrase index items). In English, a single index item is a word, which is easy to extract because words are naturally separated by spaces. In Chinese and other languages written without separators between words, the text must first be segmented into words.
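The difference between the two cases can be illustrated as follows. English text is split on spaces, while the Chinese text is segmented with forward maximum matching, which is one simple dictionary-based method chosen here for illustration; the tiny `DICTIONARY` is hypothetical, and a real segmenter would use a large lexicon and usually a more robust method.

```python
# Hypothetical toy dictionary; a real segmenter would load a large lexicon.
DICTIONARY = {"搜索", "引擎", "搜索引擎", "算法", "研究"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)

def segment(text):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when no dictionary word matches.
            if size == 1 or candidate in DICTIONARY:
                words.append(candidate)
                i += size
                break
    return words

print("search engine algorithm".split())   # English: spaces separate the words
print(segment("研究搜索引擎算法"))            # Chinese: words must be segmented
```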

In a search engine, each single index item is usually assigned a weight that indicates how well it distinguishes the document from others; the weight is used to calculate the relevance of query results. The weights are generally computed with statistical, information-theoretic, or probabilistic methods. Phrase index items are extracted with statistical, probabilistic, or linguistic methods.
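As an example of a statistical weighting method, the sketch below computes TF-IDF weights. TF-IDF is a common choice rather than the specific method the text has in mind, and the three short documents are made up for illustration.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    weight = (term frequency in the document) * log(N / document frequency)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))      # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

docs = [
    "search engine index".split(),
    "search engine crawler".split(),
    "search user interface".split(),
]
weights = tf_idf(docs)
print(weights[0])
```

A term that occurs in every document, such as "search" here, gets weight zero because it does nothing to tell the documents apart, while a term found in only one document gets the highest weight.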

An index table usually uses some form of inverted list, which maps each index item to the documents that contain it. The index table may also record the positions at which an index item appears in each document, so that adjacency or closeness (proximity) between index items can be calculated at query time.
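A minimal sketch of such an index table, assuming whitespace tokenization and a small made-up document set, might look like this. The `near` helper is a hypothetical name for a proximity test built on the recorded positions.

```python
from collections import defaultdict

def build_index(docs):
    """Build {term: {doc_id: [positions]}} from {doc_id: text}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term].setdefault(doc_id, []).append(pos)
    return index

def near(index, term_a, term_b, max_gap=1):
    """Return ids of documents where the two terms occur within max_gap words."""
    hits = []
    postings_a = index.get(term_a, {})
    postings_b = index.get(term_b, {})
    # Only documents that contain both terms can match.
    for doc_id in sorted(postings_a.keys() & postings_b.keys()):
        if any(abs(pa - pb) <= max_gap
               for pa in postings_a[doc_id] for pb in postings_b[doc_id]):
            hits.append(doc_id)
    return hits

docs = {
    1: "search engine index",
    2: "engine for web search",
    3: "the index of a search tool",
}
index = build_index(docs)
print(index["index"])
print(near(index, "search", "engine"))
```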

The indexer can use a centralized or a distributed indexing algorithm. When the data volume is large, instant indexing is required; otherwise the index cannot keep up with the rapid growth in the amount of information. The indexing algorithm has a great impact on the performance of the indexer (for example, the response speed under large-scale peak query load). The effectiveness of a search engine depends largely on the quality of its index.

3. Retriever (searcher)

The retriever quickly looks up documents in the index database in response to a user query, evaluates the relevance of each document to the query, sorts the results to be output, and implements a mechanism for user relevance feedback.

The information retrieval models commonly used by the retriever are the set-theoretic model, the algebraic model, the probabilistic model, and hybrid models.
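As an illustration of an algebraic model, the sketch below ranks documents by cosine similarity between term-count vectors (the vector space model). The documents and the query are made up, and raw term counts are used instead of weighted terms to keep the example short.

```python
import math
from collections import Counter

def cosine(vec_a, vec_b):
    """Cosine of the angle between two sparse term-count vectors."""
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def rank(query, docs):
    """Return (doc_id, score) pairs sorted by decreasing similarity to the query."""
    query_vec = Counter(query.lower().split())
    scored = [(doc_id, cosine(query_vec, Counter(text.lower().split())))
              for doc_id, text in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

docs = {
    "d1": "search engine index",
    "d2": "web crawler collects pages",
    "d3": "search engine search results",
}
results = rank("search engine", docs)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

A set-theoretic (Boolean) model would instead return exactly the documents that satisfy the query's logical conditions, without a graded score.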

4. User Interface

The user interface accepts user queries, displays query results, and provides a mechanism for user relevance feedback. Its main purpose is to make the search engine convenient to use, so that users can obtain effective and timely information from it efficiently and in multiple ways. The design and implementation of user interfaces draw on the theories and methods of human-computer interaction in order to adapt fully to human habits of thought.

User input interfaces can be divided into simple interfaces and complex interfaces.

A simple interface provides only a text box in which the user enters a query string. A complex interface lets the user restrict the query, for example with logical operations (and, or, not; +, -), proximity (adjacent, near), domain range (such as .edu, .com), location in the document (such as title, content), time, length, and so on. At present, some companies and organizations are considering developing standards for query options.
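To make these restrictions concrete, the following sketch checks a page against a query with logical operators, a domain restriction, and a location restriction. The `Page` type, the `matches` function, and its parameter names are all hypothetical and do not correspond to the query syntax of any particular search engine.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass
class Page:
    url: str
    title: str
    content: str

def matches(page, all_of=(), any_of=(), none_of=(), domain_suffix=None, field="content"):
    """Check one page against a restricted query.

    all_of / any_of / none_of model the and / or / not operators,
    domain_suffix restricts the host (for example ".edu"), and
    field selects where to look ("title" or "content").
    """
    host = urlsplit(page.url).hostname or ""
    if domain_suffix and not host.endswith(domain_suffix):
        return False
    words = set(getattr(page, field).lower().split())
    if any(term not in words for term in all_of):
        return False
    if any_of and not any(term in words for term in any_of):
        return False
    return not any(term in words for term in none_of)

pages = [
    Page("http://cs.example.edu/ir", "Search engine notes", "index and crawler design"),
    Page("http://shop.example.com/", "Search engine ads", "buy search ads"),
    Page("http://lab.example.edu/db", "Database notes", "search trees and index files"),
]
# "search" and not "ads", in the title, on .edu hosts only.
hits = [p.url for p in pages
        if matches(p, all_of=["search"], none_of=["ads"], domain_suffix=".edu", field="title")]
print(hits)
```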
