I have worked in search-engine-related fields for 11 years. Today I want to talk with you about the core of search engine algorithms: natural language and Boolean search. The discussion leads to the following conclusion: search crawlers and search engines use heuristic methods to rank pages and return results. The crawler observes patterns to determine the content of a web page; the search engine looks at the pattern of the search query, compares it with the patterns identified by the crawler, and returns the results.
What complicates this theory is that we are using living, growing, evolving languages, which means that language usage patterns keep changing. To keep up, search engines must also be living, growing, and evolving, so heuristics are a very important concept in understanding how to position a site for a search engine. The easiest way to understand this is to compare past and present search behavior and see how search has evolved.
Start with Boolean search
Today, the way people search is completely different from how they searched when search engines first appeared. Remember the previously mentioned Archie, Gopher, Jughead, and Veronica: the abilities of these early indexing and search programs were quite limited, and to find information in an index you had to know that index very well. In fact, when you used Archie and Gopher, you had to know the exact location of the document or file you were looking for.
With Jughead and Veronica, you could actually search for information, but search was still very basic. When searching finally became possible, there were strict rules for how to find files. In the early days of search engines, the natural language search that is so popular today did not exist.
Users had to specify the exact words or phrase they wanted to search for, and they had to enter it using Boolean logic, the method needed to find the correct file or document in the index. Boolean logic is based on the algebraic system of logic proposed by George Boole in the 19th century.
In practice, Boolean logic breaks data down into sets until the data set is small enough to meet the requirements of the initial query. For example, there may be 1000 pages on the web about "pools" and 1000 pages about "saltwater"; if you search for "saltwater pools", all 2000 pages are returned. This is too much. However, combining the two terms to find only pages that contain both "saltwater" and "pools" returns only a small portion of the original 2000 pages, as shown in Figure 5-1.
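The difference between the two searches can be sketched with sets. This is only a minimal illustration: the `index` dictionary and its page IDs are hypothetical toy data, not figures from the example above.

```python
# Hypothetical toy index: each term maps to the set of page IDs containing it.
index = {
    "saltwater": {1, 2, 3, 4, 5},
    "pools": {4, 5, 6, 7, 8},
}

# OR (the default): every page that mentions either word.
either = index["saltwater"] | index["pools"]

# AND: only the pages that mention both words.
both = index["saltwater"] & index["pools"]

print(sorted(either))  # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(both))    # [4, 5]
```

The union grows with every term added, while the intersection can only shrink, which is why combining terms narrows the search.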
To take this example further, you can add a qualifier, such as "not chlorine", to narrow the data set. When you add this qualifier, another part of the data is removed, so the query for pools and saltwater but not chlorine leaves fewer results.
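A "not" qualifier like this one can be pictured as a set difference. The sketch below assumes three hypothetical sets of page IDs, one per term; the numbers are made up for illustration.

```python
# Hypothetical page IDs for each term (toy data).
saltwater = {1, 2, 3, 4, 5}
pools = {4, 5, 6, 7, 8}
chlorine = {5, 8, 9}

# Pages that mention both saltwater and pools...
both = saltwater & pools

# ...minus any page that also mentions chlorine.
result = both - chlorine

print(sorted(both))    # [4, 5]
print(sorted(result))  # [4]
```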
This example illustrates the three operators used in Boolean search: AND, OR, and NOT. Because Boolean logic is based on an algebraic system of logic, each operator can be represented by a symbol:
• AND: +
• NOT: -
• OR: The default operator, which returns all pages that contain any of the words, regardless of their proximity. The operator is represented by a space between words.
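To show how these symbols could drive a search, here is a rough sketch of a query evaluator. It rests on several assumptions: `+` marks a required word (AND), `-` (the conventional counterpart to `+`) marks an excluded word (NOT), and bare words separated by spaces are combined with OR. The function name `boolean_search` and the toy index are hypothetical, and real engines of the period may have combined required and optional words differently.

```python
def boolean_search(query, index):
    """Evaluate '+word' (AND), '-word' (NOT) and bare words (OR) against an index."""
    required, excluded, optional = [], [], []
    for token in query.lower().split():
        if token.startswith("+") and len(token) > 1:
            required.append(token[1:])
        elif token.startswith("-") and len(token) > 1:
            excluded.append(token[1:])
        else:
            optional.append(token)

    if required:
        # AND: intersect the page sets of all required words.
        # Assumption: bare words are ignored once a required word is present.
        pages = set(index.get(required[0], set()))
        for term in required[1:]:
            pages &= index.get(term, set())
    else:
        # OR: union of the page sets of all bare words.
        pages = set()
        for term in optional:
            pages |= index.get(term, set())

    # NOT: drop every page that contains an excluded word.
    for term in excluded:
        pages -= index.get(term, set())
    return pages


# Hypothetical toy index mapping each word to the page IDs that contain it.
toy_index = {
    "saltwater": {1, 2, 3, 4, 5},
    "pools": {4, 5, 6, 7, 8},
    "chlorine": {5, 8, 9},
}

print(sorted(boolean_search("saltwater pools", toy_index)))              # [1, 2, 3, 4, 5, 6, 7, 8]
print(sorted(boolean_search("+saltwater +pools", toy_index)))            # [4, 5]
print(sorted(boolean_search("+saltwater +pools -chlorine", toy_index)))  # [4]
```

Each added operator shrinks the result set, mirroring the saltwater pools example.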
Initially there were 2000 pages, but using Boolean operators to break down the data set greatly narrows the search. You are now more likely to find what you need, and to find it faster.
In the early days of Internet search, Boolean logic helped users locate the files and documents they needed. From the point of view of heuristics, Boolean logic provided an effective problem-solving method for search. But the technology gradually matured ...