Chapter 8: The Beauty of Simplicity: Boolean Algebra and Search Engine Indexes

Last Update:2014-10-20 Source: Internet

Author: User

In the following chapters, we will discuss some basic knowledge about search engines. There is no shortcut to building a good search engine. The most basic requirement is to analyze 10 to 20 bad search results every day; only after doing this for some time do you develop a feel for the problem. Many engineers, however, fail to do this. The principle behind a search engine is actually very simple: automatically download as many web pages as possible, build a fast and effective index, and rank the pages fairly and accurately by relevance. We will introduce these one by one.

1. Boolean Algebra

Boolean arithmetic is quite simple; it is high-school material and will not be covered here. Let's look at the relationship between document retrieval and Boolean operations. For a keyword entered by the user, the search engine determines whether each document contains it and assigns the document a logical value accordingly: true (1) if it does, false (0) if it does not. For example, to find documents about computer applications while excluding those about software, the query can be written as "computer AND application AND (NOT software)".
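The query above can be evaluated directly as a Boolean expression over each document. The following is a minimal sketch, not a real search engine: the four sample documents and the helper names `contains` and `matches` are hypothetical and exist only for illustration.

```python
# Hypothetical mini-collection; document ids and texts are made up for illustration.
documents = {
    1: "computer application in hospitals",
    2: "computer application software review",
    3: "application of statistics",
    4: "computer hardware application notes",
}


def contains(text, keyword):
    """Return True if the keyword appears as a whole word in the text."""
    return keyword in text.lower().split()


def matches(text):
    """Evaluate the query: computer AND application AND (NOT software)."""
    return (
        contains(text, "computer")
        and contains(text, "application")
        and not contains(text, "software")
    )


# Each document receives a logical value; keep the ones that evaluate to True.
results = [doc_id for doc_id, text in documents.items() if matches(text)]
print(results)  # [1, 4]
```

Scanning every document like this is exactly what becomes too slow at web scale, which is why an index is needed.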

2. Index

Most search engine users are surprised that it can return ten million results in a very short time. Obviously it cannot scan the text of every web page for each query, so there must be a trick: building an index. The simplest index structure uses a long binary number to record whether a keyword appears in each document. Each bit corresponds to one document. For example, if the binary number for "computer" is 01001000110..., then the second, fifth, ninth, tenth, ... documents contain this keyword. Similarly, suppose the binary number for "application" is 00101001100.... To find documents containing both "computer" and "application", we only need to AND the two binary numbers; every position where the result is 1 identifies a document that meets the requirement.
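As a small worked sketch of this idea, the two bit strings from the text can be combined with a bitwise AND. Only the first 11 bits shown in the text are used (the elided remainder is left out), and the helper name `matching_documents` is an assumption made for illustration.

```python
# The first 11 bits of each keyword's binary number, as given in the text.
# The leftmost bit corresponds to document 1.
computer = "01001000110"
application = "00101001100"

width = len(computer)


def matching_documents(bitmap, width):
    """Return the 1-based document numbers whose bit is set in the bitmap."""
    return [i + 1 for i in range(width) if (bitmap >> (width - 1 - i)) & 1]


# Bitwise AND keeps only documents that contain both keywords.
both = int(computer, 2) & int(application, 2)

print(format(both, f"0{width}b"))       # 00001000100
print(matching_documents(both, width))  # [5, 9]
```

Of the documents shown, only the fifth and ninth contain both keywords, which matches reading the two bit strings by eye.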

Because the Internet holds a huge number of web pages and a very large vocabulary, this index is enormous. The common practice is therefore to split the index into many parts by web page serial number and store them on different servers. When a query arrives, it is distributed to these servers, which process it in parallel and send their results to a master server; the master server merges them and returns the final result to the user.
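The distribute-and-merge flow can be sketched on a single machine, with threads standing in for the separate servers. This is an illustration under stated assumptions: the two in-memory shards, their contents, and the function names `search_shard` and `search` are hypothetical, and a real system would send the query over the network instead.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical shards: each one indexes a contiguous range of page numbers.
shards = [
    {"computer": {2, 5}, "application": {3, 5}},   # pages 1-6
    {"computer": {9, 10}, "application": {8, 9}},  # pages 7-12
]


def search_shard(shard, keywords):
    """Return the pages in one shard that contain every keyword."""
    postings = [shard.get(word, set()) for word in keywords]
    return set.intersection(*postings) if postings else set()


def search(keywords):
    """Send the query to all shards in parallel, then merge the partial results."""
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        partial_results = pool.map(lambda s: search_shard(s, keywords), shards)
        merged = set().union(*partial_results)
    return sorted(merged)


print(search(["computer", "application"]))  # [5, 9]
```

Because each page belongs to exactly one shard, the master step only needs to take the union of the per-shard results.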

As the amount of content on the Internet grows, so does the data to be indexed. It therefore becomes necessary to build indexes at different levels, such as frequently used and rarely used, according to the importance, quality, and access frequency of web pages. This is similar to the difference between the page table and the fast table (the translation lookaside buffer, TLB) in a computer. However complicated a search engine's index becomes in engineering terms, the principle remains simple: it is equivalent to Boolean operations.
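One possible way to use such levels is to consult the frequently used index first and fall back to the rarely used one only when too few results are found. The text does not specify a lookup strategy, so the fallback rule, the `enough` threshold, and the two sample indexes below are all assumptions made for illustration.

```python
# Hypothetical two-level index; which pages belong to which level is an assumption.
common_index = {"computer": {5, 9}, "application": {5, 9}}
uncommon_index = {"computer": {2, 10}, "application": {3, 8}, "abacus": {7}}


def lookup(index, keywords):
    """Return the pages in one index that contain every keyword (Boolean AND)."""
    postings = [index.get(word, set()) for word in keywords]
    return set.intersection(*postings) if postings else set()


def tiered_search(keywords, enough=1):
    """Query the common index first; consult the uncommon one only if needed."""
    hits = lookup(common_index, keywords)
    if len(hits) >= enough:
        return sorted(hits), "common"
    hits |= lookup(uncommon_index, keywords)
    return sorted(hits), "common+uncommon"


print(tiered_search(["computer", "application"]))  # ([5, 9], 'common')
print(tiered_search(["abacus"]))                   # ([7], 'common+uncommon')
```

Whatever the level, each lookup is still the same Boolean AND over per-keyword sets.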

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
