Anyone who has done program development at an Internet company knows how ordinary database search works: the words the user types are compared against the contents of one or more fields in the database. Put simply, a search engine works on the same principle:
The user enters a word, the search engine finds matching content in its database, and then shows the results to the user in ranked order. Search engines repeat these operations tirelessly every day. Everything seems normal so far, so let us use some numbers to analyze the problem.
Assume there are 2 billion Internet users worldwide and, as a first estimate, 5 billion web pages.
Each person searches once per day (that is, one keyword each, assuming no keyword is repeated).
The search engine would then have to compare 2 billion keywords against 5 billion pages every day.
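To get a feel for the scale, the estimate can be turned into a few lines of arithmetic. This is only a rough sketch: the constant names are made up for illustration, and it assumes a naive approach in which every keyword is compared against every page once.

```python
# Rough cost of comparing every query against every page (naive full scan).
USERS = 2_000_000_000          # assumed number of Internet users
PAGES = 5_000_000_000          # assumed number of web pages
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = USERS * 1    # one search per person per day
comparisons_per_day = queries_per_day * PAGES
comparisons_per_second = comparisons_per_day / SECONDS_PER_DAY

print(f"{comparisons_per_day:.2e} comparisons per day")        # 1.00e+19
print(f"{comparisons_per_second:.2e} comparisons per second")  # 1.16e+14
```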
That sounds scary. Can you imagine it? The volume of data is enormous, yet a normal search takes less than a second. With a traditional full-text search, which compares each query against every page, this is simply not realistic. Take a closer look at the image below and note the words "Index library query."
Before explaining what the index library is and what it does in a search engine, consider an everyday example:
When we were in school, the teacher would often say during a lesson, "Please turn to page such-and-such and look at such-and-such paragraph." Remember? The happy and helpless days of campus life are still vivid ~_~, but back to the point. When the teacher tells you which page and which paragraph to look at, an index is at work: the index keys are the page number and the paragraph number. With these two keys, even if your book is 1,000 pages thick, you can locate the specific passage in a short time.
A search engine's index library is made up of many words. There are about 120,000 Chinese characters, and the words formed from them number nearly 100,000. English has 26 letters; for now, count the words formed from them as 1,000,000. Before looking at how the entries in the index library are sorted, let us work through these numbers:
Chinese: 5 billion ÷ 100,000 = 50,000
English: 5 billion ÷ 1,000,000 = 5,000
In other words, if the 5 billion pages are spread evenly over the index entries, each word points to about 50,000 pages (Chinese) or 5,000 pages (English). For a search engine, processing 50,000 or 5,000 records is very easy.
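The same division can be written as a short script. It is a back-of-the-envelope sketch that uses the article's assumed figures and assumes pages are spread evenly across index entries; the names `VOCABULARY` and `pages_per_entry` are hypothetical.

```python
# Average number of pages behind one index entry, using the assumed figures.
PAGES = 5_000_000_000
VOCABULARY = {"Chinese": 100_000, "English": 1_000_000}  # assumed word counts


def pages_per_entry(total_pages, vocabulary_size):
    """Pages per word if pages were spread evenly over the vocabulary."""
    return total_pages // vocabulary_size


records = {lang: pages_per_entry(PAGES, size) for lang, size in VOCABULARY.items()}
for lang, count in records.items():
    print(f"{lang}: about {count:,} pages per index entry")
```

Real word frequencies are far from even, so common words would have much longer lists than rare ones; the point is only that one lookup touches thousands of records rather than billions.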
Now that the importance of the index library is clear, let us look at what form it takes:
To a search engine, even the most gorgeous website is just a pile of code. Take a page's source code, for example:
After the search engine parses the page and strips out the HTML code, only the words remain.
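The stripping step might look roughly like the sketch below, which uses Python's standard `html.parser` module. The class `TextExtractor`, the function `extract_text`, and the sample page are invented for illustration; a real crawler handles far more cases (encodings, broken markup, hidden text).

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample_html = (
    "<html><head><title>Demo page</title><style>p{color:red}</style></head>"
    "<body><h1>Search engines</h1><p>An index makes <b>lookup</b> fast.</p></body></html>"
)
page_text = extract_text(sample_html)
print(page_text)  # Demo page Search engines An index makes lookup fast.
```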
These words then go into the search engine's index library, where each word is associated with many sites. This works like the index pages of the Xinhua Dictionary: a character with 10 strokes can be found quickly through the index, and a character with 20 strokes can be found just as quickly.
By building such an index library, a search engine can quickly return a result page when a user searches for a keyword. (How the results are ranked is not covered in this article.)
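A structure that maps each word to the pages containing it is commonly called an inverted index. The following is a minimal sketch of the idea, not how any particular search engine implements it; the URLs and the functions `build_index` and `search` are hypothetical, and words are split on whitespace for simplicity.

```python
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs that contain it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, keyword):
    """Look up one keyword; cost depends on the entry, not on the page count."""
    return sorted(index.get(keyword.lower(), set()))


pages = {
    "a.example/1": "search engine index",
    "b.example/2": "index library query",
    "c.example/3": "full text search",
}
index = build_index(pages)
hits_index = search(index, "index")
hits_search = search(index, "Search")
print(hits_index)   # ['a.example/1', 'b.example/2']
print(hits_search)  # ['a.example/1', 'c.example/3']
```

The index is built once, ahead of time; each query then becomes a single dictionary lookup instead of a scan over every page.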
Now a brief word on word segmentation. Splitting text into individual words, as described above, is called word segmentation. So far we have judged the word boundaries by eye, but how does a search engine do it? However powerful a search engine is, it is still only a program. Google's Chinese word segmentation technology was bought from a third-party company, while Baidu's was developed in-house. We can assume that Baidu records tens of thousands of words in advance, and that it may also form free combinations from certain arrangements of Chinese characters. The details can be studied but are not our concern here; we only need to understand the concept of word segmentation.
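To make the concept concrete, here is a minimal sketch of one well-known dictionary-based approach, forward maximum matching. It is an illustration only and is not claimed to be what Baidu or Google actually use; the tiny dictionary and the function `segment` are made up for the example.

```python
def segment(text, dictionary, max_len=4):
    """Forward maximum matching: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking down to a single character.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical pre-recorded vocabulary.
dictionary = {"搜索", "引擎", "搜索引擎", "索引", "索引库"}
tokens = segment("搜索引擎的索引库", dictionary)
print(tokens)  # ['搜索引擎', '的', '索引库']
```

Characters that match no dictionary word fall through as single-character tokens, which is why the size of the pre-recorded vocabulary matters so much.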
Once we understand word segmentation, we should also take the search engine's point of view when doing SEO: look beneath the surface of the page and see it the way the crawler captures it.