In the previous article (How Search Engine Principles Reflect SEO Knowledge), I covered the first part of how search engines work: how the spider crawls information on the web. From it we also learned some of the spider's habits and a few practical SEO tips. In this article we will look further at what the search engine does with that content. So, without further ado, let's begin.
We all know about spiders. After all, a spider is only a program: it does not analyze a site's content through the rendered front end the way a visitor does, but crawls information from the site's source code. In a website's source code we see a lot of HTML, JavaScript, and other program statements, yet the spider is only interested in the text, which means it extracts only the text from the page. Some readers might ask: then what about everything else we write? Does the code serve no purpose?
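To make the idea concrete, here is a minimal Python sketch of how a crawler might reduce a page to its text, using only the standard library's `html.parser`. It assumes the simple rule that everything inside `<script>` and `<style>` is code and everything else is content; real search engine spiders are far more sophisticated, and the class name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if self._skip_depth == 0 and text:
            self.chunks.append(text)


def extract_text(html: str) -> str:
    """Return the visible text of an HTML page as one space-separated string."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# A made-up page: the CSS and JavaScript are dropped, the text is kept.
page = """<html><head><style>h1 { color: red; }</style>
<script>var visits = 1;</script></head>
<body><h1>Ningbo SEO</h1><p>Website optimization tips</p></body></html>"""
print(extract_text(page))  # Ningbo SEO Website optimization tips
```

The parser never executes the JavaScript or applies the CSS; it simply discards them, which mirrors the point that a spider reads the source rather than the rendered page.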
In fact, that is not the case. In on-page tag optimization we all know elements such as H (heading) tags, the nofollow attribute, and the alt attribute. These can add emphasis and context when spiders crawl our site. For example, a spider cannot recognize the information inside an image, so we set the image's alt attribute to help the search engine understand what the image shows. Likewise, to keep some of the site's weight from being dispersed, we add nofollow to links where necessary.
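A crawler that keeps only text can still read signals from these tags. The sketch below uses Python's standard-library `html.parser` to collect heading text and image alt text, and to split links by whether their `rel` attribute contains `nofollow`. The class name `TagSignals` and the sample markup are hypothetical, and how a real engine weights these signals is neither shown nor public.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class TagSignals(HTMLParser):
    """Collects heading text, image alt text, and links split by rel="nofollow"."""

    def __init__(self):
        super().__init__()
        self.headings = []   # text found inside <h1>..<h6>
        self.alt_texts = []  # alt attribute values of <img> elements
        self.followed = []   # hrefs of ordinary links
        self.nofollow = []   # hrefs of links whose rel contains "nofollow"
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)  # valueless attributes map to None
        if tag in HEADING_TAGS:
            self._in_heading = True
        elif tag == "img" and attributes.get("alt"):
            self.alt_texts.append(attributes["alt"])
        elif tag == "a" and attributes.get("href"):
            rel_values = (attributes.get("rel") or "").lower().split()
            target = self.nofollow if "nofollow" in rel_values else self.followed
            target.append(attributes["href"])

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if self._in_heading and data.strip():
            self.headings.append(data.strip())


page = (
    '<h1>Ningbo SEO</h1>'
    '<img src="logo.png" alt="He Tao SEO logo">'
    '<a href="/post/154.html">Next article</a>'
    '<a href="https://example.com/ad" rel="nofollow sponsored">Ad</a>'
)
signals = TagSignals()
signals.feed(page)
signals.close()
print(signals.headings)   # ['Ningbo SEO']
print(signals.alt_texts)  # ['He Tao SEO logo']
print(signals.followed)   # ['/post/154.html']
print(signals.nofollow)   # ['https://example.com/ad']
```

`rel` can hold several space-separated values, so the sketch splits it rather than comparing the whole string to `"nofollow"`.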
Since search engine spiders care mainly about a site's text, Chinese SEO has to deal with one particular concept: word segmentation.
The simplest example is the four-character Chinese phrase for "website optimization" (网站优化). In Baidu's index vocabulary it is actually stored as two separate words, "website" (网站) and "optimization" (优化). When a user searches for "website optimization", the search engine takes the documents listed under "website" and the documents listed under "optimization", intersects the two sets, and then retrieves and ranks the results. There is more to say about this later.
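Baidu's actual segmenter is not public, so the following is only an illustration using forward maximum matching, a classic dictionary-based segmentation technique, followed by the intersection of per-word document lists described above. The dictionary, the document IDs, and the function names are all hypothetical; 网站 means "website", 优化 means "optimization", and 宁波 is "Ningbo".

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text greedily, always taking the longest dictionary word at each position.

    Characters that start no dictionary word are emitted one at a time.
    """
    words = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                words.append(piece)
                i += size
                break
    return words


# Hypothetical vocabulary and posting lists (word -> IDs of documents containing it).
DICTIONARY = {"网站", "优化", "宁波"}
POSTINGS = {
    "网站": {1, 2, 5, 8},
    "优化": {2, 3, 8, 9},
}


def search(query):
    """Segment the query, then intersect the posting lists of its words."""
    terms = forward_max_match(query, DICTIONARY)
    matches = None
    for term in terms:
        docs = POSTINGS.get(term, set())
        matches = set(docs) if matches is None else matches & docs
    return terms, sorted(matches or set())


print(search("网站优化"))  # (['网站', '优化'], [2, 8])
```

Only documents 2 and 8 contain both words, so only they go on to ranking. Greedy longest-match is simple but can mis-split ambiguous text, which is one reason production segmenters use statistical models on top of dictionaries.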
Speaking of segmentation, there is a way to see how Baidu segments a query: search Baidu for "Ningbo He Tao seo" and look at the snapshot (cached copy) of our site among the top results, as shown below:
In the snapshot it is easy to see that Baidu has split the query into three phrases, each highlighted with a different background color. That is one example. We can also see it in Baidu's search results: the words we searched for are marked in red, which is another visible form of segmentation.
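Once a query has been segmented, marking the matched words in a snippet is straightforward. This sketch wraps each matched term in a tag; the `<em>` tag and the `highlight` function are assumptions for illustration, not a description of Baidu's actual markup. Longer terms are tried first so that a long word is not broken up by a shorter word it contains.

```python
import html
import re


def highlight(snippet, terms, open_tag="<em>", close_tag="</em>"):
    """Wrap every occurrence of a query term in snippet with the given tags.

    The snippet text is HTML-escaped; empty terms are ignored, and longer
    terms are tried first in the regex alternation.
    """
    ordered = sorted((t for t in terms if t), key=len, reverse=True)
    if not ordered:
        return html.escape(snippet)
    pattern = re.compile("|".join(re.escape(t) for t in ordered))
    parts = []
    last = 0
    for match in pattern.finditer(snippet):
        parts.append(html.escape(snippet[last:match.start()]))
        parts.append(open_tag + html.escape(match.group()) + close_tag)
        last = match.end()
    parts.append(html.escape(snippet[last:]))
    return "".join(parts)


print(highlight("宁波网站优化服务", ["网站", "优化"]))
# 宁波<em>网站</em><em>优化</em>服务
```

Because each segmented word is marked separately, the highlighting itself reveals where the engine split the query.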
Some readers may object that these are isolated cases: the queries people actually type are far more complex and may include modal particles and similar words. As search engines become more and more mature, they have long since taken this into account. Such particles play essentially no role in retrieval, so the search engine filters them out during preprocessing. This both reduces the retrieval workload and improves the accuracy of the results.
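Stop-word filtering during preprocessing can be sketched as a set lookup over already-segmented tokens. The stop-word list below is a tiny hypothetical sample (的 and 了 are common function words; 吗, 呢, 啊 and 吧 are modal particles); production lists are much larger and engine-specific.

```python
# A tiny hypothetical stop-word list; real engines use much larger, tuned lists.
STOP_WORDS = {"的", "了", "吗", "呢", "啊", "吧"}


def remove_stop_words(tokens):
    """Drop tokens that carry no retrieval value, keeping the original order."""
    return [token for token in tokens if token not in STOP_WORDS]


# Tokens as they might come out of segmentation for "网站优化的方法吗"
# (roughly "methods of website optimization?").
tokens = ["网站", "优化", "的", "方法", "吗"]
print(remove_stop_words(tokens))  # ['网站', '优化', '方法']
```

Using a `set` keeps each lookup constant-time, which matters when every query and every crawled page passes through this step.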
Before a search engine files the information its spider has crawled, one more step is necessary: it must check the content for duplication. The first kind of deduplication concerns results from the same website. For example, if someone searches for my site's keyword "ningbo seo", both our home page and one of our content pages could appear on the first page of results. A mature search engine tries to avoid this, because such results are of little use to the user; it amounts to ranking the same content twice. The second kind concerns different websites. Because countless copies of content circulate on the web, two different sites often carry the same content, which is the content-reposting problem we often talk about. The search engine also takes such duplicate information into account and filters out the copies.
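Both kinds of deduplication can be sketched in a few lines of Python. `collapse_by_host` keeps only the highest-ranked URL per host, and `drop_near_duplicates` compares documents by character shingles and Jaccard similarity, a common textbook technique for near-duplicate detection that is swapped in here for illustration; it is not known to be what Baidu uses. The 0.8 threshold, the shingle size of 3, the function names, and all URLs are assumptions.

```python
from urllib.parse import urlsplit


def collapse_by_host(ranked_urls):
    """Keep only the highest-ranked URL for each host, preserving rank order."""
    seen_hosts = set()
    kept = []
    for url in ranked_urls:
        host = urlsplit(url).hostname
        if host not in seen_hosts:
            seen_hosts.add(host)
            kept.append(url)
    return kept


def shingles(text, k=3):
    """Set of overlapping k-character substrings, ignoring whitespace."""
    compact = "".join(text.split())
    if len(compact) < k:
        return {compact}
    return {compact[i:i + k] for i in range(len(compact) - k + 1)}


def jaccard(a, b):
    """Size of the intersection divided by the size of the union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def drop_near_duplicates(docs, threshold=0.8):
    """Keep a document only if it is not too similar to one already kept."""
    kept = []  # (url, shingle set) pairs
    for url, text in docs:
        signature = shingles(text)
        if all(jaccard(signature, other) < threshold for _, other in kept):
            kept.append((url, signature))
    return [url for url, _ in kept]


ranked = [
    "https://example.com/",
    "https://example.com/post/154.html",
    "https://example.org/seo",
]
print(collapse_by_host(ranked))
# ['https://example.com/', 'https://example.org/seo']

docs = [
    ("https://example.com/a", "Search engines segment Chinese text before indexing."),
    ("https://example.org/copy", "Search engines segment Chinese text before indexing!"),
    ("https://example.net/b", "Spiders only read the text of a page."),
]
print(drop_near_duplicates(docs))
# ['https://example.com/a', 'https://example.net/b']
```

Whitespace is stripped before shingling so that a reposted copy with different spacing or line breaks still matches. Pairwise comparison is quadratic; large-scale systems typically use hashing schemes such as MinHash or SimHash instead.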
After these deduplication steps, the next task is to organize the data efficiently. I have prepared two tables for everyone to see:
Forward Index
Let me briefly explain the tables above. The forward index is not the table the search engine uses directly for ranking. In it, keywords are assigned to each file, which means the primary key is the file. In the second table, the inverted index, the search engine uses the keyword as the primary key, which matches how we search: we enter a keyword to find the information we want. We can see that when a user searches for keyword 7, the search engine does not need to scan every piece of content again; it simply looks up keyword 7 in the index and pulls out the files listed under it, such as file 1, file 2, and file 8.
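Building the inverted index from the forward index is a matter of flipping the key. In this sketch, files and keywords are numbered as in the text (keyword 7 appears in files 1, 2 and 8); the other entries and the function name are made-up filler for illustration.

```python
from collections import defaultdict

# Hypothetical forward index: file number -> keyword numbers found in that file.
# Keyword 7 appears in files 1, 2 and 8, matching the example in the text.
forward_index = {
    1: [1, 2, 7],
    2: [3, 7],
    3: [2, 4],
    8: [5, 7],
}


def build_inverted_index(forward):
    """Flip file -> keywords into keyword -> sorted list of files."""
    inverted = defaultdict(set)
    for file_id, keywords in forward.items():
        for keyword in keywords:
            inverted[keyword].add(file_id)
    return {keyword: sorted(files) for keyword, files in inverted.items()}


inverted_index = build_inverted_index(forward_index)

# A query for keyword 7 is now a single dictionary lookup, not a scan of every file.
print(inverted_index[7])  # [1, 2, 8]
```

The inversion is done once at indexing time, so the cost is paid up front rather than on every query.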
As for how these files are ranked, I will cover that next time. Thank you for taking the time to read my article.
Excerpted from He Tao SEO Blog: http://www.nb-seoer.com/post/154.html