A search engine does not really search the Internet; it actually searches a prebuilt database of indexed web pages.
A true search engine usually means a full-text search engine: one that collects tens of millions to billions of web pages from the Internet, indexes every word (that is, keyword) in those pages, and builds an index database. When a user looks up a keyword, every page whose content contains that keyword is returned as a search result. After being sorted by a complex algorithm, the results are ranked by their relevance to the search keyword.
Search engines today generally use hyperlink analysis. Besides analyzing the content of the indexed page itself, they also analyze the URL, anchor text, and even the surrounding text of every link that points to the page. So sometimes, even if page A does not contain the words "Demon Satan", if another page B points to page A with a link named "Demon Satan", users who search for "Demon Satan" can still find page A. Moreover, the more pages (C, D, E, F ...) that point to page A with a link named "Demon Satan", or the better the source pages of those links (B, C, D, E, F ...), the more relevant page A is considered when users search for "Demon Satan", and the higher it ranks.
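The anchor-text effect described above can be illustrated with a minimal sketch. The page names, the link list, and the scoring rule (one point per word occurrence, with anchor text credited to the link target) are assumptions made for illustration only, not the method of any particular search engine.

```python
from collections import defaultdict

# Hypothetical toy data: page bodies, and links as (source, target, anchor text).
pages = {
    "A": "a page about folklore and legends",
    "B": "a list of links",
    "C": "another list of links",
}
links = [
    ("B", "A", "Demon Satan"),
    ("C", "A", "Demon Satan"),
]


def tokenize(text):
    """Lowercase the text and split it on whitespace."""
    return text.lower().split()


def build_index(pages, links):
    """Return word -> {page: score}, counting body words and inbound anchor text."""
    index = defaultdict(lambda: defaultdict(int))
    for url, body in pages.items():
        for word in tokenize(body):
            index[word][url] += 1
    # Anchor text is credited to the page the link points TO, not the source page.
    for _source, target, anchor in links:
        for word in tokenize(anchor):
            index[word][target] += 1
    return index


def search(index, query):
    """Return pages matching any query word, highest total score first."""
    scores = defaultdict(int)
    for word in tokenize(query):
        for url, count in index.get(word, {}).items():
            scores[url] += count
    return sorted(scores, key=lambda url: (-scores[url], url))
```

With this toy data, `search(build_index(pages, links), "Demon Satan")` finds page A even though the body of A never mentions those words, and each additional inbound link with that anchor text raises A's score.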
The working principle of a search engine can be seen as three steps: crawl web pages from the Internet → build the index database → search and sort in the index database.
Crawl Web pages from the Internet
A spider program automatically collects web pages from the Internet: it visits the Internet on its own, follows every URL found in any page to crawl to other pages, repeats the process, and collects every page it has crawled.
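The crawl loop can be sketched as a breadth-first traversal of links. To keep the example runnable without network access, the "web" here is a hypothetical in-memory dictionary, and `fetch_links` and `max_pages` are assumed names standing in for a real downloader, link extractor, and crawl budget.

```python
from collections import deque

# Hypothetical in-memory "web": each URL maps to the URLs its page links to.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
}


def crawl(seed, fetch_links, max_pages=1000):
    """Collect pages reachable from seed, visiting each URL at most once."""
    seen = {seed}
    queue = deque([seed])
    collected = []
    while queue and len(collected) < max_pages:
        url = queue.popleft()
        collected.append(url)
        # Follow every link on the page; skip URLs already queued or visited.
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return collected


collected = crawl("http://example.com/", lambda url: TOY_WEB.get(url, []))
```

The `seen` set is what stops the spider from looping forever when pages link back to each other, as the home page and page `b` do here.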
Build the index database
An indexing program analyzes the collected pages and extracts the relevant information (including the page's URL, encoding type, the keywords contained in the page content, keyword positions, time, size, links to other pages, and so on). Using a relevance algorithm, it computes the relevance (or importance) of each page for every keyword in its content and in its hyperlinks, and then uses this information to build the index database.
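As a rough illustration of this step, the sketch below builds a small inverted index that records keyword positions and a relevance value per page. The relevance formula (occurrences divided by page length) is an assumption chosen for simplicity; the text does not say which algorithm real engines use, and the sample pages are hypothetical.

```python
from collections import defaultdict


def build_index(pages):
    """Return keyword -> {url: {"positions": [...], "relevance": float}}."""
    index = defaultdict(dict)
    for url, text in pages.items():
        words = text.lower().split()
        positions = defaultdict(list)
        for pos, word in enumerate(words):
            positions[word].append(pos)
        for word, pos_list in positions.items():
            index[word][url] = {
                "positions": pos_list,
                # Assumed relevance measure: share of the page's words that are this keyword.
                "relevance": len(pos_list) / len(words),
            }
    return dict(index)


# Hypothetical collected pages: url -> extracted text.
sample_pages = {
    "p1": "search engine index",
    "p2": "index index database search",
}
index = build_index(sample_pages)
```

Because relevance is computed once at indexing time, a later query only has to look up the keyword and sort the stored values.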
Search and sort in the index database
When the user enters a keyword, the search program finds all pages relevant to that keyword in the web page index database. Because the relevance of every page to the keyword has already been computed, the program only needs to sort by the existing relevance values: the higher the relevance, the higher the ranking.
Finally, the page generation system assembles the link addresses of the search results and summaries of the page content into a results page and returns it to the user.
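The query step can be sketched as a lookup followed by a sort on relevance values that were stored at indexing time. The index contents, URLs, and summaries below are hypothetical, and a single-keyword query is assumed to keep the example short.

```python
# Hypothetical precomputed index database: keyword -> {url: relevance}.
INDEX = {
    "search": {"http://example.com/a": 0.2, "http://example.com/b": 0.5},
    "engine": {"http://example.com/a": 0.4},
}

# Hypothetical stored summaries: url -> short description of the page content.
SUMMARIES = {
    "http://example.com/a": "How a search engine works",
    "http://example.com/b": "Search tips",
}


def search(keyword, index, summaries):
    """Return (url, summary) pairs for one keyword, most relevant first."""
    postings = index.get(keyword.lower(), {})
    # Relevance is already stored, so ranking is only a sort.
    ranked = sorted(postings.items(), key=lambda item: item[1], reverse=True)
    return [(url, summaries.get(url, "")) for url, _relevance in ranked]


results = search("search", INDEX, SUMMARIES)
```

A keyword that is absent from the index simply returns an empty list, which mirrors the point that only indexed content can be found.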
A search engine's spider generally revisits all pages periodically (the cycle differs between search engines and may be days, weeks, or months; pages of different importance may also be refreshed at different frequencies) and updates the index database to reflect changes in page content, add new page information, remove dead links, and re-rank pages according to changes in content and link relationships. In this way, the content and changes of web pages are reflected in the results of user queries.
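A periodic refresh of this kind might look like the sketch below. The importance tiers, the revisit intervals in days, and the `fetch` callback (which returns `None` for a dead link) are all assumptions made for illustration; real engines choose their own schedules.

```python
# Hypothetical revisit intervals in days for each importance tier.
REVISIT_DAYS = {"high": 1, "medium": 7, "low": 30}


def pages_due(pages, today):
    """Return URLs whose revisit interval has elapsed, in sorted order.

    pages maps url -> {"importance": tier, "last_crawled": day number}.
    """
    return sorted(
        url
        for url, info in pages.items()
        if today - info["last_crawled"] >= REVISIT_DAYS[info["importance"]]
    )


def refresh(index, pages, due, fetch, today):
    """Re-fetch due pages, updating the index and dropping dead links."""
    for url in due:
        content = fetch(url)
        if content is None:
            # Dead link: remove it from both the index and the crawl schedule.
            index.pop(url, None)
            pages.pop(url, None)
        else:
            index[url] = content
            pages[url]["last_crawled"] = today
```

A caller would run `pages_due` once per day and pass the result to `refresh`, so important pages are revisited often while low-importance pages wait longer between visits.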
Although there is only one Internet, each search engine has different capabilities and preferences, so the pages they crawl differ and so do their ranking algorithms. A large search engine's database stores an index of hundreds of millions to billions of web pages, with data volumes reaching thousands or even tens of thousands of gigabytes. But even if the largest search engine builds an index database of more than 2 billion web pages, it can cover less than 30% of ordinary web pages, and the overlap between different search engines' data is generally below 70%. An important reason to use different search engines is that each can find content the others cannot. There is also a great deal of content on the Internet that search engines cannot crawl or index, and which therefore cannot be found with a search engine.
Keep this concept in mind: a search engine can only search the content stored in its web page index database. Keep this one in mind as well: if something is in the search engine's index database and you still cannot find it, the problem is your own search ability, and learning search skills can greatly improve it.