Before studying SEO formally, you need to learn how a search engine works. After all, SEO is work done for search engines, and once you understand their working principle you can identify the cause when you run into problems. A search engine generally consists of the following modules:
1. Crawl module
2. Filter module
3. Inclusion module
4. Sorting module
When a search engine runs, its first job is to crawl pages on the Internet. The module that carries out this work is called the crawl module. To understand the crawl module, we first need to cover the following points:
1. The search engine's crawl program: the spider
To automatically crawl the huge number of pages on the Internet, a search engine must have a fully automated page crawler. This program is generally called a "spider" (also called a "robot"). Each search engine's spider has a different name. Baidu's crawl program is commonly known as Baidu Spider.
Google's crawl program is commonly known as the Google robot.
360's crawler is generally called 360 Spider.
In fact, whether it is called a spider or a robot, you only need to know that the term refers to a search engine's crawler. The spider's task is simple: it follows links to continuously crawl pages and links on the Internet that it has not yet included, and then stores the crawled page information and link information in its own page database. These crawled pages then have the opportunity to appear in the final search results.
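The link-following behavior described above can be sketched as a breadth-first traversal. This is only a minimal illustration, not how any real spider is implemented: the "web" here is a hypothetical in-memory dictionary (TOY_WEB) that maps each URL to the links found on that page, and the function name crawl is an assumption made for the example.

```python
from collections import deque

def crawl(start_url, link_graph):
    """Breadth-first crawl over a toy link graph; returns URLs in crawl order."""
    seen = {start_url}          # URLs already discovered
    queue = deque([start_url])  # URLs waiting to be crawled
    page_database = []          # stands in for the spider's page database
    while queue:
        url = queue.popleft()
        page_database.append(url)
        # Follow every link on the page that has not been discovered yet.
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return page_database

# Hypothetical link structure: each key is a page, each value its outgoing links.
TOY_WEB = {
    "http://example.com/": ["http://example.com/a", "http://example.com/b"],
    "http://example.com/a": ["http://example.com/b", "http://example.com/c"],
    "http://example.com/b": ["http://example.com/"],
    "http://example.com/c": [],
    "http://example.com/orphan": [],  # no page links here
}

print(crawl("http://example.com/", TOY_WEB))
```

In this toy graph the page at /orphan is never crawled, because no crawled page links to it. A page that nothing links to cannot be reached by following links alone.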
2. How to get spiders to crawl our website
From the explanation of spiders above, we know that for our pages to eventually appear in the search results, spiders must first crawl our site. Here are three ways to get spiders to crawl our site:
External links: We can post links to our site on websites that search engines have already included, in order to attract spiders. Exchanging links is also a common method.
Submit links: Baidu provides webmasters with a link submission tool. We only need to submit our links to Baidu through this tool, and Baidu will then send spiders to crawl our pages.
Baidu link submission tool URL (as shown in the picture):
http://zhanzhang.baidu.com/linksubmit/url
Spiders crawling on their own: If you want spiders to visit your site regularly on their own initiative, you must provide high-quality content. Only when spiders find that your site's content is very good will they pay special attention to your site and return regularly to check whether new content has appeared. How to make sure your site provides high-quality content is a topic we will elaborate on in a later chapter.
3. How to know whether spiders have visited our site
There are two ways to find out whether spiders have visited our website.
(1) Baidu crawl frequency tool
The tool URL is: http://zhanzhang.baidu.com/pressure/index
(2) Server IIS logs
If your server has IIS logging enabled, you can also see traces of spider visits in the IIS log files. Through the IIS logs we can find out which of our pages Baidu's spiders have crawled.
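As a rough sketch of the log check described above, the following script scans log lines in the W3C extended format that IIS writes and keeps the requests whose user agent contains "baiduspider". The sample log lines, IP addresses, and the function name find_spider_hits are made up for illustration, and the set of logged fields depends on how your server is configured. Note also that a user agent string can be forged, so a match is a hint rather than proof.

```python
def find_spider_hits(log_lines, spider_token="baiduspider"):
    """Return (date, time, path, status) for requests made by a given spider."""
    fields = []
    hits = []
    for line in log_lines:
        line = line.strip()
        # The "#Fields:" directive names the columns of the rows that follow.
        if line.startswith("#Fields:"):
            fields = line[len("#Fields:"):].split()
            continue
        # Skip blank lines, other directives, and rows seen before any header.
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        # IIS replaces spaces in the user agent with "+", so it stays one token.
        if spider_token in row.get("cs(User-Agent)", "").lower():
            hits.append((row.get("date"), row.get("time"),
                         row.get("cs-uri-stem"), row.get("sc-status")))
    return hits

# Hypothetical log excerpt; a real log would be read from the IIS log file.
SAMPLE_LOG = """#Software: Microsoft Internet Information Services 8.5
#Fields: date time cs-method cs-uri-stem c-ip cs(User-Agent) sc-status
2024-01-05 08:12:01 GET /index.html 203.0.113.7 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 200
2024-01-05 08:12:09 GET /about.html 198.51.100.4 Mozilla/5.0+(Windows+NT+10.0) 200
2024-01-05 08:13:30 GET /news/1.html 203.0.113.7 Mozilla/5.0+(compatible;+Baiduspider/2.0;++http://www.baidu.com/search/spider.html) 404
""".splitlines()

for hit in find_spider_hits(SAMPLE_LOG):
    print(hit)
```

The status column is worth reading alongside the path: a spider that keeps receiving error statuses is visiting the site but not getting usable pages.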
4. Factors that affect spider crawling
We know that for a site to rank, the first step is for it to be crawled by spiders. So which factors may prevent spiders from crawling our pages properly? Pay attention to the following points:
(1) URLs must not be too long: Baidu recommends that a URL not exceed 256 bytes (an English letter, upper or lower case, occupies one byte; a Chinese character occupies two bytes).
(2) Do not include Chinese in URLs: Baidu crawls URLs containing Chinese relatively poorly, so URLs should not contain Chinese characters.
(3) Server problems: If your server is of poor quality and the site often cannot be opened, spider crawling will be affected.
(4) Robots.txt blocking: Some SEO staff, through negligence, block in the Robots.txt file the paths or pages that they actually want Baidu to crawl. This also affects how well Baidu crawls the site.
(5) Avoid characters that spiders have difficulty parsing, such as /abc/123456;;;; %b9&ce%edds$ghwf%.html. Spiders cannot understand a URL like this and will give up crawling it.
(6) Keep dynamic parameters few and simple: Baidu now handles dynamic URLs well, but URLs with too many parameters or too much complexity may be judged unimportant by spiders and abandoned. This point is particularly important and must be noted.
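Several of the points above can be checked mechanically before a page goes live. The sketch below is a minimal checker written for this article, not a Baidu tool: check_url and blocked_by_robots are hypothetical helper names, the parameter limit of 3 is an arbitrary assumption (the text gives no number), and the "hard-to-parse" pattern is only one possible reading of point (5). The byte count uses GBK encoding because it matches the rule of one byte per English letter and two bytes per Chinese character. The robots.txt check uses Python's standard urllib.robotparser module.

```python
import re
from urllib.parse import urlsplit, parse_qsl
from urllib.robotparser import RobotFileParser

MAX_URL_BYTES = 256     # limit quoted in the text
MAX_QUERY_PARAMS = 3    # hypothetical threshold, not a Baidu figure

def check_url(url):
    """Return a list of crawl-unfriendly traits found in a URL."""
    problems = []
    # GBK uses 1 byte per ASCII letter and 2 bytes per Chinese character.
    if len(url.encode("gbk", errors="replace")) > MAX_URL_BYTES:
        problems.append("too long")
    if re.search(r"[\u4e00-\u9fff]", url):
        problems.append("contains Chinese")
    # Semicolons, dollar signs, whitespace, or "%" not followed by two hex digits.
    if re.search(r"[;$\s]|%(?![0-9A-Fa-f]{2})", url):
        problems.append("hard-to-parse characters")
    query = urlsplit(url).query
    if len(parse_qsl(query, keep_blank_values=True)) > MAX_QUERY_PARAMS:
        problems.append("too many parameters")
    return problems

def blocked_by_robots(robots_lines, url, agent="Baiduspider"):
    """Return True if the given robots.txt lines forbid the agent from this URL."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return not parser.can_fetch(agent, url)

print(check_url("/abc/123456;;;; %b9&ce%edds$ghwf%.html"))
print(check_url("http://example.com/list.php?a=1&b=2&c=3&d=4"))

# Hypothetical robots.txt that accidentally blocks the news section.
ROBOTS = ["User-agent: Baiduspider", "Disallow: /news/"]
print(blocked_by_robots(ROBOTS, "http://example.com/news/1.html"))
```

Running the robots.txt check against the pages you most want crawled is a quick way to catch the kind of accidental blocking described in point (4).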
The Internet is flooded with junk pages and pages with no content, which neither search engines nor search users need. To keep these junk pages from occupying valuable storage resources, search engines filter the content that spiders bring back. The module that performs this function is called the filter module. Two main factors affect the filter module:
(1) Recognizability
Search engine spiders are currently best at analyzing text and links, and have more difficulty recognizing images and video. So if a page consists mostly of images and video, it is hard for a search engine to identify the page's content, and it may filter the page out as junk. When we edit site content, we should therefore add some text description so that the page is less likely to be filtered out by the filter module.
(2) Content quality
For content it can recognize, the search engine also compares the crawled page content with what is already saved in its database. If the search engine finds that most of your page's content duplicates content in the database, or that its quality is lower, the page will be filtered out.
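The article does not say how the comparison with stored content is done. One common textbook technique for detecting near-duplicate text is to split each page into overlapping word sequences ("shingles") and measure the overlap with Jaccard similarity. The sketch below uses that technique purely as an illustration; it is an assumption, not Baidu's actual method, and the 0.8 threshold and the function names are hypothetical.

```python
def shingles(text, size=3):
    """Return the set of overlapping word sequences of the given size."""
    words = text.lower().split()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}

def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def is_near_duplicate(new_page, stored_pages, threshold=0.8):
    """True if the new page overlaps heavily with any page already stored."""
    new_set = shingles(new_page)
    return any(jaccard(new_set, shingles(old)) >= threshold
               for old in stored_pages)

# Hypothetical contents of the page database.
STORED = ["search engines crawl pages and store them in a database"]

print(is_near_duplicate("Search engines crawl pages and store them in a database", STORED))
print(is_near_duplicate("our shop sells handmade leather bags in three colors", STORED))
```

Under this toy model, a page copied from elsewhere scores close to 1.0 against the stored copy, while original text scores close to 0.0.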
The program module that takes pages that have passed the filter module's "assessment", performs word segmentation and data format standardization on them, and then stores them in the index database is called the inclusion module. If your site is fortunate enough to pass through the inclusion module, it has the opportunity to be ranked.
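To make the idea of segmenting pages and storing them in an index database more concrete, here is a minimal sketch of an inverted index, a structure that maps each term to the pages containing it. It is an illustration under simplifying assumptions: the segment function only lowercases and splits English text, whereas Chinese pages would need a dedicated word segmenter, and the page data and function names are hypothetical.

```python
import re
from collections import defaultdict

def segment(text):
    """Very rough stand-in for word segmentation: lowercase alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(pages):
    """Map each term to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for term in segment(text):
            index[term].add(url)
    return index

# Hypothetical pages that passed the filter step.
PAGES = {
    "http://example.com/seo": "SEO basics: how search engines work",
    "http://example.com/spider": "How a search spider follows links",
}

index = build_index(PAGES)
print(sorted(index["search"]))
print(sorted(index["seo"]))
```

A page that never reaches this step has no entries in the index, so no search term can retrieve it, which is why inclusion is a precondition for ranking.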
1. How to check whether a page is included
The simplest way is to copy the page's URL into the Baidu search box and search. If the page appears in the search results, it has been included.
2. How to check how many pages of a website are included
There are two methods:
(1) The site command
With the "site:domain name" command, we can see the pages under a domain name that the search engine has included.
(2) Baidu's "Index Quantity" query tool
With the "Index Quantity" query tool officially provided by Baidu, you can also check how many pages of your site are included.
What should you do when few pages are included?
There are two situations:
(1) New sites
Generally speaking, a newly launched site will not start to be included for at least 1-2 months, and early on usually only the home page is included. There is no way around this, because Baidu deliberately lengthens the review period for new sites in order to prevent the spread of junk sites. So if you operate a new site, do not worry about low inclusion. As long as you consistently provide quality content, Baidu will begin to include your inner pages after about 2 months.
(2) Old sites
Some old sites also have few pages included, or even find that the number of included pages starts to decrease. This is generally due to the poor quality of the site's inner pages.
In this case the webmaster should promptly improve the content quality of the whole site. Only by providing quality content can the site keep its rankings stable.
The program module that computes the weight of each page in the index database through a series of algorithms and sorts the pages accordingly is called the sorting module.
If the sorting module ranks your page near the top for a keyword, your page will be displayed to users when they search for that keyword. For your site to rank well, you need to do the following two things:
1. Do the basic optimization well
To rank well, your pages must first have good basic optimization, which includes site positioning, site structure, site layout, site content, and so on. We will elaborate on these basic optimizations later. Only by getting these basics right do we reach the passing line.
2. Good comprehensive data
On top of a good foundation, if your data in the Baidu statistics backend, your user loyalty, and the results of your off-site promotion perform well, you earn bonus points above the passing line. As long as your bonus points exceed those of all your competitors, your site will rank ahead of all of them.
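The "passing line plus bonus points" picture above can be expressed as a toy scoring model. This is only one way to read the analogy and says nothing about the real ranking algorithm: the pass line of 60, the score values, the Page class, and the rank function are all invented for illustration.

```python
from dataclasses import dataclass

PASS_LINE = 60  # hypothetical passing score for basic optimization

@dataclass
class Page:
    url: str
    basic_score: int  # how well the basic optimization is done
    bonus: int        # extra points from comprehensive data

def rank(pages):
    """Order pages: those that pass the line first, then by bonus points."""
    return sorted(
        pages,
        key=lambda p: (p.basic_score >= PASS_LINE, p.bonus),
        reverse=True,
    )

# Hypothetical competitors for one keyword.
CANDIDATES = [
    Page("http://example.com/a", basic_score=75, bonus=10),
    Page("http://example.com/b", basic_score=40, bonus=30),
    Page("http://example.com/c", basic_score=65, bonus=25),
]

for page in rank(CANDIDATES):
    print(page.url)
```

In this model page b has the most bonus points but still ranks last, because its basic optimization is below the passing line. Among the pages that pass, page c outranks page a on bonus points.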
This article has explained how search engines work. How does grasping this principle help you study SEO?
The help is that when you encounter SEO technical problems, you can find the cause through the working principle of the search engine.
For example, suppose you run a new site and find after 1 month that only the home page is included. You will know that this is because the inclusion module has a review period for new sites, so this is normal.
Or suppose you find that your site's articles are included normally but have no rankings. You will know that although your articles have passed the inclusion module, the sorting module does not give them good rankings because the basic optimization and comprehensive data are not good enough. So you know that the next step is to improve the quality of the site's content.
So mastering the working principle of search engines is essential for learning SEO.