Web Crawlers and Search Engine Optimization (SEO)
Reprinted from: http://www.cnblogs.com/nanshanlaoyao/p/6402721.html
Crawlers
A crawler goes by many names, such as web robot and spider. It is a software program that automatically carries out a series of web transactions without human intervention. Web crawlers are robots that traverse web sites recursively: they fetch a first web page, then retrieve all the pages that page points to, and so on. Internet search engines use crawlers to wander the web and pull back every document they encounter, then process these documents into a searchable database. Put simply, a web crawler is a content collection tool that lets a search engine visit your website and include it in its index. For example, Baidu's web crawler is called Baiduspider.
- How search engine crawlers work
Web <---> crawler <---> Web content library <---> Index Program <---> index database <---> Search Engine <---> User
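The indexing and search stages of this pipeline can be illustrated with a small inverted index. This is only a minimal sketch: the `build_index` and `search` helpers and the example URLs are assumptions made for illustration, and real search engines do far more (tokenization, ranking, storage).

```python
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in documents.items():
        for word in text.lower().split():
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted."""
    words = query.lower().split()
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical content library, as if it had been filled by a crawler.
library = {
    "http://example.com/a": "web crawlers collect pages",
    "http://example.com/b": "search engines index pages",
}
index = build_index(library)
print(search(index, "pages"))
print(search(index, "index pages"))
```

The content library plays the role of the crawler's output, `build_index` stands in for the index program, and `search` is the part the user talks to.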
Notes for Crawlers
- Link extraction and normalization of relative links
As a crawler moves around the web, it constantly parses HTML pages. It must extract the URL links from each parsed page, convert relative links into absolute URLs, and add them to the list of pages to be crawled. For details about the solution, refer to this article.
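Link extraction and normalization can be sketched with Python's standard library. The `LinkExtractor` class, the `extract_links` helper, and the example URLs are assumptions for illustration, not the approach of any particular crawler.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve the link against the page URL, then drop any #fragment.
                absolute, _fragment = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def extract_links(base_url, html):
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links


page = '<a href="/about">About</a> <a href="news/today.html#top">News</a>'
links = extract_links("http://example.com/blog/index.html", page)
print(links)
```

Dropping the fragment is a design choice in this sketch: two links that differ only in their `#fragment` point to the same document, so the crawler only needs to fetch it once.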
- Avoid loops
When a web crawler moves around the web, it must be careful not to get stuck in a loop. Loops are harmful to crawlers for at least three reasons.
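One common way to avoid loops is to remember every URL that has already been queued. The sketch below assumes a hypothetical `fetch_links` callback that returns the links found on a page, and it uses an in-memory dictionary in place of real HTTP requests.

```python
from collections import deque


def crawl(start_url, fetch_links, max_pages=100):
    """Breadth-first crawl that never visits the same URL twice."""
    visited = {start_url}
    queue = deque([start_url])
    order = []
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in fetch_links(url):
            if link not in visited:  # skip URLs already seen, which breaks cycles
                visited.add(link)
                queue.append(link)
    return order


# Hypothetical three-page site in which A -> B -> C -> A forms a loop.
SITE = {
    "http://example.com/a": ["http://example.com/b"],
    "http://example.com/b": ["http://example.com/c"],
    "http://example.com/c": ["http://example.com/a"],
}

pages = crawl("http://example.com/a", lambda url: SITE.get(url, []))
print(pages)
```

The `max_pages` limit is a second safeguard, since a visited set alone does not help against sites that generate an endless supply of new URLs.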
Search Engine Optimization
Search engine optimization is also called SEO. Once you understand how web crawlers work, SEO becomes easier to understand. For front-end development, pay attention to the following SEO points:
- Highlight important content
Reasonable title, description, and keywords
Although search engines give these three items less and less weight, they should still be written properly and contain only useful information. Do not write a novel here; state the key points.
Title: emphasize only the key points. Important keywords should appear no more than twice and should be placed near the front. Each page's title should be different.
Description: summarize the content of the page here. Keep the length reasonable, do not stuff it with keywords, and give each page a different description.
Keywords: list just a few important keywords and do not pile them up.
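The title, description, and keywords of a page can be read back and checked with a short script. This is a sketch only: the `HeadMetadata` class, the `keyword_count_in_title` helper, and the sample page are assumptions made for illustration.

```python
from html.parser import HTMLParser


class HeadMetadata(HTMLParser):
    """Collect the <title> text and the description/keywords <meta> values."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and attrs.get("name") in ("description", "keywords"):
            self.meta[attrs["name"]] = attrs.get("content") or ""

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def keyword_count_in_title(html, keyword):
    """Count case-insensitive occurrences of a keyword in the page title."""
    parser = HeadMetadata()
    parser.feed(html)
    return parser.title.lower().count(keyword.lower())


page = """<head>
<title>Web crawlers and SEO basics</title>
<meta name="description" content="How crawlers work and what front-end developers should do for SEO.">
<meta name="keywords" content="crawler, SEO">
</head>"""

head = HeadMetadata()
head.feed(page)
print(head.title, head.meta)
print(keyword_count_in_title(page, "seo"))
```

A count above two for an important keyword would go against the advice about titles given above.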
- Write semantic HTML that complies with W3C standards
For search engines, the most direct thing they deal with is the page's HTML code. If the code is written semantically, the search engine can easily understand what the page means.
- Use the layout to put important HTML code first
Search engines crawl HTML content from top to bottom. Taking advantage of this, put the main code first so that crawlers reach the important content first.
- Important content should not be output using JS
Crawlers generally do not read content generated by JavaScript, so important content must be placed in the HTML itself.
- Use iframes as little as possible
Search engines do not crawl the content inside an iframe, so important content should not be placed in one.
- Add the alt attribute to the image
The alt attribute supplies text that is shown in place of an image when the image cannot be displayed. For SEO, it gives search engines the opportunity to index the images on your website.
- Add the title attribute to content that needs emphasis
For SEO, the alt attribute should describe what the image actually shows, while the title attribute provides advisory information about the element it is set on.
- Add height and width to images
Images with larger dimensions are ranked nearer the top.
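The image advice above (alt text and explicit dimensions) can be checked mechanically. The `ImageAudit` class and the sample markup below are assumptions for illustration. Note that this sketch treats an empty attribute as missing, which is stricter than HTML requires for purely decorative images.

```python
from html.parser import HTMLParser


class ImageAudit(HTMLParser):
    """Record which of alt, width, and height each <img> tag is missing."""

    REQUIRED = ("alt", "width", "height")

    def __init__(self):
        super().__init__()
        self.problems = []  # list of (src, [missing attribute names])

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        # Attributes that are absent or empty count as missing in this sketch.
        present = {name for name, value in attrs if value}
        missing = [name for name in self.REQUIRED if name not in present]
        if missing:
            self.problems.append((dict(attrs).get("src"), missing))


page = (
    '<img src="logo.png" alt="Company logo" width="120" height="40">'
    '<img src="banner.jpg" width="800">'
)
audit = ImageAudit()
audit.feed(page)
print(audit.problems)
```

Running a check like this over every template is a cheap way to catch images that search engines cannot describe.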
- Keep the text when using images
Where an image must be used and you need to balance user experience with SEO, for example a heading in a custom font, you can use CSS to keep the text from being displayed in the browser while still including the heading text in the page's HTML.
Note: do not hide the text with display: none;, because search engines filter out display: none; content and the spider will not index it.
- Increase website speed
Website speed is an important factor in search engine ranking.
- Use the rel="nofollow" attribute on links to external websites to tell crawlers not to follow them to other pages.
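From the crawler's side, honoring nofollow might look like the sketch below. The `FollowableLinks` class and the sample links are assumptions for illustration, and each real crawler decides for itself how to treat the attribute.

```python
from html.parser import HTMLParser


class FollowableLinks(HTMLParser):
    """Split <a href> links into those to follow and those marked nofollow."""

    def __init__(self):
        super().__init__()
        self.follow = []
        self.skipped = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # rel is a space-separated token list, e.g. rel="nofollow noopener".
        rel_tokens = (attrs.get("rel") or "").lower().split()
        if "nofollow" in rel_tokens:
            self.skipped.append(href)
        else:
            self.follow.append(href)


page = (
    '<a href="/products">Products</a>'
    '<a href="http://other.example.org/" rel="nofollow noopener">Partner</a>'
)
links = FollowableLinks()
links.feed(page)
print(links.follow, links.skipped)
```

Splitting `rel` into tokens matters because the attribute often carries several values at once, so a plain string comparison with "nofollow" would miss them.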