Web Crawlers and Search Engine Optimization (SEO)

Source: Internet
Author: User

Reprinted from: http://www.cnblogs.com/nanshanlaoyao/p/6402721.html

Web Crawlers

A crawler goes by many names, such as web robot and spider. It is a software program that automatically carries out a series of web transactions without human intervention. Web crawlers are robots that recursively traverse web sites: they fetch a first page, then fetch all the pages that page points to, then the pages those point to, and so on. Internet search engines use crawlers to wander the web and pull back every document they encounter, then process those documents into a searchable database. Put simply, a web crawler is a content collection tool that lets a search engine visit your website and include it in its index. For example, Baidu's web crawler is called Baiduspider.

  • How search engine crawlers work
    Web <---> crawler <---> Web content library <---> Index Program <---> index database <---> Search Engine <---> User
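
The indexing and search stages of this pipeline can be illustrated with a toy example. This is only a sketch: it assumes the index program builds an inverted index (a map from each word to the pages containing it), which is a common design but is not stated in the original post. The names build_index and search and the sample pages are hypothetical.

```python
import re
from collections import defaultdict

def build_index(content_library):
    """Index program: map each lowercase word to the set of URLs containing it."""
    index = defaultdict(set)
    for url, text in content_library.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index

def search(index, query):
    """Search engine: return the URLs that contain every word of the query."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return set()
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

# Hypothetical "web content library", as a crawler might have stored it.
content_library = {
    "http://example.com/a": "Web crawlers collect pages",
    "http://example.com/b": "Search engines index pages",
}
index = build_index(content_library)
print(search(index, "index pages"))
```

A real search engine would also rank the matching pages; this sketch only shows how crawled content becomes searchable.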
Notes for Crawlers
  • Link extraction and standardization of relative links

    As a crawler moves across the web, it constantly parses HTML pages. It must extract the URL links from each parsed page, convert relative links to absolute URLs, and add them to the list of pages to be crawled.

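  • Avoid loops
    When a crawler moves across the web, it must be careful not to get stuck in a loop. Loops are harmful for at least three reasons: the crawler wastes time and bandwidth fetching the same pages over and over, the repeated requests put a burden on the web server, and the collected results fill up with duplicate content.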
  • Mark content as not to be crawled
    A site can control which content is included by which search engines. A search engine crawler first looks for the site's robots.txt file. You can also add the rel="nofollow" attribute to individual links.
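
The link extraction and exclusion notes above can be sketched with the Python standard library. This is a minimal illustration, not a complete crawler: the names LinkExtractor, extract_links, and allowed_by_robots, the ExampleBot user agent, and the example.com URLs are all hypothetical, and the robots.txt content is assumed to have been downloaded already.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.robotparser import RobotFileParser

class LinkExtractor(HTMLParser):
    """Collect absolute URLs from <a href>, skipping rel="nofollow" links."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        rel = (attrs.get("rel") or "").lower().split()
        if not href or "nofollow" in rel:
            return
        # Resolve the relative link against the page URL and drop the #fragment.
        absolute, _fragment = urldefrag(urljoin(self.base_url, href))
        if absolute.startswith(("http://", "https://")):
            self.links.append(absolute)

def extract_links(base_url, html):
    """Return the crawlable links found in one HTML page."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links

def allowed_by_robots(robots_txt, user_agent, url):
    """Check a URL against robots.txt text that was already downloaded."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())
    return rules.can_fetch(user_agent, url)

page_url = "http://example.com/docs/index.html"
page_html = (
    '<a href="../about.html">About</a>'
    '<a href="guide.html#intro">Guide</a>'
    '<a rel="nofollow" href="http://other.example/">External</a>'
    '<a href="mailto:someone@example.com">Mail</a>'
)
robots_txt = "User-agent: *\nDisallow: /private/\n"

for link in extract_links(page_url, page_html):
    print(link, allowed_by_robots(robots_txt, "ExampleBot", link))
```

Resolving every link with urljoin before queuing it means the crawl list only ever contains absolute URLs, which makes duplicate detection much simpler.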

Loop Avoidance Solutions
  • Normalize URLs
  • Breadth-first crawling
    Crawling in breadth-first order minimizes the impact of loops.
  • Throttling
    Limit the number of pages a crawler may fetch from a web site within a given period of time. Throttling also caps the total number of duplicate pages fetched and the total number of requests made to the server.
  • Limit URL length
    If a loop keeps making URLs longer, a length limit eventually terminates the loop.
  • URL blacklist
  • Manual monitoring
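
Several of these measures can be combined in one crawl loop. The sketch below is an illustration under stated assumptions rather than a production crawler: normalize, crawl, and fetch_links are hypothetical names, the limits are arbitrary, a per-site page budget stands in for time-based throttling, and an in-memory link graph replaces real network access.

```python
from collections import Counter, deque
from urllib.parse import urlsplit, urlunsplit

MAX_URL_LENGTH = 200      # assumed limit; tune for a real crawler
MAX_PAGES_PER_SITE = 100  # assumed per-site page budget for one crawl run

def normalize(url):
    """Put a URL in a canonical form so that equivalent URLs compare equal."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    # Keep the port only when it is not the default for the scheme.
    default_port = {"http": 80, "https": 443}.get(scheme)
    if parts.port and parts.port != default_port:
        host = f"{host}:{parts.port}"
    path = parts.path or "/"
    return urlunsplit((scheme, host, path, parts.query, ""))  # drop the fragment

def crawl(seed, fetch_links, blacklist=(), max_pages_per_site=MAX_PAGES_PER_SITE):
    """Breadth-first crawl; returns the pages in the order they were visited."""
    start = normalize(seed)
    queue = deque([start])
    seen = {start}                # normalized URLs already queued
    pages_per_site = Counter()
    visited = []
    while queue:
        url = queue.popleft()     # FIFO order gives breadth-first traversal
        site = urlsplit(url).netloc
        if pages_per_site[site] >= max_pages_per_site:
            continue              # the budget for this site is used up
        pages_per_site[site] += 1
        visited.append(url)
        for link in fetch_links(url):
            link = normalize(link)
            if link in seen or len(link) > MAX_URL_LENGTH:
                continue
            if any(link.startswith(prefix) for prefix in blacklist):
                continue
            seen.add(link)
            queue.append(link)
    return visited

# Hypothetical link graph containing a cycle, an ever-growing "trap" URL,
# and a blacklisted area.
graph = {
    "http://example.com/": ["http://EXAMPLE.com/a", "http://example.com/b#top"],
    "http://example.com/a": ["http://example.com/",
                             "http://example.com/trap/" + "x/" * 150],
    "http://example.com/b": ["http://example.com:80/a",
                             "http://example.com/admin/secret"],
}

def fetch_links(url):
    return graph.get(url, [])

print(crawl("http://example.com", fetch_links,
            blacklist=("http://example.com/admin/",)))
```

The seen set is what actually breaks cycles; normalization makes it effective by mapping different spellings of the same URL to one entry.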

Search Engine Optimization

Search engine optimization is also called SEO. Once you understand how web crawlers work, SEO becomes easier to understand. For front-end development, pay attention to the following SEO points:

  • Highlight important content
    Reasonable title, description, and keywords
    Although search engines give these three items less and less weight, they are still worth writing carefully. Include only useful information; keep them short and focused on what the page is about.
    Title: emphasize only the key points. Important keywords should appear no more than twice and should come first. Each page should have a different title.
    Description: summarize the content of the page. Keep the length reasonable, do not stuff it with keywords, and give each page a different description.
    Keywords: list a few important keywords and do not pile them up.
  • Write semantic HTML that conforms to W3C standards
    What a search engine sees most directly is the page's HTML code. If the code is written semantically, the search engine can easily understand the meaning of the page.
  • Use layout to put important HTML code near the top
    Search engines read HTML content from top to bottom, so placing the main content early in the markup lets crawlers reach it first.
  • Do not output important content with JavaScript
    Many crawlers do not execute JavaScript, or do so only partially, so important content should be placed directly in the HTML.
  • Use iframes as little as possible
    Search engines generally do not index the content of an iframe as part of the containing page, so important content should not be placed in one.
  • Add the alt attribute to images
    The alt attribute supplies text that is shown in place of an image when the image cannot be displayed. For SEO, it gives search engines the opportunity to index the images on your website.
  • Add the title attribute to content you want to emphasize
    For SEO, the alt attribute should describe what the image actually shows, while the title attribute provides advisory information about the element it is set on.
  • Add width and height to images
    Images with larger dimensions are ranked closer to the top.
  • Keep the text for image-based content
    Where an image must be used for the sake of user experience, such as a title in a custom font, you can serve both user experience and SEO by using CSS to keep the text from appearing in the browser while still including the title text in the page's HTML.
    Note: Do not use display: none; to hide the text. Search engines filter out display: none; content, so spiders will not index it.
  • Increase website speed
    Website speed is an important factor in search engine ranking.
  • For links to external websites, use the rel="nofollow" attribute to tell crawlers not to follow them.
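
Some of the points in this list can be checked automatically. The following sketch uses only the Python standard library; the names SeoAudit and audit_page, the wording of the messages, and the sample page are hypothetical, and the checks cover only a few of the items above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class SeoAudit(HTMLParser):
    """Collect a few simple SEO warnings while parsing one HTML page."""

    def __init__(self, site_host):
        super().__init__()
        self.site_host = site_host
        self.in_title = False
        self.title = ""
        self.has_description = False
        self.issues = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self.in_title = True
        elif tag == "meta" and (attrs.get("name") or "").lower() == "description":
            self.has_description = bool((attrs.get("content") or "").strip())
        elif tag == "img":
            src = attrs.get("src") or "?"
            if "alt" not in attrs:
                self.issues.append(f"img {src}: missing alt attribute")
            if not (attrs.get("width") and attrs.get("height")):
                self.issues.append(f"img {src}: missing width/height")
        elif tag == "a":
            href = attrs.get("href") or ""
            host = urlsplit(href).hostname  # None for relative (internal) links
            rel = (attrs.get("rel") or "").lower().split()
            if host and host != self.site_host and "nofollow" not in rel:
                self.issues.append(f"link {href}: external link without nofollow")
        elif tag == "iframe":
            self.issues.append("iframe: content may not be indexed")

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data

def audit_page(html, site_host):
    """Return a list of warning strings for one page of the given site."""
    parser = SeoAudit(site_host)
    parser.feed(html)
    parser.close()
    issues = list(parser.issues)
    if not parser.title.strip():
        issues.append("missing title")
    if not parser.has_description:
        issues.append("missing meta description")
    return issues

sample_html = (
    "<html><head><title>Shop</title></head><body>"
    '<img src="a.png">'
    '<img src="b.png" alt="Logo" width="10" height="10">'
    '<a href="http://other.example/">partner</a>'
    '<a href="/local">local page</a>'
    '<a rel="nofollow" href="http://other.example/z">ad</a>'
    "</body></html>"
)
for issue in audit_page(sample_html, "example.com"):
    print(issue)
```

A check like this only looks at the static HTML, which is also roughly what a crawler that does not run JavaScript would see.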
