Spider traps: web page features that keep spiders from crawling


Hello everyone, this is my first article, so if anything is wrong, please point it out.

1. Can search engines find your pages?

1) For a search engine to discover your home page, good external links must point to it. Once the spider has found the home page, it follows the links on it to crawl deeper into the site.

Make sure the spider can reach every page through plain HTML links. JavaScript links and Flash links are spider traps, so keep this in mind.

2) Once a page is found, can its content be crawled?

A page the spider has discovered is not necessarily one it can crawl. Database-generated URLs with many parameters, session IDs, pages built entirely in Flash, frame structures, large numbers of redirects, and large amounts of copied content can all keep the spider outside the door. These deserve attention too.
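The discovery process described in point 1 (start at the home page, then follow links deeper) can be modelled as a breadth-first crawl. This is a toy sketch under stated assumptions: the site map, the page names, and the split of each page's links into plain HTML links versus script-only links are hypothetical, and the spider is assumed to follow HTML links only, as the author describes.

```python
from collections import deque

# Hypothetical in-memory site: for each page, the plain HTML links it
# contains and the links that exist only inside JavaScript or Flash.
SITE = {
    "/": {"html": ["/products.html", "/about.html"], "script": ["/news.html"]},
    "/products.html": {"html": ["/products/1.html"], "script": []},
    "/products/1.html": {"html": ["/"], "script": []},
    "/about.html": {"html": ["/"], "script": []},
    "/news.html": {"html": ["/news/1.html"], "script": []},
    "/news/1.html": {"html": [], "script": []},
}


def crawl(site, start="/"):
    """Breadth-first crawl from the home page, following HTML links only."""
    seen = {start}          # the home page, assumed found via external links
    queue = deque([start])
    while queue:
        page = queue.popleft()
        # Script-only links are ignored: this spider never executes code.
        for link in site.get(page, {}).get("html", []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return seen
```

Here `crawl(SITE)` returns `/`, `/products.html`, `/about.html`, and `/products/1.html`; `/news.html` and everything below it are never discovered, because the only link to them lives in a script.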

2. Flash

1) Using Flash in part of a page to enhance its visual effect is normal, for example in Flash ads and icons. Such elements are only one part of an HTML page, so they have little effect on crawling.

2) Some websites, however, are built as one large Flash file, and that constitutes a spider trap: the spider finds only a single link to the Flash file and no other content. Try to avoid this.
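To see why a Flash-only page is a trap, it helps to count what a spider that reads only HTML actually gets from it. The sketch below is an illustrative audit using Python's standard `html.parser`; the page snippets and the rule used to recognise Flash embeds (a `.swf` source or a type containing `shockwave-flash`) are assumptions, not a complete detector.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Counts what a spider that reads only HTML can see on a page."""

    def __init__(self):
        super().__init__()
        self.text_chars = 0      # visible text characters
        self.html_links = 0      # <a href> links the spider could follow
        self.flash_objects = 0   # embedded Flash movies (opaque to the spider)
        self._skip = 0           # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a" and a.get("href"):
            self.html_links += 1
        elif tag in ("embed", "object"):
            src = (a.get("src") or a.get("data") or "").lower()
            kind = (a.get("type") or "").lower()
            if src.endswith(".swf") or "shockwave-flash" in kind:
                self.flash_objects += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text_chars += len(data.strip())


def audit(html):
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "text_chars": parser.text_chars,
        "html_links": parser.html_links,
        "flash_objects": parser.flash_objects,
    }
```

For a page consisting of a single `<embed src="intro.swf">`, the audit reports no text and no HTML links, which is the "only a Flash link, no other content" situation described in point 2.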

3. SessionID

1) Some websites use a session ID to track user visits. Every visit generates a unique ID that is appended to the URL, so each time the spider crawls the site it is treated as a new user and sees the same pages under new URLs. This stops the spider from crawling the site properly, which makes session IDs a major spider trap.

2) It is generally recommended to track user visits with cookies rather than by generating session IDs in the URL.
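A sketch of both sides of this point in Python. The parameter name `sessionid` and the names in `SESSION_PARAMS` are hypothetical (sites use many variants); `url_with_session_id` imitates the trap, `strip_session_params` shows how the resulting duplicate URLs collapse to one once the ID is removed, and `session_cookie_header` shows the cookie-based alternative the author recommends, using the standard `http.cookies` module.

```python
import secrets
from http.cookies import SimpleCookie
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical session parameter names; real sites use many others.
SESSION_PARAMS = {"sessionid", "sid", "phpsessid", "jsessionid"}


def url_with_session_id(url):
    """The trap: every visit gets a fresh ID baked into the URL."""
    sep = "&" if urlsplit(url).query else "?"
    return f"{url}{sep}sessionid={secrets.token_hex(8)}"


def strip_session_params(url):
    """Drop session query parameters so repeated visits map to one URL."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k.lower() not in SESSION_PARAMS
    ]
    return urlunsplit(parts._replace(query=urlencode(kept)))


def session_cookie_header():
    """The recommended alternative: carry the ID in a cookie, not the URL."""
    jar = SimpleCookie()
    jar["sessionid"] = secrets.token_hex(8)
    jar["sessionid"]["path"] = "/"
    jar["sessionid"]["httponly"] = True
    return jar.output(header="Set-Cookie:")
```

Two calls to `url_with_session_id` for the same page return different URLs, which is what the spider sees on repeated visits; both reduce to the same address after `strip_session_params`. With the cookie approach the page URL never changes at all.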

4. Redirects of all kinds

1) Apart from the familiar 301 redirect, search spiders are quite sensitive to other kinds of redirects, such as 302 temporary redirects, JavaScript redirects, Flash redirects, and meta refresh redirects. My advice is to avoid redirects that can hurt the page, and that includes 301: do not use a 301 redirect unless you have no other choice. This is only a suggestion.
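When a URL really must move, the redirect the author treats as least harmful is a server-side 301. A minimal sketch with Python's standard `http.server`; the `MOVED` table and its paths are hypothetical, and a production site would normally configure this in its web server rather than in application code.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical table of pages that have moved for good.
MOVED = {"/old-page.html": "/new-page.html"}


def redirect_for(path):
    """Return (status, location) for a permanently moved page, else None."""
    target = MOVED.get(path)
    return (301, target) if target else None


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        redirect = redirect_for(self.path)
        if redirect:
            status, location = redirect
            # A server-side 301: the permanent move is stated in the HTTP
            # status line, unlike a meta refresh or a JavaScript redirect.
            self.send_response(status)
            self.send_header("Location", location)
            self.send_header("Content-Length", "0")
            self.end_headers()
            return
        body = b"<html><body>Page content</body></html>"
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def run(port=8000):
    """Serve on localhost until interrupted (not started on import)."""
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```

After `run()`, a request for `/old-page.html` gets status 301 with `Location: /new-page.html`, which a spider reads directly from the HTTP response without executing scripts or parsing a `<meta http-equiv="refresh">` tag.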

5. Frame structure

1) If you do not know what a frame structure is, you can skip this section, because you have already avoided this spider trap.

2) Frames were used to design pages in the early days, but few sites use them now, so I will not say much here. Whether or not you use them, remember one thing: do not make search engines accommodate you. Forget about frames.

6. Dynamic URLs

1) A dynamic URL is one generated by a database-driven website and carries parameters marked by characters such as ? and =. In general, avoid such parameterized URLs, because they make crawling harder for spiders.
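One common way to avoid exposing `?` and `=` parameters is URL rewriting: the site publishes static-looking paths and maps them back to the dynamic script internally. The Python below only illustrates the mapping; the `/product.php?id=123` scheme and the `/product/123.html` form are hypothetical, and in practice the rule usually lives in the web server configuration (for example Apache `mod_rewrite` or an nginx `rewrite` directive).

```python
import re
from urllib.parse import parse_qs, urlsplit

# Hypothetical rewrite scheme: /product.php?id=123  <->  /product/123.html
STATIC_PATTERN = re.compile(r"^/product/(\d+)\.html$")


def is_dynamic(url):
    """Dynamic in the author's sense: the URL carries ?key=value parameters."""
    return bool(urlsplit(url).query)


def to_static(url):
    """Map a dynamic product URL to a parameter-free path; else return it unchanged."""
    parts = urlsplit(url)
    ids = parse_qs(parts.query).get("id")
    if parts.path == "/product.php" and ids and ids[0].isdigit():
        return f"/product/{ids[0]}.html"
    return url


def to_dynamic(path):
    """Inverse mapping, as a server-side rewrite rule would apply it internally."""
    match = STATIC_PATTERN.match(path)
    return f"/product.php?id={match.group(1)}" if match else path
```

Links on the site would point at `/product/123.html`, so the spider never sees a parameterized URL, while the server still runs the original script.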

7. JavaScript links

1) Many websites now like to generate their navigation with JavaScript. This is a very serious spider trap: it is like closing the door before the spider has even started crawling. So try to avoid it.

2) JavaScript links do have some uses in SEO, though. For pages that should not take part in ranking, and for pages with duplicate content, a webmaster can use JavaScript links to keep the spider from crawling them.
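A simplified model of why JavaScript navigation stops a spider: the extractor below collects only the `href` targets of `<a>` tags and, like the simple spider the author describes, never runs scripts, so `javascript:` links and `onclick` handlers yield nothing. Some modern search engines do execute JavaScript, so treat this as an illustration of the author's point rather than of any particular engine; the `NAV` markup is made up.

```python
from html.parser import HTMLParser


class LinkExtractor(HTMLParser):
    """Collects the URLs a spider that never runs JavaScript could follow."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        # javascript: pseudo-URLs and onclick targets only exist once a
        # script runs, which this spider never does, so they are skipped.
        if href and not href.strip().lower().startswith("javascript:"):
            self.links.append(href)


def extract_links(html):
    parser = LinkExtractor()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical navigation menu mixing HTML and JavaScript links.
NAV = """
<ul>
  <li><a href="/about.html">About</a></li>
  <li><a href="javascript:void(0)" onclick="location.href='/products.html'">Products</a></li>
  <li><span onclick="location.href='/news.html'">News</span></li>
</ul>
"""
```

`extract_links(NAV)` returns only `['/about.html']`. The same property is what point 2 relies on: a page reachable only through a JavaScript link, such as `/products.html` here, is never queued for crawling.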

8. Login required

1) Some site content, such as a members' area, can only be viewed after logging in. Spiders cannot crawl this part, because they do not register, do not log in, and do not enter account names and passwords. So this needs to be changed.
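The author only says that a login-only members' area has to be changed. One common adjustment, sketched here as an assumption rather than as the author's method, is to show every anonymous visitor a public summary with a login prompt instead of a bare login wall, so the spider has something to crawl. The `ARTICLE` data and `render_article` are hypothetical; the decision depends only on the login state, so spiders and anonymous human visitors receive the same page.

```python
import html

# Hypothetical member article; a real site would load this from its database.
ARTICLE = {
    "title": "Spider traps explained",
    "summary": "An overview of site features that stop search spiders from crawling.",
    "body": "Full member-only text of the article.",
}


def render_article(article, logged_in):
    """Serve a public summary to anonymous visitors instead of a login wall."""
    title = html.escape(article["title"])
    if logged_in:
        content = html.escape(article["body"])
    else:
        # Every anonymous visitor, spider or human, gets the same indexable
        # summary plus a login prompt; nothing here looks at the User-Agent.
        content = (
            html.escape(article["summary"])
            + ' <a href="/login">Log in to read more</a>'
        )
    return f"<html><body><h1>{title}</h1><p>{content}</p></body></html>"
```

The full text stays behind the login, while the title and summary become crawlable.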

9. Mandatory cookies

1) Some websites, in order to provide certain features such as remembering user information or tracking a user's path through the site, force visitors to enable cookies; if a visitor has cookies disabled, the page does not display properly. Forcing the use of cookies therefore only stops spiders from accessing the site normally.
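The difference between forcing cookies and merely using them can be shown with two hypothetical request handlers built on Python's `http.cookies` module. `render_strict` mimics the trap (no cookie, no content); `render_tolerant` uses the cookie only for personalisation, so a visitor that sends no cookies, as a spider typically does, still receives the page. The `visitor` cookie name and the page text are invented for the example.

```python
import html
from http.cookies import SimpleCookie


def parse_cookies(cookie_header):
    """Parse a raw Cookie request header; None means the client sent none."""
    jar = SimpleCookie()
    if cookie_header:
        jar.load(cookie_header)
    return jar


def render_strict(cookie_header):
    """The trap: without the cookie, the visitor gets no real content."""
    if "visitor" not in parse_cookies(cookie_header):
        return "<p>Please enable cookies to view this page.</p>"
    return "<h1>Catalogue</h1><p>Product list</p>"


def render_tolerant(cookie_header):
    """The alternative: the cookie only personalises; content never depends on it."""
    visitor = parse_cookies(cookie_header).get("visitor")
    greeting = f"Welcome back, {html.escape(visitor.value)}" if visitor else "Welcome"
    return f"<h1>{greeting}</h1><p>Product list</p>"
```

A cookie-less request to `render_strict` gets only the "enable cookies" notice, while `render_tolerant` returns the product list either way and merely adds a greeting when the cookie is present.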

OK, thank you all for reading my article. It is not great, but it is my own modest experience, and I hope it can be featured on the home page so that more people can use it as a reference. Thank you.

Technical exchange: 83884473



Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.