There are many anti-collection (anti-scraping) methods in use today. First, we will introduce the common ones, along with their drawbacks and the countermeasures collectors use against them:
1. Count the number of times an IP address accesses the site's pages within a certain period of time. If the access rate is significantly higher than that of a normal user, deny access from that IP address.
Disadvantages:
1. This method only works for dynamic pages (ASP, JSP, PHP, etc.); static pages cannot track how many times a given IP address has accessed the site.
2. This method seriously interferes with search engine indexing, because search engine spiders crawl quickly and use multiple threads, so the same rule that blocks collectors also blocks the spiders from indexing the site's files.
Collection countermeasure: slow down the collection speed, or give up on collecting.
Suggestion: build an IP address library of known search engine spiders, and allow only those IPs to browse the site quickly. Such a library is not easy to compile, since a given spider rarely uses a single fixed IP address.
Comment: this method is effective against collection, but it may hurt search engine indexing.
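For illustration, the IP-counting idea above can be sketched as a sliding-window rate limiter. This is only a rough Python sketch; the window size, request limit, and function names are made up for the example:

```python
import time
from collections import defaultdict, deque

WINDOW_SECONDS = 60   # look-back window (illustrative value)
MAX_REQUESTS = 30     # requests a "normal user" is assumed to make per window

_hits = defaultdict(deque)  # ip -> timestamps of recent requests

def allow_request(ip, now=None):
    """Return True if this IP is still under the rate limit."""
    now = time.time() if now is None else now
    q = _hits[ip]
    # drop timestamps that have fallen out of the window
    while q and now - q[0] > WINDOW_SECONDS:
        q.popleft()
    if len(q) >= MAX_REQUESTS:
        return False  # too fast: deny, as the article describes
    q.append(now)
    return True
```

As the article notes, a real deployment would need a whitelist of spider IPs checked before this limit, which is the hard part.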
2. Use JavaScript to encrypt the content page
Disadvantage: this method works on static pages, but it seriously harms search engine indexing, because what the search engine receives is also the encrypted content.
Collection countermeasure: it is best not to collect such pages. If you must, you can also grab the JS script and crack the encryption.
Suggestion: no improvement suggestions currently
Comment: webmasters who rely on search engine traffic should not use this method.
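To illustrate what such "encryption" typically amounts to, here is a hedged Python sketch that hex-escapes the page content inside a document.write call, so the raw HTML source shows no readable text (the function name is invented for this example; real implementations vary):

```python
def js_obfuscate(html):
    """Wrap HTML in a <script> that rebuilds it client-side.

    The markup is stored as \\xNN escapes, so the raw page source
    contains no readable content; only a JS-capable browser sees it.
    """
    escaped = "".join("\\x%02x" % b for b in html.encode("utf-8"))
    return '<script>document.write("%s");</script>' % escaped

page = js_obfuscate("<p>article body</p>")
```

This also shows why the countermeasure works: the decoding logic ships with the page, so a collector can always reverse it.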
3. Replace specific markup on the content page with "specific markup + hidden copyright text"
Disadvantages: the drawbacks of this method are minor. It slightly increases the page file size, but collectors can easily work around it.
Collection countermeasure: replace the hidden copyright text, or the markup that contains it, with your own copyright notice.
Suggestion: no improvement suggestions currently
Comment: I feel this is of little practical value, even if the hidden text is inserted at random positions.
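As a rough illustration of method 3, the following Python sketch inserts an invisible copyright span after closing tags at random positions; the tag choice, style, and notice text are all made up for the example:

```python
import random

COPYRIGHT = "Content copyright example.com"  # illustrative notice

def inject_hidden_copyright(html, tag="</p>"):
    """After some closing tags, insert an invisible copyright span,
    as method 3 describes (markup -> markup + hidden copyright text)."""
    hidden = '<span style="display:none">%s</span>' % COPYRIGHT
    parts = html.split(tag)
    out = []
    for part in parts[:-1]:
        out.append(part + tag)
        if random.random() < 0.5:  # random placement makes stripping harder
            out.append(hidden)
    out.append(parts[-1])
    return "".join(out)
```

As the comment above suggests, a collector who spots the span pattern can strip it with one replacement, which is why the method is weak.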
4. Users can only browse after logging in
Disadvantages: this method seriously affects search engine spider indexing.
Collection countermeasure: countermeasure articles have already been published; see, for example, write-ups on how ASP "thief" programs use XMLHTTP to submit forms and send cookies or sessions.
Suggestion: no improvement suggestions currently
Comment: webmasters who rely on search engine traffic should not use this method. However, it is somewhat effective against ordinary collection programs.
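The countermeasure described above boils down to scripting the login and then carrying the session cookie on every later request. Here is a hedged Python sketch using only the standard library; the URLs and form field names are hypothetical, and the actual network calls are left commented out:

```python
import http.cookiejar
import urllib.parse
import urllib.request

# Hypothetical endpoints; adjust for the actual site being collected.
LOGIN_URL = "http://example.com/login.asp"
DATA_URL = "http://example.com/member/page.asp"

def make_session(username, password):
    """Build an opener that logs in once and then carries the session
    cookie automatically, mirroring the XMLHTTP + cookie trick."""
    jar = http.cookiejar.CookieJar()
    opener = urllib.request.build_opener(
        urllib.request.HTTPCookieProcessor(jar))
    form = urllib.parse.urlencode(
        {"user": username, "pass": password}).encode("ascii")
    login = urllib.request.Request(LOGIN_URL, data=form, method="POST")
    # opener.open(login)                    # performs the login (network call)
    # html = opener.open(DATA_URL).read()   # cookies are sent automatically
    return opener, login
```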
5. Paging with JavaScript and VBScript scripts
Disadvantages: search engine indexing is affected.
Collection countermeasure: analyze the JavaScript and VBScript, work out their paging rules, and build a page list for the site.
Suggestion: no improvement suggestions currently
Comment: anyone who understands the scripting language can work out the paging rules.
6. Allow data to be viewed only through links on the site itself, for example by checking Request.ServerVariables("HTTP_REFERER")
Disadvantages: search engine indexing is affected.
Collection countermeasure: I am not sure whether the referring page (the Referer header) can be simulated... at present I have no countermeasure for this method.
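As a rough illustration of that countermeasure, a collector might pull the page URLs a script builds out of the source with a regular expression. A minimal Python sketch, assuming the common pattern of quoted file names like list_N.htm:

```python
import re

def extract_paging(js_source):
    """Pull page URLs out of a script-driven pager by matching
    quoted file names of the form name_N.htm built by the script."""
    return re.findall(r"""['"]([\w/]+_\d+\.html?)['"]""", js_source)

# Illustrative script fragment a collector might face:
js = """
function go(n){ location.href = 'list_' + n + '.htm'; }
var pages = ['list_1.htm', 'list_2.htm', 'list_3.htm'];
"""
urls = extract_paging(js)
```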
Suggestion: no improvement suggestions currently
Comment: webmasters who rely on search engine traffic should not use this method. However, it is somewhat effective against ordinary collection programs.
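For reference, the server-side check itself is simple; note also that since the Referer header is supplied by the client, a collection program can in principle set it to any value. A minimal Python sketch of the check (the hostname is illustrative):

```python
from urllib.parse import urlparse

ALLOWED_HOST = "www.example.com"  # illustrative: this site's own hostname

def check_referer(headers):
    """Accept a request only if its Referer points back to our own site,
    the same test ASP makes with Request.ServerVariables("HTTP_REFERER")."""
    host = urlparse(headers.get("Referer", "")).netloc
    return host == ALLOWED_HOST
```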
As can be seen from the above, the commonly used anti-collection methods either have a great impact on search engine indexing or simply do not prevent collection very well. Is there an effective method that prevents collection without affecting search engine indexing? Read on!
As the collection principles above show, most collection programs work by analyzing rules, such as paging file name rules and page code rules.
I. Countermeasures against paging file name rules
Most collectors perform batch, multi-page collection by analyzing paging file name rules. If others cannot discover the naming rule behind your paging files, they cannot batch-collect multiple pages from your website.
Implementation Method:
I think hashing the paging file name with MD5 is a good method. Someone will object: given MD5-hashed file names, others can simulate your hashing rule and derive your paging file names.
My point is that when we hash the paging file name, we should not hash only the changing part of the file name.
If i represents the page number, do not hash it like this: page_name = MD5(i, 16) & ".htm"
It is best to append one or more letters to the page number before hashing, for example: page_name = MD5(i & "any one or several letters", 16) & ".htm"
Because MD5 cannot be reversed, others only ever see the hashed result; they cannot know which letters you appended after i unless they brute-force the MD5, which is not realistic.
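The scheme above can be sketched in a few lines of Python. The secret suffix and the choice of 16 hex characters (classic ASP MD5(s, 16) implementations commonly return the middle 16 characters of the 32-character digest) are illustrative assumptions:

```python
import hashlib

SECRET = "xyz"  # the appended letters only the site owner knows (illustrative)

def page_name(i):
    """Hash page number + secret letters, keeping 16 hex characters,
    roughly like MD5(i & "letters", 16) & ".htm" in the article."""
    digest = hashlib.md5((str(i) + SECRET).encode("ascii")).hexdigest()
    return digest[8:24] + ".htm"
```

Without knowing SECRET, a collector cannot predict page_name(2) from page_name(1), which is the whole point of hashing more than the page number.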
II. Anti-collection countermeasures for page code rules
If the code of our content pages follows no fixed pattern, others cannot extract the content they need from it. So the anti-collection task at this step is to make the code irregular.
Implementation Method:
Randomize the markup that collectors would key on
1. Create multiple web page templates, each using different important HTML tags, and pick a template at random each time a content page is displayed. Some pages then use a CSS + DIV layout, others a table layout. This method is troublesome, since each content page needs several extra template pages, but anti-collection is inherently cumbersome work, and for many people protecting their data makes it worthwhile.
2. If the above is too troublesome, you can instead randomize the important HTML markup within a single page template.
The more templates you make, and the more random the HTML is, the harder it becomes for the other party to analyze your content's code. By the time someone has to write a collection rule specifically for your website, most will simply give up, because people who scrape others' websites are, at bottom, lazy; today most of them use collection programs developed by others, and only a few write their own.
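A minimal sketch of the random-template idea in Python, with two invented templates that carry the same content in different markup:

```python
import random

# Two illustrative templates: same content, different surrounding tags
# (a CSS + DIV layout versus a table layout, as described above).
TEMPLATES = [
    '<div class="body"><h1>{title}</h1><div>{text}</div></div>',
    '<table><tr><td><b>{title}</b></td></tr><tr><td>{text}</td></tr></table>',
]

def render(title, text):
    """Render the content with a randomly chosen template, so the
    markup a collector would anchor on differs between page views."""
    return random.choice(TEMPLATES).format(title=title, text=text)
```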
Here are some simple additional ideas:
1. Display content that matters to collectors but not to search engines using client-side scripts.
2. Splitting one page of data across N pages also increases the difficulty of collection.
3. Use deeper links, because most collection programs only collect the first three levels of a website; content buried deeper can escape collection. However, this may inconvenience visitors. For example:
Most websites use: homepage ---- content index page ---- content page
If changed to:
Homepage ---- content index page ---- content page entry ---- content page
Note: it is best to add an automatic redirect to the content page on the content page entry:
<meta http-equiv="refresh" content="6;url=content page (http://www.oureve.net)">
In fact, as long as the first anti-collection step is done (encrypting the paging file name rule), the anti-collection effect is already quite good. We recommend using both anti-collection methods together to raise the difficulty for collectors, so that they retreat in the face of it.