A problem with incomplete data when scraping a web page

Source: Internet
Author: User
I have a question about incomplete scraping results. I want to scrape the content of a website, but when I open the fetched page in IE, I can see that only half of it has loaded.
This is the page I want to scrape:
Http://www.tvmao.com/drama/MGxYWA==/episode/0

At first, I got all of the information.

After the scraper has been running for a while, I only get about half of the content, and some of the text is missing.

(When I open the fetched page in IE, half of the text loads first, and the other half loads a short time later.)
(When I open the saved copy in a local browser, only half of the text is there.)
What can I do to get all of the information?

------ Solution --------------------
The site may have anti-scraping protection. If the same IP address requests pages too frequently, the protection is switched on for that IP address. That matches what you describe: scraping returns complete pages at first and stops doing so after it has been running for a while. That said, this is not certain. Some sites behave unusually and send their output about 1 KB at a time.
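
On the scraper side, one way to act on this is to slow down and to check whether each response is complete before saving it. The following is only a minimal sketch under stated assumptions, not a guaranteed fix: the names `urllib_fetch`, `is_complete` and `fetch_with_backoff`, the `example-fetcher/0.1` User-Agent string, and the "page must contain `</html>`" heuristic are all made up for illustration, and the right delays for this particular site are unknown.

```python
import http.client
import time
import urllib.request
from typing import Callable, Optional, Tuple

Fetched = Tuple[bytes, Optional[int]]  # (body, declared Content-Length or None)

def urllib_fetch(url: str) -> Fetched:
    """Fetch a page with the standard library and keep whatever arrived."""
    req = urllib.request.Request(url, headers={"User-Agent": "example-fetcher/0.1"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        length = resp.headers.get("Content-Length")
        try:
            body = resp.read()
        except http.client.IncompleteRead as exc:
            body = exc.partial  # the server closed the connection early
    return body, int(length) if length else None

def is_complete(body: bytes, content_length: Optional[int]) -> bool:
    """Heuristic (an assumption): enough bytes arrived and the HTML is closed."""
    if content_length is not None and len(body) < content_length:
        return False
    return b"</html>" in body.lower()

def fetch_with_backoff(
    fetch: Callable[[str], Fetched],
    url: str,
    retries: int = 3,
    base_delay: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[bytes]:
    """Retry a truncated page, doubling the wait between attempts."""
    delay = base_delay
    for attempt in range(retries):
        body, content_length = fetch(url)
        if is_complete(body, content_length):
            return body
        if attempt < retries - 1:
            sleep(delay)  # back off before asking the site again
            delay *= 2
    return None  # still truncated after all attempts
```

A call such as `fetch_with_backoff(urllib_fetch, url)` returns the page bytes, or `None` if every attempt came back truncated. If the site really is limiting by IP address, a longer pause between all requests, not only between retries, is probably also needed.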
------ Solution --------------------
Discussion

In that case, how can a site prevent its data from being scraped?

------ Solution --------------------
To prevent scraping:
1: Require users to log in before they can access the site's content
2: Use a scripting language for pagination (hide the paging links)
3: Anti-leech checks (content can only be viewed by following a link from this site, for example by checking Request.ServerVariables("HTTP_REFERER"))
4: Display the site's content entirely as Flash, images, or PDF
5: Have the site pick randomly among different templates
6: Use dynamic, irregular HTML tags
The frustrating part comes when you want to welcome search engine crawlers and block scrapers at the same time. A search engine's first step is to fetch the content of the target page, which works on the same principle as a scraper, so many anti-scraping methods also get in the way of search engines indexing the site. Although the suggestions above cannot completely prevent scraping, applied together they turn away a majority of scrapers.
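
As a rough illustration of item 3 and of the per-IP frequency limit mentioned in the first answer, here is a minimal server-side sketch in Python rather than ASP. Everything named here (`referer_allowed`, `PerIpLimiter`, the host `www.example.com`, the limit of 2 requests per 10 seconds) is an assumption for illustration, not how any particular site works. The Referer header is sent by the client and is easy to forge, so this only stops the simplest scrapers.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional
from urllib.parse import urlsplit

def referer_allowed(referer: Optional[str], site_host: str) -> bool:
    """Anti-leech check: accept only requests that arrive from our own pages."""
    if not referer:
        return False  # no Referer header at all
    return urlsplit(referer).hostname == site_host

class PerIpLimiter:
    """Allow at most max_requests per IP address within a sliding time window."""

    def __init__(self, max_requests: int, window_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.max_requests = max_requests
        self.window = window_seconds
        self.clock = clock
        self.hits = defaultdict(deque)  # ip -> timestamps of recent requests

    def allow(self, ip: str) -> bool:
        now = self.clock()
        recent = self.hits[ip]
        # Forget requests that have fallen out of the window.
        while recent and now - recent[0] >= self.window:
            recent.popleft()
        if len(recent) >= self.max_requests:
            return False  # too frequent: treat this IP as a scraper for now
        recent.append(now)
        return True

# Hypothetical usage inside a request handler:
limiter = PerIpLimiter(max_requests=2, window_seconds=10)
ok = referer_allowed("http://www.example.com/list", "www.example.com") and limiter.allow("203.0.113.7")
```

A limiter like this cannot tell a search engine crawler from a scraper by itself, which is the conflict described above. A site that wants to stay indexed would need some separate exception for known crawlers.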
