I wanted to build a blog aggregator (partly because CSDN's RSS feed doesn't follow the spec and can't be consumed directly, so I planned to implement the crawling by hand), but when I fetched a page's source with HttpClient I was surprised to get 403 Forbidden back, which was a bit awkward. After searching online I learned that you need to set request parameters, and then wondered whether HttpClient had some setParameter method; sure enough it does, so I set the following parameter:
HttpClient httpClient = new DefaultHttpClient();
httpClient.getParams().setParameter("User-Agent", "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 (KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11");
That turned out to be useless. But since a request has request parameters, it should also have request headers, and that is indeed where the fix for the 403 Forbidden lies. Normally, when a browser requests a page it includes a User-Agent in the request headers; if you access a page directly through HttpClient without setting that header, the server may refuse to respond. So as long as you set the request header as follows, you can access the page normally:
HttpGet httpGet = new HttpGet(url);
httpGet.setHeader("User-Agent", "Mozilla/5.0 (Windows; U; Windows NT 5.1; zh-CN; rv:1.9.0.3) Gecko/2008092417 Firefox/3.0.3");
This lets HttpClient simulate a browser when requesting a web page.
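For reference, the same trick works with the JDK's built-in HttpURLConnection, with no external library needed. This is a minimal sketch under assumptions of my own (the example.com URL is just a placeholder for whatever page you want to crawl); it sets the User-Agent via setRequestProperty, which is the HttpURLConnection equivalent of HttpGet's setHeader:

```java
import java.net.HttpURLConnection;
import java.net.URL;

public class UserAgentDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder URL; swap in the page you actually want to crawl.
        URL url = new URL("http://example.com/");
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        // Pretend to be a browser so servers that reject bare clients respond.
        conn.setRequestProperty("User-Agent",
                "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.11 "
                + "(KHTML, like Gecko) Chrome/23.0.1271.64 Safari/537.11");
        // openConnection() performs no network I/O yet, so we can inspect
        // the header that will be sent before actually connecting.
        System.out.println(conn.getRequestProperty("User-Agent"));
    }
}
```

The actual fetch then happens when you call conn.getInputStream(); the header set above is sent with that request.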
+++++++++++++++++++++++++++++++++++++
Off Topic
I hope this blog helps me record things and record problems. The problems may be small, but when I need them again, having them written down will always be a great help.