In the previous post, "Web-based RSS reader (III): parsing online RSS subscriptions", I mentioned a problem. This post covers it in detail.
When parsing an RSS subscription, the main problem I ran into was the error "Server returned HTTP response code: 403 for URL: http://xxxxxx". A quick Baidu search shows that this is a common error when accessing websites: the server understands the client's request but refuses to process it. In other words, access denied! After looking into it further, I found that some servers (such as CSDN blogs) refuse access from Java clients, so an exception is thrown during parsing.
Access denied? No need to worry: for every policy there is a countermeasure. We can spoof the server by setting a User-Agent on the request before accessing it.
connection.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)"); // disguise the connection object with a browser UA
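To make the effect of that one line concrete, here is a minimal sketch rather than the code from the project. It assumes a hypothetical local test server (built with the JDK's com.sun.net.httpserver package) that answers 403 to any request whose User-Agent starts with "Java", which is what HttpURLConnection sends by default. The class name UserAgentDemo, the /rss path, and the blocking rule are all made up for illustration; real servers such as CSDN may filter differently.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URL;
import java.nio.charset.StandardCharsets;

class UserAgentDemo {

    // The browser-like UA string used in this post.
    static final String BROWSER_UA =
            "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)";

    /** Starts a hypothetical local server that rejects Java's default User-Agent. */
    static HttpServer startServer() throws IOException {
        // Port 0 lets the OS pick a free port.
        HttpServer server = HttpServer.create(new InetSocketAddress("127.0.0.1", 0), 0);
        server.createContext("/rss", exchange -> {
            String ua = exchange.getRequestHeaders().getFirst("User-Agent");
            // Assumed blocking rule: refuse a missing UA or one like "Java/1.8.0_xx".
            boolean blocked = ua == null || ua.startsWith("Java");
            byte[] body = (blocked ? "Forbidden" : "<rss version=\"2.0\"></rss>")
                    .getBytes(StandardCharsets.UTF_8);
            exchange.sendResponseHeaders(blocked ? 403 : 200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        server.start();
        return server;
    }

    /** Requests the URL, optionally overriding the User-Agent, and returns the HTTP status. */
    static int fetchStatus(URL url, String userAgent) throws IOException {
        HttpURLConnection con = (HttpURLConnection) url.openConnection();
        if (userAgent != null) {
            // Must be set before the connection is actually opened.
            con.setRequestProperty("User-Agent", userAgent);
        }
        con.setReadTimeout(10000);
        try {
            // getResponseCode() reports 403 as a number; calling getInputStream()
            // instead would throw the IOException quoted at the top of this post.
            return con.getResponseCode();
        } finally {
            con.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        HttpServer server = startServer();
        try {
            URL url = URI.create(
                    "http://127.0.0.1:" + server.getAddress().getPort() + "/rss").toURL();
            System.out.println("default UA: " + fetchStatus(url, null));
            System.out.println("browser UA: " + fetchStatus(url, BROWSER_UA));
        } finally {
            server.stop(0);
        }
    }
}
```

Against this test server, the request with the default UA is refused and the one with the browser UA goes through. That is the behaviour the fix in this post relies on.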
However, it took me a long time to realize that the only way to set the UA on the connection object was to modify rsslib4j.jar itself, which meant finding and changing its source code. After a long search, I tracked down an open source project on Google Code, newrsslib4j, which is a modified version of rsslib4j. Project homepage: http://code.google.com/p/newrsslib4j/. I downloaded it happily, only to find that it still had the 403 problem. So I steeled myself to build my own rsslib: I checked out the newrsslib4j source and changed it by hand.
1. Fix the 403 Forbidden problem.
Modify the setXmlResource() method of the RSSParser class in the org.gnu.stealthp.rsslib package to add a UA to the URLConnection object.
/**
 * Set rss resource by URL
 * @param ur the remote url
 * @throws RSSException
 */
public void setXmlResource(URL ur) throws RSSException {
    try {
        URLConnection con = u.openConnection();
        // -----------------------------
        // Added: 2013-08-14 21:00:17
        // Person: @龙轩
        // Blog: http://blog.csdn.net/xiaoxian8023
        // Change: the server blocks Java clients from accessing the RSS feed, so set a User-Agent
        con.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
        // -----------------------------
        con.setReadTimeout(10000);
        String charset = Charset.guess(ur);
        is = new InputSource(new UnicodeReader(con.getInputStream(), charset));
        if (con.getContentLength() == -1 && is == null) {
            this.fixZeroLength();
        }
    } catch (IOException e) {
        throw new RSSException("RSSParser::setXmlResource fails: " + e.getMessage());
    }
}
Modify the guess() method of the Charset class in the org.mozilla.intl.chardet package: comment out the original InputStream object, create a URLConnection, set the User-Agent, and create the InputStream from the URLConnection object:
// Guess the charset from a URL
public static String guess(URL url) throws IOException {
    // -----------------------------
    // Modified: 2013-08-14 21:00:17
    // Person: @Longxuan
    // Blog: http://blog.csdn.net/xiaoxian8023
    // Change: comment out the InputStream, create a URLConnection, set the User-Agent,
    //         and create the InputStream from the URLConnection object
    // InputStream in = url.openStream();
    URLConnection con = url.openConnection();
    con.setRequestProperty("User-Agent", "Mozilla/4.0 (compatible; MSIE 5.0; Windows NT; DigExt)");
    InputStream in = con.getInputStream();
    // -----------------------------
    return guess(in);
}