While parsing a web page today, I ran into a problem: the page can only be accessed after logging in. I searched online and found that others had already solved this, but their solutions used HttpClient + Jsoup, and I don't know which version of Jsoup they were using. References:
"HttpClient simulates login to Renren and crawls log content (1)", http://bbs.csdn.net/topics/390269063. The current jsoup API can simulate the login directly and obtain the information returned by the server, without HttpClient.
Here I use the Shui Mu community (newsmth) for the demo. In the code below, id and passwd are the name attributes of the username and password input fields in the login form.
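import java.util.HashMap;
import java.util.Map;
import org.jsoup.Connection.Method;
import org.jsoup.Connection.Response;
import org.jsoup.Jsoup;

// Form fields: the keys must match the name attributes of the login form inputs.
Map<String, String> map = new HashMap<String, String>();
map.put("id", "****");
map.put("passwd", "****");
// POST the form to the login URL; execute() throws IOException on network errors.
Response response = Jsoup.connect("http://m.newsmth.net/user/login")
        .data(map)
        .method(Method.POST)
        .timeout(20000)
        .execute();
if (response.statusCode() == 200) {
    // Keep the session cookies so that later requests can reuse them.
    SmthApp.getInstance().setCookies(response.cookies());
}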
The Response contains the cookies we need. They are obtained with response.cookies(), which returns a Map<String, String> of cookie names to values. Below are the Set-Cookie headers returned when logging in through a browser; the cookies we obtain through jsoup are the same.
Set-Cookie: main[UTMPUSERID]=***; path=/; domain=.newsmth.net
Set-Cookie: main[UTMPKEY]=97311264; path=/; domain=.newsmth.net
Set-Cookie: main[UTMPUSERID]=guest; path=/; domain=.newsmth.net
Set-Cookie: main[PASSWORD]=%2501g2VSVO%257D%2507%251DW%253B%2524K%2B%251C%2500a%2502%2501%257DF%2505X; path=/; domain=.newsmth.net
Set-Cookie: main[UTMPNUM]=9967; path=/; domain=.newsmth.net
Set-Cookie: main[UTMPKEY]=68252570; path=/; domain=.newsmth.net
Set-Cookie: main[UTMPNUM]=37535; path=/; domain=.newsmth.net
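To make the relationship between these headers and a Map of cookies concrete, here is a rough sketch that reduces Set-Cookie header values to name/value pairs. It is only an illustration and not how jsoup is implemented internally: the class SetCookieSketch and the method toCookieMap are hypothetical names, attributes such as path and domain are simply dropped, and the sketch assumes that a later header with the same name overwrites an earlier one.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration only; not part of jsoup.
class SetCookieSketch {

    // Reduces Set-Cookie header values to a name -> value map.
    // Assumption: attributes (path, domain, ...) are ignored, and a later
    // header with the same cookie name replaces the earlier value.
    static Map<String, String> toCookieMap(List<String> setCookieValues) {
        Map<String, String> cookies = new LinkedHashMap<>();
        for (String value : setCookieValues) {
            // The name=value pair is everything before the first ';'.
            String pair = value.split(";", 2)[0].trim();
            int eq = pair.indexOf('=');
            if (eq <= 0) {
                continue; // skip malformed entries without a name
            }
            cookies.put(pair.substring(0, eq).trim(), pair.substring(eq + 1).trim());
        }
        return cookies;
    }

    public static void main(String[] args) {
        // Sample values taken from the headers shown above.
        List<String> headers = Arrays.asList(
                "main[UTMPUSERID]=guest; path=/; domain=.newsmth.net",
                "main[UTMPKEY]=68252570; path=/; domain=.newsmth.net",
                "main[UTMPNUM]=37535; path=/; domain=.newsmth.net");
        System.out.println(toCookieMap(headers));
    }
}
```

With jsoup you do not need to do this yourself, since response.cookies() already hands back the map; the sketch only shows which part of each header ends up as a key and which as a value.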
When parsing a page that requires login, attach the saved cookies to the request:
Document document = Jsoup.connect(url)
        .timeout(20000)
        .cookies(SmthApp.getInstance().getCookies()) // the cookies obtained above
        .get();
In this way, you can parse pages that require login as if you were logged in. Note that the cookies are only valid for a limited time. Once they expire, you need to log in again to obtain fresh cookies.
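One possible way to handle the expiry is a small cache that logs in again once the stored cookies are older than some limit. The sketch below is a minimal version under stated assumptions: CookieCache is a hypothetical class (it is not the SmthApp class used above), the lifetime passed to the constructor is a guess that you would have to tune because the real limit is decided by the server, and the login itself is passed in as a Supplier so that the class does not depend on jsoup.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical cookie holder that re-runs the login when the cookies are too old.
class CookieCache {
    private final Supplier<Map<String, String>> login; // performs the login, returns fresh cookies
    private final long maxAgeMillis;                    // assumed lifetime; the server sets the real one
    private Map<String, String> cookies;                // null until the first login
    private long fetchedAt;

    CookieCache(Supplier<Map<String, String>> login, long maxAgeMillis) {
        this.login = login;
        this.maxAgeMillis = maxAgeMillis;
    }

    // Returns the cached cookies, logging in again if none are stored
    // or if they are at least maxAgeMillis old at time nowMillis.
    synchronized Map<String, String> get(long nowMillis) {
        if (cookies == null || nowMillis - fetchedAt >= maxAgeMillis) {
            cookies = new HashMap<>(login.get());
            fetchedAt = nowMillis;
        }
        return Collections.unmodifiableMap(cookies);
    }

    // Forces a new login on the next get(), e.g. after the site redirects to the login page.
    synchronized void invalidate() {
        cookies = null;
    }

    public static void main(String[] args) {
        int[] logins = {0};
        // Stand-in for the real login request; it only counts how often it is called.
        CookieCache cache = new CookieCache(() -> {
            logins[0]++;
            Map<String, String> fresh = new HashMap<>();
            fresh.put("main[UTMPKEY]", "key-" + logins[0]);
            return fresh;
        }, 60_000);
        System.out.println(cache.get(0));      // first call triggers a login
        System.out.println(cache.get(30_000)); // still fresh, cached cookies are reused
        System.out.println(cache.get(60_000)); // too old, triggers a second login
    }
}
```

In a real application the Supplier would wrap the Jsoup.connect(...).method(Method.POST).execute().cookies() call shown earlier (handling its IOException), and get(System.currentTimeMillis()) would be passed to .cookies(...) before each request. Taking the current time as a parameter is just a design choice that keeps the class easy to test.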