How Super Schedule Works (How to Get Web Content)

Source: Internet
Author: User

Have you ever wondered how a phone app gets at a web page: how it fills in the page's account name and password, and how it brings the page's content onto the phone? This time I'll use logging in to my university's academic affairs system from a phone as a simple example. I believe that after reading it you'll understand, and you'll be able to parse pages freely.

A craftsman who wants to do good work must first sharpen his tools. The first thing to do is prepare two tools: 1. HttpWatch (a web traffic analysis tool, used here to capture the requests and responses behind a page), 2. a jar package: Jsoup (for parsing web content). With these two, the rest is straightforward.

I am a student at Dalian Maritime University, so I'll use the university's academic affairs system as the example and build a simple timetable application.

To begin, here is the result: the app reads the course information. The interface is admittedly crude; I haven't polished it... The steps below implement it one at a time.

First, analyze the webpage data with HttpWatch:

(Experienced readers can skip this part; it is for beginners, and the steps are very simple.) This is the page where you enter your account and password, with HttpWatch shown below it. You can see a red Record button; enter your account and password first, then click it, as shown:

Once Record is clicked it turns gray; then click Login to enter the academic affairs system.

After logging in:

After the page has finished loading, click Stop. You'll see many POST and GET requests along with their URLs; the URL column shows the addresses that were visited, which we will use in the code. We need the first POST request: select it, and its details appear below:

The details include tabs for Headers, Cookies, POST Data and more, which we'll use later. Let's look at them one by one, starting with the Headers:

There is a lot of information here; the piece we use is the Cookie. A cookie is like your ID card while browsing: on the pages you visit next, it proves that the requests come from you and not from someone else. If you're interested you can dig into the other parameters, but we don't need them now. Next, look at the POST Data:
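To see what the server's cookie actually contains, the JDK's own `java.net.HttpCookie.parse` can split a `Set-Cookie` value into a name, a value and attributes such as `Path`. A minimal sketch, assuming a header shaped like the `JSESSIONID=...; path=/` one this login returns; the class and method names are made up for illustration:

```java
import java.net.HttpCookie;

/**
 * Hypothetical helper: look up one cookie in a Set-Cookie header value
 * using the JDK's java.net.HttpCookie parser.
 */
final class CookieInspector {

    /** Returns the value of the cookie called {@code name}, or null if it is absent. */
    static String valueOf(String setCookieValue, String name) {
        HttpCookie cookie = find(setCookieValue, name);
        return cookie == null ? null : cookie.getValue();
    }

    /** Returns the Path attribute of the named cookie, or null if absent. */
    static String pathOf(String setCookieValue, String name) {
        HttpCookie cookie = find(setCookieValue, name);
        return cookie == null ? null : cookie.getPath();
    }

    private static HttpCookie find(String setCookieValue, String name) {
        // HttpCookie.parse understands "name=value" followed by attributes like Path
        for (HttpCookie cookie : HttpCookie.parse(setCookieValue)) {
            if (cookie.getName().equalsIgnoreCase(name)) {
                return cookie;
            }
        }
        return null;
    }
}
```

For example, `CookieInspector.valueOf("JSESSIONID=abc123; Path=/", "JSESSIONID")` returns `abc123`; only that name=value pair is what identifies you to the server.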

Here you can see the account and password being submitted; I painted the password over in red (no peeking, hey). Finally, look at the Cookies tab; the cookie will be used in the program:
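The POST data is simply the form fields encoded as `application/x-www-form-urlencoded`. A sketch of building that body from the field names used in the login code later in this post (`zjh` for the account, `mm` for the password); the helper name is hypothetical, and the charset passed in should match what the site expects (this site's pages are gb2312):

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.Map;

/** Hypothetical helper: turns form fields into a URL-encoded POST body. */
final class FormBody {

    /** Encodes fields as "key1=value1&key2=value2", escaping each key and value. */
    static String encode(Map<String, String> fields, String charset)
            throws UnsupportedEncodingException {
        StringBuilder body = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            if (body.length() > 0) {
                body.append('&');
            }
            body.append(URLEncoder.encode(field.getKey(), charset))
                .append('=')
                .append(URLEncoder.encode(field.getValue(), charset));
        }
        return body.toString();
    }
}
```

Using a `LinkedHashMap` keeps the fields in insertion order; `URLEncoder` turns spaces into `+` and `&` into `%26`, so a password containing those characters cannot break the body apart.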

There is also one very important item:

This is the raw content returned for the request. You can see that my page has two frames, one above the other: the top one is 48 pixels high and the bottom one takes up the rest of the page. Of course, every school's page is different.
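Because the response is a frameset, the useful content lives at the URLs named in each `<frame src=...>`, and those are usually relative. A rough sketch that pulls the `src` values out with a regular expression (a shortcut, not a real HTML parser; Jsoup is the better tool for anything beyond this) and resolves them against the page URL with `java.net.URI.resolve`. The class name and the frame paths used in the example are invented:

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: list the absolute URLs of the frames in a frameset page. */
final class FrameLinks {

    // Crude pattern: "<frame" as a whole word (so <frameset> is skipped), then its src attribute
    private static final Pattern FRAME_SRC = Pattern.compile(
            "<frame\\b[^>]*?\\bsrc\\s*=\\s*[\"']([^\"']+)[\"']",
            Pattern.CASE_INSENSITIVE);

    /** Returns each frame's src resolved against the URL of the page that contained it. */
    static List<String> frameUrls(String html, String pageUrl) {
        URI base = URI.create(pageUrl);
        List<String> urls = new ArrayList<String>();
        Matcher m = FRAME_SRC.matcher(html);
        while (m.find()) {
            urls.add(base.resolve(m.group(1)).toString());
        }
        return urls;
    }
}
```

For instance, a frame with `src="top.jsp"` found on `http://202.118.88.140/loginAction.do` resolves to `http://202.118.88.140/top.jsp`.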

By now you should have a general idea of how a phone accesses a web page: filling in the account and password means POSTing them to the specified URL, which then opens the next page. Next we implement the first step, logging in to the academic affairs system. I experimented in Java first because it is simpler to implement; with Android you would also have to debug on a real device, which is a bit of a hassle. The principle is the same and the data retrieved is the same, so once you understand it you only need to port the code into an Android project. Here is the code:

Second, log in to the academic affairs system with Java:

Only the key code is shown; the rest is mostly irrelevant.

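The login form is POSTed to this URL, and the cookies returned by the server are kept in a static list:

    // Imports used by the snippets below: java.io.*, java.net.*, java.util.*
    private static String PATH = "http://202.118.88.140/loginAction.do";
    static List<String> cookies; // stores the cookies returned at login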

These are the account and password, which make up the body of the POST request:

    Map<String, String> params = new HashMap<String, String>();
    params.put("mm", "*****");        // password (masked)
    params.put("zjh", "2220133697");  // account (student ID)

    URL url = new URL(PATH);
    HttpURLConnection connection = (HttpURLConnection) url.openConnection();
    connection.setConnectTimeout(3000);
    connection.setDoInput(true);          // we will read data from the server
    connection.setDoOutput(true);         // we will write data to the server
    connection.setRequestMethod("POST");  // POST request method
    // ...then write params, URL-encoded, to connection.getOutputStream() (not shown in this post)

  

    // Connection successful?
    int responseCode = connection.getResponseCode();
    if (responseCode == HttpURLConnection.HTTP_OK) {
        // Save the cookies returned by the server
        cookies = connection.getHeaderFields().get("Set-Cookie");
        System.out.println("Cookie: " + cookies);
        // Convert the input stream to a string
        return changeInputeStream(connection.getInputStream(), encode);
    }
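The login code calls a helper `changeInputeStream(InputStream, String)` whose body the post never shows. One plausible version, offered only as a hypothetical stand-in, reads every byte first and decodes them once with the page's encoding:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical stand-in for the post's unshown changeInputeStream helper. */
final class StreamUtil {

    /** Reads the whole stream, closes it, and decodes the bytes with the given charset (e.g. "gb2312"). */
    static String changeInputeStream(InputStream in, String encode) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        int n;
        try {
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
        } finally {
            in.close();
        }
        // Decode only after all bytes are collected: a multi-byte character may straddle two chunks
        return buffer.toString(encode);
    }
}
```

Decoding at the end, rather than chunk by chunk, is the design choice that matters here: a two-byte GB2312 character split across two reads would otherwise come out as garbage.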

  

The response is the page's source code. Console output:

    Cookie: [JSESSIONID=bedusety1xwvs6epltg-u; path=/]
    result->

The first step has succeeded. Saving the cookie in this step is crucial; without it you cannot access the following pages. Next comes the second step: go to the course selection page and view the timetable.
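Saving the cookie is only half the job; every later request has to send it back. A sketch that turns the saved `Set-Cookie` values into a single `Cookie` request header in the `name=value; name2=value2` form that RFC 6265 specifies for clients. The helper is hypothetical, and it also tolerates a null list for the case where the login set no cookie:

```java
import java.util.List;

/** Hypothetical helper: build one Cookie request header from saved Set-Cookie values. */
final class CookieHeader {

    /** Keeps only the "name=value" part of each Set-Cookie value and joins them with "; ". */
    static String fromSetCookies(List<String> setCookies) {
        if (setCookies == null) {
            return ""; // the login response carried no Set-Cookie header
        }
        StringBuilder header = new StringBuilder();
        for (String setCookie : setCookies) {
            String pair = setCookie.split(";", 2)[0].trim(); // drop Path, HttpOnly, ...
            if (pair.isEmpty()) {
                continue;
            }
            if (header.length() > 0) {
                header.append("; ");
            }
            header.append(pair);
        }
        return header.toString();
    }
}
```

Usage on the next request, before calling `getResponseCode()`: `httpURLConnection.setRequestProperty("Cookie", CookieHeader.fromSetCookies(cookies));`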

Third, get the timetable information

Click Course Selection Management, then This Semester's Timetable (every school's website is laid out differently, of course). Once you have clicked through, HttpWatch shows a lot of entries, because opening a new page sends new requests; among them are the course selection and semester timetable requests I just clicked. The procedure is the same as above, so you can skip ahead.

Select the first GET request and its details appear below:

The content shows this semester's timetable, and the URL after GET is the address that was requested. Requesting it returns the page content, from which the timetable can then be extracted.

    // encode is the page's character encoding, which you can see in the page itself
    public static String sendGetXueqiKebiao(String encode) {
        InputStream inputStream = null;
        String urlPath = "http://202.118.88.140/xskbAction.do?actionType=1"; // the timetable URL
        try {
            URL url = new URL(urlPath);
            HttpURLConnection httpURLConnection = (HttpURLConnection) url.openConnection();
            httpURLConnection.setConnectTimeout(3000); // timeout in milliseconds
            httpURLConnection.setRequestMethod("GET"); // must be upper case, "get" is rejected
            if (cookies != null) {
                for (String cookie : cookies) {
                    // Add the "name=value" part of each saved cookie to the request
                    httpURLConnection.addRequestProperty("Cookie", cookie.split(";", 2)[0]);
                }
            }
            int responseCode = httpURLConnection.getResponseCode();
            if (responseCode == HttpURLConnection.HTTP_OK) {
                inputStream = httpURLConnection.getInputStream();
                return changeInputeStream(inputStream, encode); // convert the input stream to a string
            }
        } catch (MalformedURLException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return "";
    }
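The `encode` argument is read off the page by hand. When the server includes a charset in the response's `Content-Type` header (available via `URLConnection.getContentType()`), it can be taken from there instead; many servers declare it only in a `<meta>` tag, so a fallback is still needed. A hedged sketch with a hypothetical helper:

```java
import java.util.Locale;

/** Hypothetical helper: extract the charset parameter from a Content-Type header value. */
final class ContentTypeCharset {

    /** E.g. "text/html;charset=GB2312" gives "GB2312"; returns fallback when no charset is given. */
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String cs = p.substring("charset=".length()).trim();
                // Strip optional quotes, as in charset="UTF-8"
                if (cs.length() >= 2 && cs.startsWith("\"") && cs.endsWith("\"")) {
                    cs = cs.substring(1, cs.length() - 1);
                }
                return cs.isEmpty() ? fallback : cs;
            }
        }
        return fallback;
    }
}
```

Usage: `String encode = ContentTypeCharset.charsetOf(httpURLConnection.getContentType(), "gb2312");`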

The code is simple: just a GET request, and what comes back is again the page source. I'm pasting only a very small part here because the response is long; you can open the page in a browser and view its source, though I suspect it will make you dizzy...

    <tr>
      <th width="5%" class="sortable">Course number</th>
      <th width="15%" class="sortable">Course name</th>
      <th width="4%" class="sortable">Class number</th>
      <th width="4%" class="sortable">Credits</th>
      <th width="6%" class="sortable">Course property</th>
      <th width="6%" class="sortable">Exam type</th>
      <th width="8%" class="sortable">Teacher</th>
      <th width="6%" class="sortable">Study mode</th>
      <th width="6%" class="sortable">Enrollment status</th>
      <th width="8%" class="sortable">Weeks</th>
      <th width="5%" class="sortable">Weekday</th>
      <th width="5%" class="sortable">Period</th>
      <th width="6%" class="sortable">Building</th>
      <th width="8%" class="sortable">Classroom</th>
    </tr>
And this is the information for one course:

    <tr class="odd" onmouseout="this.className='even';" onmouseover="this.className='evenfocus';">
      <td align="center" rowspan="2">13013511</td>
      <td align="center" rowspan="2">English (second foreign language)</td>
      <td align="center" rowspan="2">02</td>
      <td align="center" rowspan="2">4.0</td>
      <td align="center" rowspan="2">Restricted elective</td>

Here you have the timetable information. Very simple, isn't it? Now parse it out and you've succeeded. Parse the page with Jsoup, mentioned earlier; I have a blog post explaining how to parse pages: http://www.cnblogs.com/jycboy/p/jsoupdoc.html

On to the code. This parsing is tailored to my page. Parsing only the parts you need is quite simple, because you can select elements by their tag attributes; fully mastering Jsoup for page parsing is harder.

    // Imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element, org.jsoup.select.Elements
    // &nbsp; in the page stands for U+00A0, which shows up as garbage; keep it so it can be stripped later
    String nbsp = "\u00a0";
    // html is the string returned by sendGetXueqiKebiao; it is already decoded, so parse it
    // directly (the base URI is only used to resolve relative links)
    Document doc = Jsoup.parse(html, "http://202.118.88.140/xskbAction.do?actionType=1");

    Elements tables = doc.select("table[class=displaytag]");
    Elements trs = tables.select("tr");
    Elements tds = trs.select("td[rowspan=2]"); // cells spanning two rows
    for (Element table : tables) { // the timetable
        String text = table.text();
        // System.out.println("tableText: " + text);
    }
    System.out.println("...");
    int lineShu = 0; // row counter
    for (Element tr : trs) { // one row of the table
        lineShu++;
        if (lineShu >= 3) { // skip the first two rows
            Elements tdss = tr.select("td");
            System.out.println("tdssSize: " + tdss.size());
            for (Element td : tdss) { // one cell of the table
                String text1 = td.text();
                String text2 = text1.replace(nbsp, "");
                System.out.println("tdText: " + text2);
                System.out.println(".........");
            }
        }
    }
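The `nbsp` step exists because the entity `&nbsp;` decodes to U+00A0, which neither `String.trim()` nor the regex `\s` treats as whitespace, so it survives ordinary cleanup. A small sketch (hypothetical helper) that turns it into a normal space instead of deleting it outright, so that neighbouring words in a cell do not run together:

```java
/** Hypothetical helper: normalise the text of one table cell. */
final class CellText {

    /** The character that the HTML entity &nbsp; stands for. */
    static final char NBSP = '\u00A0';

    /**
     * Replaces non-breaking spaces with ordinary spaces, collapses runs of
     * whitespace, and trims. The replace must come first: Java's \s and
     * String.trim() do not match U+00A0.
     */
    static String clean(String cell) {
        return cell.replace(NBSP, ' ').replaceAll("\\s+", " ").trim();
    }
}
```

For example, `CellText.clean("\u00a0English\u00a0(2nd)\u00a0\u00a0_02 ")` returns `English (2nd) _02`.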

  This is the parsed content:

    trText: 1-2 08:00-09:35 Information System Analysis and Design_03 Cheil-Voux 3-19 weeks Baichuan Building 302 (multimedia)? English (second foreign language)_02 1-18 weeks Inspiration Building 304 (multimedia)? Information System Analysis and Design_03 Cheil-Voux? 3-19 odd weeks Baichuan Building 302 (multimedia) Embedded System Software Design_01 Chen Junliang? 2-18 even weeks Inspiration Building 301 (multimedia)?? Computer Organization and Architecture_02 Hui? 1-18 303 (multimedia)???
    tdssSize: 8
    tdText: Period 1-2 08:00-09:35
    .........
    tdText: Information System Analysis and Design_03 Cheil-Voux 3-19 weeks Baichuan Building 302 (multimedia)
    ...
    tdText: English (second foreign language)_02 Shi Yongwen 1-18 weeks Inspiration Building 304 (multimedia)
    ...
    tdText: Information System Analysis and Design_03 Cheil-Voux 3-19 odd weeks Baichuan Building 302 (multimedia) Embedded System Software Design_01 Chen Junliang 2-18 even weeks Inspiration Building 301 (multimedia)

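At this point you have the course information. You can create an Android project, copy the code over, build your own interface, bind your data to it, and your timetable app is born. Of course, on Android you could use HttpClient and HttpPost instead of HttpURLConnection, which I think would be better, and I will switch to them. (Note that Android 6.0 removed the bundled Apache HttpClient, so on current Android versions HttpURLConnection is the standard choice.)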
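Before binding the data to an Android interface, it helps to turn each row's cell texts into an object. A sketch of a hypothetical `Course` class covering the first five columns of the table header shown earlier (course number, name, class number, credits, course property); the remaining columns (teacher, weeks, period, building, classroom) would be added the same way:

```java
import java.util.List;

/** Hypothetical model for one timetable row; column order follows the table header above. */
final class Course {
    final String number;    // course number, e.g. 13013511
    final String name;      // course name
    final String classNo;   // class number, e.g. 02
    final double credits;   // credits, e.g. 4.0
    final String property;  // course property, e.g. restricted elective

    Course(String number, String name, String classNo, double credits, String property) {
        this.number = number;
        this.name = name;
        this.classNo = classNo;
        this.credits = credits;
        this.property = property;
    }

    /** Builds a Course from the first five (already cleaned) cell texts of a row. */
    static Course fromCells(List<String> cells) {
        if (cells.size() < 5) {
            throw new IllegalArgumentException("expected at least 5 cells, got " + cells.size());
        }
        return new Course(cells.get(0), cells.get(1), cells.get(2),
                Double.parseDouble(cells.get(3)), cells.get(4));
    }

    @Override
    public String toString() {
        return number + " " + name + " (class " + classNo + ", " + credits + " credits)";
    }
}
```

Keeping parsing (Jsoup) separate from this plain model means the same class can back a list adapter on Android without depending on the HTML layout.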

If you need the Android source code, let me know. I've written this much for now; I have homework too...

Please credit the source when reposting: http://www.cnblogs.com/jycboy/p/kcbyl.html
