The book search function was completed today. It is relatively complicated, because the HTML of the search results page is not very regular and parsing it takes a lot of patience.
First, fetch the result HTML for a given set of query conditions. Many kinds of query conditions are possible; to keep things practical and convenient, I restricted the query to: keyword, East Campus, available for loan.
Here is how to get the result HTML:
    // Imports needed by this method (Apache HttpClient 4.x)
    import java.io.IOException;
    import java.util.ArrayList;
    import java.util.List;
    import org.apache.http.HttpResponse;
    import org.apache.http.HttpStatus;
    import org.apache.http.NameValuePair;
    import org.apache.http.client.ClientProtocolException;
    import org.apache.http.client.HttpClient;
    import org.apache.http.client.methods.HttpGet;
    import org.apache.http.client.utils.URLEncodedUtils;
    import org.apache.http.impl.client.DefaultHttpClient;
    import org.apache.http.message.BasicNameValuePair;
    import org.apache.http.protocol.HTTP;
    import org.apache.http.util.EntityUtils;

    /**
     * Search for books by keyword.
     *
     * Searching works both with and without logging in. A new HttpClient is
     * created here, so no login is needed. To search after logging in, use the
     * global HttpClient instead of creating a new one.
     *
     * @param keyword the search keyword
     * @return the HTML of the search results
     */
    public static String serchBook(String keyword) {
        HttpGet httpGet = null;
        String searchResultHtml = null;
        HttpClient httpClient = new DefaultHttpClient();
        HttpResponse response;
        /*
         * Field order is important.
         *
         * The query conditions are: keyword, East Campus, available for loan.
         */
        List<NameValuePair> params = new ArrayList<NameValuePair>();
        params.add(new BasicNameValuePair("SearchType", "X"));
        params.add(new BasicNameValuePair("Searcharg", keyword)); // query keyword
        params.add(new BasicNameValuePair("SearchScope", "1")); // 1 = East Campus
        params.add(new BasicNameValuePair("Sortdropdown", "-"));
        params.add(new BasicNameValuePair("Sort", "DZ")); // sort by date, descending
        params.add(new BasicNameValuePair("extended", "0"));
        params.add(new BasicNameValuePair("SUBMIT", "search")); // query button
        params.add(new BasicNameValuePair("Availlim", "1")); // only items available for loan
        params.add(new BasicNameValuePair("Searchlimits", ""));
        params.add(new BasicNameValuePair("Searchorigarg", "")); // keyword and sort method of the previous query
        // Encode the parameters
        String param = URLEncodedUtils.format(params, "UTF-8");
        System.out.println(param);
        try {
            // Join the URL and the parameters
            // http://innopac.lib.xjtu.edu.cn/search~s1*chx/
            String test_url = "http://innopac.lib.xjtu.edu.cn/search~s1*chx/";
            httpGet = new HttpGet(test_url + "?" + param);
            httpGet.setHeader("Host", "innopac.lib.xjtu.edu.cn");
            httpGet.setHeader("Referer", test_url + "?" + param);
            response = httpClient.execute(httpGet);
            int code = response.getStatusLine().getStatusCode();
            System.out.println("---------------searchbook------------------------");
            System.out.println(response.getStatusLine());
            if (code == HttpStatus.SC_OK) {
                searchResultHtml = EntityUtils.toString(response.getEntity(), HTTP.UTF_8);
                return searchResultHtml;
            }
        } catch (ClientProtocolException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        } finally {
            // httpGet is still null if building the request failed
            if (httpGet != null) {
                httpGet.abort();
            }
        }
        return "";
    }
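Since the method above notes that field order is important, here is a minimal sketch of what the encoding step amounts to, written with only the JDK. It is not the URLEncodedUtils implementation; the class name QueryStringSketch, the method buildQuery and the parameter names in main are placeholders made up for illustration. A LinkedHashMap is used because it keeps insertion order.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

public class QueryStringSketch {

    /** Joins the parameters as name=value pairs separated by '&', in insertion order. */
    public static String buildQuery(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, String> entry : params.entrySet()) {
            if (sb.length() > 0) {
                sb.append('&');
            }
            sb.append(encode(entry.getKey())).append('=').append(encode(entry.getValue()));
        }
        return sb.toString();
    }

    /** URL-encodes one name or value as UTF-8. */
    private static String encode(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 support is mandatory on every Java platform
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder names and values, not the real form fields
        Map<String, String> params = new LinkedHashMap<String, String>();
        params.put("first", "java web");
        params.put("second", "1");
        params.put("third", "");
        System.out.println(buildQuery(params)); // prints first=java+web&second=1&third=
    }
}
```

A plain HashMap would not guarantee the order of the fields, which is why an insertion-ordered map (or a List, as in the method above) is the safer choice here.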
With the HTML in hand, the next step is again to use Jsoup to parse it and encapsulate the results.
First, take a look at how the results page is displayed:
Based on the information that needs to be parsed, we need two wrapper classes.
One class encapsulates the bibliographic information, and the other encapsulates the holdings information. The two classes are as follows:
1. Class BookInfo
    package com.ali.login.bean;

    import java.util.List;

    /**
     * Details of one bibliographic entry in the search results
     *
     * @author Shuyan
     */
    public class BookInfo {
        private String imgLink; // picture link
        private String briefTitle; // e.g. "Java JDK 7 shi li bao dian / Han Xue, Guo Tianjiao (authors)"
        private String year; // e.g. "2014, printed text material"
        private List<BookAddress> bookAddresses; // holdings information
        private String reserveLink; // reservation link

        public BookInfo() {
        }

        public BookInfo(String imgLink, String briefTitle, String year,
                List<BookAddress> bookAddresses, String reserveLink) {
            this.imgLink = imgLink;
            this.briefTitle = briefTitle;
            this.year = year;
            this.bookAddresses = bookAddresses;
            this.reserveLink = reserveLink;
        }

        public String getImgLink() {
            return imgLink;
        }

        public void setImgLink(String imgLink) {
            this.imgLink = imgLink;
        }

        public String getBriefTitle() {
            return briefTitle;
        }

        public void setBriefTitle(String briefTitle) {
            this.briefTitle = briefTitle;
        }

        public String getYear() {
            return year;
        }

        public void setYear(String year) {
            this.year = year;
        }

        public List<BookAddress> getBookAddresses() {
            return bookAddresses;
        }

        public void setBookAddresses(List<BookAddress> bookAddresses) {
            this.bookAddresses = bookAddresses;
        }

        public String getReserveLink() {
            return reserveLink;
        }

        public void setReserveLink(String reserveLink) {
            this.reserveLink = reserveLink;
        }

        @Override
        public String toString() {
            return "BookInfo [imgLink=" + imgLink + ", briefTitle=" + briefTitle
                    + ", year=" + year + ", bookAddresses=" + bookAddresses
                    + ", reserveLink=" + reserveLink + "]";
        }
    }
2. Class BookAddress
    package com.ali.login.bean;

    /**
     * Holdings information of a bibliographic entry
     *
     * @author Shuyan
     */
    public class BookAddress {
        private String holdLand; // holding location
        private String callNumber; // call number
        private String status; // status

        public BookAddress() {
        }

        public BookAddress(String holdLand, String callNumber, String status) {
            this.holdLand = holdLand;
            this.callNumber = callNumber;
            this.status = status;
        }

        public String getHoldLand() {
            return holdLand;
        }

        public void setHoldLand(String holdLand) {
            this.holdLand = holdLand;
        }

        public String getCallNumber() {
            return callNumber;
        }

        public void setCallNumber(String callNumber) {
            this.callNumber = callNumber;
        }

        public String getStatus() {
            return status;
        }

        public void setStatus(String status) {
            this.status = status;
        }

        @Override
        public String toString() {
            return "BookAddress [holdLand=" + holdLand + ", callNumber="
                    + callNumber + ", status=" + status + "]";
        }
    }
With these two classes, the HTML can be parsed and encapsulated.
This is where it gets a little tricky, because the tags here are very irregular.
The code is as follows:
    /**
     * Process the HTML of the query results.
     *
     * @param searchResultHtml the HTML string
     * @return the list of bibliographic entries
     */
    public static List<BookInfo> getSearchResult(String searchResultHtml) {
        List<BookInfo> bookInfos = new ArrayList<BookInfo>();
        Document document = Jsoup.parse(searchResultHtml);
        Elements items = document.getElementsByClass("briefCitRow"); // all bibliographic entries
        for (Element item : items) {
            BookInfo bookInfo = null;
            List<BookAddress> bookAddresses = new ArrayList<>();
            Element ele_par = item.select("a[href]").get(0);
            // e.g. http://202.117.24.227/bibimage/zycover.php?isbn=9787121217074
            String imgLink = ele_par.child(0).attr("src"); // picture link
            Element ele_reserve = item.getElementsByClass("briefcitRequest").get(0); // reservation block
            Element ele_ahref = ele_reserve.select("a[href]").get(0);
            String reserveLink = ele_ahref.attr("href"); // reservation link
            // The host needs to be added in front of this link, e.g.
            // /availlim/search~s1*chx?/xjava&searchscope=1&sort=dz/xjava&searchscope=1&sort=dz&extended=0&subkey=java/1%2c2973%2c2973%2cc/requestbrowse~b3838346&ff=xjava&searchscope=1&sort=dz&1%2c1%2c

            // Note: there are two elements with class = briefcitDetail
            Elements ele_briefcitdetails = item.getElementsByClass("briefcitDetail");
            // The first one holds the brief description of the entry
            String briefTitle = ele_briefcitdetails.get(0)
                    .getElementsByClass("briefcitTitle").get(0).text();
            // The second one holds the year
            String year = ele_briefcitdetails.get(1).text();
            // Holdings information
            Elements ele_addresses = item.getElementsByClass("briefcitItems").get(0)
                    .getElementsByClass("bibItems").get(0)
                    .getElementsByClass("bibItemsEntry");
            /*
             * The reservation is also processed here.
             */
            for (Element ele_add : ele_addresses) {
                BookAddress address = null;
                // Each row has 3 td tags
                Elements ele_tds = ele_add.getElementsByTag("td");
                String bookstore = ele_tds.get(0).text();
                String callNumber = ele_tds.get(1).text();
                String status = ele_tds.get(2).text();
                address = new BookAddress(bookstore, callNumber, status);
                bookAddresses.add(address);
            }
            bookInfo = new BookInfo(imgLink, briefTitle, year, bookAddresses, reserveLink);
            bookInfos.add(bookInfo);
        }
        return bookInfos;
    }
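Because the tags are irregular, the many get(0), get(1) and get(2) calls in the parser are the fragile part: jsoup's Elements is an ArrayList, so get throws IndexOutOfBoundsException as soon as a row lacks one of the expected elements. Below is a minimal sketch of a guard for that. The helper getOrNull and the class SafeAccessSketch are hypothetical names, and the cell texts are made-up stand-ins; a plain List<String> is used so the sketch runs without jsoup.

```java
import java.util.Arrays;
import java.util.List;

public class SafeAccessSketch {

    /** Returns the element at index, or null when the list is null or too short. */
    public static <T> T getOrNull(List<T> list, int index) {
        if (list == null || index < 0 || index >= list.size()) {
            return null;
        }
        return list.get(index);
    }

    public static void main(String[] args) {
        // Made-up stand-in for the td texts of one holdings row that has only 2 cells
        List<String> cells = Arrays.asList("East Campus", "TP312");
        String status = getOrNull(cells, 2);
        System.out.println(status == null ? "(no status cell)" : status); // prints (no status cell)
    }
}
```

In the parser, a row for which the helper returns null could simply be skipped instead of aborting the whole result list; whether skipping is the right behaviour depends on how the client should treat incomplete entries.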
This gives a collection of well-encapsulated bibliographic information, which is tested as follows:
    public static void main(String[] args) {
        String searchResultHtml = LibraryUtil.serchBook("Java");
        List<BookInfo> bookInfos = getSearchResult(searchResultHtml);
        int i = 0;
        // Print only the first 5 entries
        for (BookInfo bookInfo : bookInfos) {
            if (i < 5) {
                System.out.println(bookInfo.toString());
            }
            i++;
        }
    }
The result is:
This implements the book search function ...
Of course, there are many more things to consider here, such as the total number of search results, how many records to show per page, filtering out meaningless results, the booking function ...
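For the booking function, one small piece is already known from the parsing code: the reservation link taken from the page is a relative path, so the host has to be added in front of it. A minimal sketch of that step with java.net.URI follows, assuming the base URL of the search request; the class ReserveLinkSketch and the method toAbsolute are hypothetical names, and the href in main is a shortened version of the example link from the parser.

```java
import java.net.URI;

public class ReserveLinkSketch {

    // Base URL of the search request used in this post
    private static final URI BASE = URI.create("http://innopac.lib.xjtu.edu.cn/search~s1*chx/");

    /** Resolves a relative href against the base URL; an absolute href is returned unchanged. */
    public static String toAbsolute(String href) {
        return BASE.resolve(href).toString();
    }

    public static void main(String[] args) {
        // Shortened example of a relative reservation link
        System.out.println(toAbsolute("/availlim/search~s1*chx?/xjava&searchscope=1&sort=dz"));
    }
}
```

URI.resolve throws IllegalArgumentException if the href contains characters that are not legal in a URI, such as spaces, so a real client would probably want to catch that for links scraped from irregular HTML.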
--- Library client: the book search function