A small PHP application for web page scraping and analysis

Source: Internet
Author: User

Functions:

APIs for air ticket queries and flight queries are provided.

Technical Framework

PHP; simple_html_dom.php (a third-party open-source library that makes HTML easy to parse); simplexml_load_string (a function of the SimpleXML extension bundled with PHP 5, very convenient for parsing XML); regular expressions; and basic web page analysis techniques.

The interface is tested over HTTP from a Java client.
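
As an illustration of the SimpleXML part, here is a minimal sketch of reading element attributes into an array. The sample XML is hypothetical: only the access path `airline->line[0]` and the attribute names `go_avc`, `price`, and `discount` come from the code in this article, while the root element, the `detail` child, and all values are made up for the example. The helper name `read_line_attributes` is also an assumption, and the sketch assumes PHP 7 or later.

```php
<?php
// Hypothetical sample document; the real service response is not shown in the article.
$sample = <<<XML
<result>
  <airline>
    <line go_avc="Example Air EX123" price="510">
      <detail discount="4.9"/>
    </line>
  </airline>
</result>
XML;

// Collect the attributes of the first <line> element and of its child elements.
function read_line_attributes(string $xmlText): array
{
    $xml = simplexml_load_string($xmlText);
    if ($xml === false || !isset($xml->airline->line[0])) {
        return []; // not parseable, or not the expected layout
    }

    $fields = [];
    $line = $xml->airline->line[0];
    foreach ($line->attributes() as $name => $value) {
        $fields[$name] = (string) $value; // cast SimpleXMLElement to a plain string
    }
    foreach ($line->children() as $child) {
        foreach ($child->attributes() as $name => $value) {
            $fields[$name] = (string) $value;
        }
    }
    return $fields;
}

print_r(read_line_attributes($sample));
```

Casting each value to a string keeps the result a plain array, which is safer to pass to functions such as explode() than SimpleXMLElement objects.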

Problems Encountered

1. The script ran without problems when PHP was executed from the command line, but an error was reported when it ran under Apache. The cause turned out to be an incorrect content type in the response header sent by the page itself.
Original code: header("Content-Type: text/xml; charset=UTF-8");
Adjusted: header("Content-Type: text/html; charset=UTF-8");
2. Encoding problems. The PHP side itself produced no garbled text; garbled text appeared only when the interface was called from Java. Setting the character set explicitly on the input stream and on the URL parameter encoding solved the problem.
For details, see the encoding handling in the Java test class.
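
The encoding point can be illustrated with a short sketch, assuming PHP 7 or later and a UTF-8 string. The two characters used here are the ones whose encoded form appears as the `words=` parameter in the job search URL of the code in this article; the `flag`/`content` pairing in the second call is only an example.

```php
<?php
// "\u{6280}\u{5DE5}" is a UTF-8 string of two Chinese characters, written with
// escapes so the result does not depend on how this file itself is saved.
$keyword = "\u{6280}\u{5DE5}";

// urlencode() works on bytes, so the output depends on the string's character set.
$encoded = urlencode($keyword);
echo $encoded, "\n"; // %E6%8A%80%E5%B7%A5

// http_build_query() URL-encodes every value of a parameter list in one step.
$body = http_build_query(['flag' => '6', 'content' => $keyword]);
echo $body, "\n"; // flag=6&content=%E6%8A%80%E5%B7%A5

// Decoding restores the original bytes, provided both sides agree on UTF-8.
echo urldecode($encoded) === $keyword ? "round trip ok\n" : "mismatch\n";
```

If the caller encoded the same text in a different character set, the percent-encoded bytes would differ and the server would decode them into garbled text, which matches the symptom described in problem 2.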

You can directly paste the following code:

<?php
/**
 * @author xiaoluozheng@sohu-inc.com
 * @date 2011-8-26
 * This interface implements several functions: ticket query, flight query, and translation
 * Page parameter @flag business ID [1, 2, 3 translation; 4 flight query; 5 ticket query; 6, 7 job search; 8 small business startup]
 * Page parameter @content request content
 */
header("Content-Type: text/html; charset=UTF-8");
include_once('simple_html_dom.php');
error_reporting(E_ALL); // report all errors; use error_reporting(0) to suppress error output

/**
 * Flight query interface; scrapes the data from Qunar and analyzes it
 * @flightcode flight number
 * @return flight description information
 */
function flightquerybyflightcode($flightcode) {
    // URL parameters should be encoded with urlencode (plain English letters also work
    // unencoded, but Chinese or other characters must be URL-encoded)
    $url = "http://flight.qunar.com/status/fquery.jsp?flightcode=" . urlencode($flightcode);
    $html = file_get_html($url); // parse the page with the third-party simple_html_dom library
    $count = 0;
    $filter = array(2, 5, 6, 7); // positions of the <span> elements to keep
    // Strings to strip from the scraped text. The labels are shown as they appear in this copy;
    // they must match the label text of the scraped page. One entry with an empty key is
    // omitted, because strtr() returns false for an empty key before PHP 8.0.
    $filterstr = array("flight time" => "", "(" => "", ")" => "", "<b>" => "", "</b>" => "", "scheduled time:" => "", "Airport:" => "");
    $result = array();
    foreach ($html->find('.state_detail') as $element) {
        foreach ($element->find('dt') as $span) {
            $str = trim($span->innertext);
            // U = ungreedy: capture the text before the first <span
            preg_match_all("|(.*)<span|U", $str, $out, PREG_PATTERN_ORDER);
            $str = $out[1][0];
            $str = strtr($str, $filterstr);
            array_push($result, $str);
        }
        foreach ($element->find('span') as $span) {
            $count++;
            if (in_array($count, $filter)) {
                $str = trim($span->innertext);
                $str = strtr($str, $filterstr);
                array_push($result, $str);
            }
        }
    }
    $html->clear();
    $content = implode(",", $result);
    return $content;
}

/**
 * Translation interface; calls the Bing translation API
 * @flag 1: English to Chinese, 2: Chinese to English, 3: Chinese to Japanese
 * @str content to translate
 * @return the translated content
 */
function translate($flag, $str) {
    $inters = array(
        "1" => "http://api.microsofttranslator.com/V2/Ajax.svc/Translate?oncomplete=mycallback&appid=a4d660a48a6a97cca791c34935e4c02bbb1bec1c&from=en&to=zh-CN&text=",
        "2" => "http://api.microsofttranslator.com/V2/Ajax.svc/Translate?oncomplete=mycallback&appid=a4d660a48a6a97cca791c34935e4c02bbb1bec1c&from=zh-CN&to=en&text=",
        "3" => "http://api.microsofttranslator.com/V2/Ajax.svc/Translate?oncomplete=mycallback&appid=a4d660a48a6a97cca791c34935e4c02bbb1bec1c&from=zh-CN&to=ja&text="
    );
    // URL parameters should be encoded with urlencode (required for Chinese or other non-ASCII characters)
    $url = $inters[$flag] . urlencode($str);
    $content = file_get_contents($url);
    // The response has the form: mycallback("How do you do");
    preg_match_all("|\(\"(.*)\"\)|U", $content, $out, PREG_PATTERN_ORDER);
    $content = $out[1][0];
    return $content;
}

/**
 * Air ticket query interface; looks up ticket information by departure city and destination
 * through the ticket query interface at ws.qunar.com
 * @str query string; for example, for Beijing to Shanghai the parameter is Beijing-Shanghai
 * @return the discount ticket information for the current day
 */
function flightquerybycity($str) {
    // URL parameters should be encoded with urlencode (required for Chinese or other non-ASCII characters)
    $url = "http://ws.qunar.com/holidayService.jcp?lane=" . urlencode($str);
    $content = file_get_contents($url);
    $xml = simplexml_load_string($content);
    $result = array();
    foreach ($xml->airline->line[0]->attributes() as $key => $value) {
        $result[$key] = (string) $value;
    }
    foreach ($xml->airline->line[0]->children()->attributes() as $key => $value) {
        $result[$key] = (string) $value;
    }
    // Use the flight number to look up the departure and arrival airports
    $tmp = explode(" ", $result['go_avc']); // split off the flight number
    $flightcode = $tmp[1];
    $tmp = flightquerybyflightcode($flightcode); // get the flight details
    $tmp = explode(",", $tmp);
    $airport = $tmp[1];
    // Build the response
    $content = "Current lowest discount: " . $result['go_avc'] . "," . $airport . "," . $result['go_start'] . "-" . $result['go_expires'] . "," . $result['discount'] . $result['price'] . " yuan";
    return $content;
}

function findjob($flag, $city) {
    // Technical jobs
    $url_a = "http://www.51zgzg.com/search/searchEmp.do?method=search&words=%E6%8A%80%E5%B7%A5&funtypeid=&funtypename=&jobareaid=11000000&jobareaname=";
    // Sales jobs
    $url_b = "http://www.51zgzg.com/search/searchEmp.do?method=search&words=%E9%94%80%E5%94%AE&funtypeid=&funtypename=&jobareaid=32050000&jobareaname=";
    if ($flag == "6") {
        $url = $url_a . urlencode($city);
    }
    if ($flag == "7") {
        $url = $url_b . urlencode($city);
    }
    $errmsg = "At present, the system does not have the job information you are looking for!";
    $result = array();
    $count = 0;
    $html = file_get_html($url); // parse the page with the third-party simple_html_dom library
    foreach ($html->find('tr') as $element) {
        foreach ($element->find('td') as $td) {
            $str = trim($td->innertext);
            array_push($result, $str);
        }
        if (++$count % 4 == 0) break; // stop after four rows
    }
    $html->clear();
    $content = implode("###", $result);
    return $content != "" ? $content : $errmsg;
}

function findproject() {
}

function printlog($content) {
    /*
    $fp = fopen("log.txt", "a+");
    $content .= "\r\n";
    fwrite($fp, $content);
    fclose($fp);
    */
}

$flag = $_REQUEST['flag'];   // business ID: 1, 2, 3 translation; 4 flight query; 5 ticket query
$str = $_REQUEST['content']; // the specific request content
printlog($flag . "-" . $str);
// $flag = "2";
// $str = "";
$content = ""; // response content
try {
    switch ($flag) {
        case "1":
        case "2":
        case "3":
            $content = translate($flag, $str);
            break;
        case "4":
            $content = flightquerybyflightcode($str);
            break;
        case "5":
            $content = flightquerybycity($str);
            break;
    }
} catch (Exception $exc) {
    // echo $exc->getMessage();
}
printlog($content);
echo $content;
?>
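
The two regular expressions in the listing both rely on an ungreedy match to cut a fragment out of a larger string. The following is a minimal standalone sketch of that technique, assuming PHP 7.1 or later. The helper names `extract_callback_payload` and `text_before_first_span` are hypothetical, and the HTML sample is made up; only the `mycallback("How do you do");` response format comes from the comment in the code above.

```php
<?php
// Return the text between (" and ") in a JSONP-style response, or null if absent.
function extract_callback_payload(string $response): ?string
{
    // U makes .* ungreedy, so the match stops at the first ")
    if (preg_match('|\("(.*)"\)|U', $response, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Return the trimmed text that precedes the first <span in an HTML fragment.
function text_before_first_span(string $html): ?string
{
    if (preg_match('|(.*)<span|U', $html, $m) === 1) {
        return trim($m[1]);
    }
    return null;
}

var_dump(extract_callback_payload('mycallback("How do you do");'));
// Made-up fragment, for illustration only
var_dump(text_before_first_span('Example Air EX123 <span>on time</span><span>B</span>'));
```

Returning null when nothing matches avoids reading `$out[1][0]` from an empty match array, which is what happens in the listing if the remote page changes its layout.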

The test code is as follows:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLEncoder;

public class TestClient {
    /**
     * @param args
     * @throws Exception
     */
    public static void main(String[] args) throws Exception {
        String url = "http://localhost/Ceshi/server.php";
        try {
            String[][] params = {{"1", "Laugh"}, {"2", "Nice weather today"}, {"3", "Good morning"}, {"4", "czw.2"}, {"5", "Hangzhou-Guangzhou"}};
            String content = "";
            for (int i = 0; i < params.length; i++) {
                // Parameters that contain Chinese characters or symbols (anything other than
                // digits and English letters) must be URL-encoded before they are sent
                content = "flag=" + params[i][0] + "&content=" + URLEncoder.encode(params[i][1], "UTF-8");
                URL realUrl = new URL(url);
                URLConnection con = realUrl.openConnection();
                con.setDoOutput(true); // send a request body (POST)
                con.setDoInput(true);
                con.setRequestProperty("Pragma", "no-cache");
                con.setRequestProperty("Cache-Control", "no-cache");
                PrintWriter out = new PrintWriter(con.getOutputStream());
                out.print(content);
                out.flush();
                out.close();
                // Read the response with an explicit character set to avoid garbled text
                BufferedReader in = new BufferedReader(new InputStreamReader(con.getInputStream(), "UTF-8"));
                String line;
                while ((line = in.readLine()) != null) {
                    System.out.println(line);
                }
                in.close();
            }
        } catch (Exception e) {
            throw e;
        }
    }
}

The output is as follows:

Smile today the weather is really good morning there are just two Chinese Southern Airlines, Building B of Xiaoshan Airport-Baiyun Airport, model: jet, flight distance: 1099 km, -current lowest discount: China Southern Airlines cz3820, Xiaoshan Airport B floor-Baiyun Airport,-, 4.9 off 510 yuan.
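
For a quick check without the Java client, a similar POST request can be sketched in PHP itself using a stream context. This is only a sketch under stated assumptions: the helper names `build_post_options` and `post_form` and the 10-second timeout are hypothetical, while the URL and parameters in the commented-out call are the ones used by the Java test.

```php
<?php
// Build the stream-context options for a form-encoded POST request.
function build_post_options(array $params): array
{
    return [
        'http' => [
            'method'  => 'POST',
            'header'  => "Content-Type: application/x-www-form-urlencoded\r\n"
                       . "Cache-Control: no-cache\r\n",
            'content' => http_build_query($params), // URL-encodes each value
            'timeout' => 10,                        // seconds; arbitrary choice
        ],
    ];
}

// Send the request; returns the response body, or false on failure.
function post_form(string $url, array $params)
{
    $context = stream_context_create(build_post_options($params));
    return @file_get_contents($url, false, $context);
}

// Example call; it needs the server script to be reachable, so it is left commented out:
// echo post_form('http://localhost/Ceshi/server.php', ['flag' => '5', 'content' => 'Hangzhou-Guangzhou']);
```

Keeping the option building separate from the network call makes the encoded request body easy to inspect when tracking down encoding problems.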
