Simple CSDN blog article crawling in PHP (continued: adding user name search suggestions)

Source: Internet
Author: User

The previous article listed all the blog posts of a given CSDN user name. That is not very useful on its own, because you must know the exact user name of the person you are interested in before you can list their posts. So, having some spare time, I added an input suggestion feature similar to Google Suggest.


Search suggestions need a list of CSDN users to draw on. That list has to be crawled by yourself, so I wrote an extremely simple PHP crawling script, shown below:

 

<?php
//
require_once("config.php");
require_once("database.php");
require_once("crawl_func.php");

/// fetch URL content
function get_url_content($url)
{
    if (extension_loaded('curl'))
    {
        $ch = curl_init($url);
        curl_setopt($ch, CURLOPT_HEADER, 0);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
        $content = curl_exec($ch);
        curl_close($ch);
    }
    else
    {
        $content = file_get_contents($url);
    }
    // !$content && die("get $url content error.");
    // On failure, fall back to a dummy string so the crawl loop keeps going.
    !$content && $content = "zfq";
    return $content;
}

///
set_time_limit(0);
header("Content-Type: text/html; charset=UTF-8");
///
zfqsql_open();
addurltodb($start_crawl_url);
///
$url_counter = 0;
/// crawl ....
print "start to crawl...<br/>\n";
while ($crawlurl = geturltocrawl())
{
    /// hack
    usleep($crawl_thread_sleep_time);
    print "handle URL: " . $crawlurl . "<br/>\n";

    ///
    $content = get_url_content($crawlurl);

    // Links to a user's blog or "hi" home page: queue the URL and record the user name.
    $reg_pattern = '/(http:\/\/(blog|hi)\.csdn\.net\/([a-zA-Z0-9_-]+)\/?)"/';
    if (preg_match_all($reg_pattern, $content, $matches))
    {
        $tempcrawurls = $matches[1];
        $tempblogname = $matches[3];
        $num = count($matches[0]);
        for ($i = 0; $i < $num; $i++)
        {
            addurltodb($tempcrawurls[$i]);
            addblognametodb($tempblogname[$i]);
        }
    }

    // Forum and topic links: queue them for crawling, except URLs containing "user".
    $reg_pattern = '/(http:\/\/(forum|topic)\.csdn\.net\/[^#"]+)"/';
    if (preg_match_all($reg_pattern, $content, $matches))
    {
        $exclude_pattern = '/user/';
        $tempcrawurls = $matches[1];
        $num = count($matches[0]);
        for ($i = 0; $i < $num; $i++)
        {
            preg_match($exclude_pattern, $tempcrawurls[$i], $exclude_match);
            if ($exclude_match)
            {
                continue;
            }
            addurltodb($tempcrawurls[$i]);
        }
    }

    //
    $url_counter++;
    updateurltodb($crawlurl);
}

?>
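To see what the user name pattern captures, it can be tried on a small piece of markup in isolation. The following is a minimal sketch, not part of the original script: the function name `extract_csdn_usernames` and the sample HTML are made up for illustration, and the pattern assumes that profile links end right before the closing double quote of the `href` attribute.

```php
<?php
// Hypothetical helper (not in the original script): returns the unique user
// names found in blog.csdn.net / hi.csdn.net profile links inside $html.
function extract_csdn_usernames(string $html): array
{
    // Capture group 1 is the user name; the optional trailing slash must be
    // followed directly by the closing quote of the attribute.
    $pattern = '/http:\/\/(?:blog|hi)\.csdn\.net\/([a-zA-Z0-9_-]+)\/?"/';
    if (!preg_match_all($pattern, $html, $matches)) {
        return [];
    }
    // array_unique keeps the first occurrence; array_values reindexes from 0.
    return array_values(array_unique($matches[1]));
}

// Made-up sample markup, for illustration only.
$sample = '<a href="http://blog.csdn.net/alice">A</a>'
        . '<a href="http://hi.csdn.net/bob_1/">B</a>'
        . '<a href="http://blog.csdn.net/alice/article/details/1">post</a>'
        . '<a href="http://blog.csdn.net/alice">A again</a>';

print_r(extract_csdn_usernames($sample));
```

Because the closing quote has to follow the user name directly, links to individual articles are skipped and only links to a user's home page contribute a name.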

 

The crawl is relatively slow. By the time of posting it has been running for 3 or 4 hours; more than 20,000 users have been collected, but only a little over 1,000 URLs have been analyzed. The queue of URLs still waiting to be analyzed and crawled keeps growing, although its growth should slow down relative to the number of users already collected.
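The gap between analyzed and pending URLs comes from the way the queue is fed: every analyzed page can add many new URLs. The helpers `addurltodb()`, `geturltocrawl()` and `updateurltodb()` live in crawl_func.php, which is not shown here, so the following is only a sketch of the idea under my own assumptions: a hypothetical in-memory class `UrlFrontier` that ignores URLs it has already seen and hands out pending ones in first-in, first-out order.

```php
<?php
// Hypothetical in-memory stand-in for the URL table used by the crawler.
// The real helpers are in crawl_func.php and may work differently.
class UrlFrontier
{
    private array $seen = [];     // url => true, every URL ever added
    private array $pending = [];  // FIFO queue of URLs not yet analyzed

    // Adds a URL unless it was seen before; returns true if it was queued.
    public function add(string $url): bool
    {
        $key = rtrim($url, '/');  // treat ".../user" and ".../user/" as one URL
        if (isset($this->seen[$key])) {
            return false;
        }
        $this->seen[$key] = true;
        $this->pending[] = $key;
        return true;
    }

    // Returns the next URL to analyze, or null when the queue is empty.
    public function next(): ?string
    {
        return array_shift($this->pending);
    }

    public function pendingCount(): int
    {
        return count($this->pending);
    }
}

$frontier = new UrlFrontier();
$frontier->add('http://blog.csdn.net/alice');
$frontier->add('http://blog.csdn.net/alice/');  // duplicate, ignored
$frontier->add('http://blog.csdn.net/bob');
echo $frontier->pendingCount(), " URLs pending\n";
```

Without the duplicate check the same popular pages would be queued again and again, which would make the pending count grow even faster than described above.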

 

Once the crawl has produced the required CSDN user list, what remains is the search suggestion feature itself. This part is based on other people's code, so I will not explain it here and only post the relevant code.

 

 

// index.htm

 

<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>for list csdn blog entry</title>
<script type="text/javascript" src="jquery-1.2.1.pack.js"></script>
<script type="text/javascript"><!--
function lookup(inputstring) {
    if (inputstring.length == 0) {
        // Hide the suggestion box.
        $('#suggestions').hide();
    } else {
        $.post("bloginputhint.php", {querystring: "" + inputstring + ""}, function(data) {
            if (data.length > 0) {
                $('#suggestions').show();
                $('#autosuggestionslist').html(data);
            }
        });
    }
} // lookup

function fill(thisvalue) {
    $('#blogname').val(thisvalue);
    setTimeout("$('#suggestions').hide();", 200);
}
// --></script>
<style type="text/css"><!--
body {
    font-family: Helvetica;
    font-size: 11px;
    color: #000000;
}

h3 {
    margin: 0px;
    padding: 0px;
}
.suggestionsbox {
    position: relative;
    left: 300px;
    margin: 10px 0px 0px 0px;
    width: 200px;
    background-color: #212427;
    -moz-border-radius: 7px;
    -webkit-border-radius: 7px;
    border: 2px solid #000;
    color: #00ff00;
}

.suggestionlist {
    margin: 0px;
    padding: 0px;
}

.suggestionlist li {
    margin: 0px 0px 3px 0px;
    padding: 3px;
    cursor: pointer;
}
.suggestionlist li:hover {
    background-color: #1f20ff;
}
--></style>
</head>
<body>
<h1 align="center">
csdn blog entry list
</h1>
<form method="get" action="get_csdnblog.php">
<br /><br />
<table cellspacing=0 cellpadding=0 width=400 align="center">
<tr>
<td align="center"><input maxlength=256 size=50 name="blogname" id="blogname" type="text" onkeyup="lookup(this.value);" onblur="fill();">
<input type=submit value="list"></td>
</tr>
</table>
<div class="suggestionsbox" id="suggestions" style="display: none;">
<div class="suggestionlist" id="autosuggestionslist">
</div>
</div>
</form>
</body>
</html>

 

 

// bloginputhint.php

 

<?php

// PHP5 implementation - uses MySQLi.
// mysqli('localhost', 'yourUsername', 'yourPassword', 'yourDatabase');
$db = new mysqli('localhost', 'root', '123', 'sdnblog');

// new mysqli() always returns an object, so check connect_error rather than !$db.
if ($db->connect_error) {
    // Show error if we cannot connect.
    echo 'ERROR: Could not connect to the database.';
} else {
    // Is there a posted query string?
    if (isset($_POST['querystring'])) {
        $querystring = $db->real_escape_string($_POST['querystring']);

        // Is the string length greater than 0?

        if (strlen($querystring) > 0) {
            // Run the query: we use LIKE '$querystring%'
            // The percentage sign is a wild-card, in my example of countries it works like this...
            // $querystring = 'Uni';
            // Returned data = 'United States, United Kingdom';

            // You need to alter the query to match your database.
            // eg: SELECT yourColumnName FROM yourTable WHERE yourColumnName LIKE '$querystring%' LIMIT 10

            $query = $db->query("SELECT user FROM blogname WHERE user LIKE '$querystring%' LIMIT 10");
            if ($query) {
                // While there are results loop through them - fetching an object (I like PHP5 btw!).
                while ($result = $query->fetch_object()) {
                    // Format the results, I'm using <li> for the list, you can change it.
                    // The onclick function fills the textbox with the result.

                    // You must change: $result->value to $result->your_column
                    echo '<li onclick="fill(\'' . $result->user . '\');">' . $result->user . '</li>';
                }
            } else {
                echo 'ERROR: There was a problem with the query.';
            }
        } else {
            // Don't do anything.
        } // There is a querystring.
    } else {
        echo 'There should be no direct access to this script!';
    }
}
?>
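One detail of the LIKE query is worth a note: `real_escape_string()` protects the quotes, but it does not escape the LIKE wildcards `%` and `_`, so a user who types `%` gets the first 10 users regardless of prefix. The sketch below shows one possible way to escape them before the string is put into the query. The helper names `escape_like` and `like_prefix` are my own, not part of the original code, and the sketch assumes MySQL's default LIKE escape character (the backslash).

```php
<?php
// Hypothetical helper: escapes the LIKE wildcards % and _ so user input is
// matched literally. The backslash itself is escaped too, because it is
// MySQL's default LIKE escape character.
function escape_like(string $input): string
{
    return addcslashes($input, '\\%_');
}

// Builds a "starts with" pattern. The result is not yet safe for SQL: it
// should still go through real_escape_string() or a bound parameter.
function like_prefix(string $input): string
{
    return escape_like($input) . '%';
}

echo like_prefix('zf_q'), "\n";
```

In bloginputhint.php this would be applied to the posted value first, and the result then passed through `real_escape_string()` as before, with the trailing `%` coming from `like_prefix()` instead of being written into the query string.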

 

 


 

 

 

 

 

 
