Extract keywords from webpages
Extract and display keywords from the specified page.
The code is as follows:
$meta = get_meta_tags('http://www.111cn.net/');
// get_meta_tags() lowercases the name attribute; the standard tag name is "keywords"
$keywords = $meta['keywords'];
// Split the keyword string on commas
$keywords = explode(',', $keywords);
// Trim surrounding whitespace
$keywords = array_map('trim', $keywords);
// Remove empty entries
$keywords = array_filter($keywords);
print_r($keywords);
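The snippet above assumes that the page can be read and that it declares a keywords meta tag. A slightly more defensive variant might look like the sketch below. The function name `extract_keywords` is hypothetical (it is not part of the original code), and returning an empty array on failure is an assumption about what the caller wants.

```php
<?php
// Hypothetical helper: returns the page's meta keywords as a clean list.
function extract_keywords($source)
{
    // get_meta_tags() accepts a URL or a local file path and returns false on failure
    $meta = @get_meta_tags($source);
    if ($meta === false || !isset($meta['keywords'])) {
        return array();
    }
    // Split on commas, trim whitespace, drop empty entries
    $keywords = array_map('trim', explode(',', $meta['keywords']));
    $keywords = array_filter($keywords, 'strlen');
    // Reindex so the keys run 0, 1, 2, ...
    return array_values($keywords);
}
```

Calling `print_r(extract_keywords('http://www.111cn.net/'));` would then print the list, or an empty array if the tag is absent.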
Obtain all links on the page.
The following code uses PHP's DOM extension to retrieve all links on a specified page. It is only an example; feel free to adapt it.
The code is as follows:
$html = file_get_contents('http://www.111cn.net');
$dom = new DOMDocument();
// Suppress the warnings that malformed HTML would otherwise trigger
@$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
// Select every <a> element under <body>
$hrefs = $xpath->evaluate("/html/body//a");
for ($i = 0; $i < $hrefs->length; $i++) {
    $href = $hrefs->item($i);
    $url = $href->getAttribute('href');
    echo $url . '<br/>';
}
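For reuse, the same idea can be wrapped in a function that takes an HTML string and returns the links as an array. The sketch below is one possible shape, not the original author's code: the name `extract_links` is hypothetical, it uses `libxml_use_internal_errors()` instead of the `@` operator, and it only returns anchors that actually have an `href` attribute.

```php
<?php
// Hypothetical helper: returns the href of every <a> element in an HTML string.
function extract_links($html)
{
    if ($html === '' || $html === false) {
        return array(); // nothing to parse
    }
    // Collect parser warnings internally instead of suppressing them with @
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = array();
    $xpath = new DOMXPath($dom);
    // Only anchors that carry an href attribute
    foreach ($xpath->query('//a[@href]') as $anchor) {
        $links[] = $anchor->getAttribute('href');
    }
    return $links;
}
```

The returned values are the raw attribute values, so relative links such as `/about.html` are not resolved against the page URL.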
Automatically convert URLs on the page to clickable hyperlinks
When you post an article or build a page, a URL only becomes a hyperlink if you wrap it in an <a> tag yourself. The following code converts the URLs in a piece of text into hyperlinks on output. The approach is fairly simple: use regular expressions to match each URL, then output the corresponding hyperlink.
The code is as follows:
function _make_url_clickable_cb($matches) {
    $ret = '';
    $url = $matches[2];
    if (empty($url))
        return $matches[0];
    // Move a trailing punctuation mark out of the URL
    if (in_array(substr($url, -1), array('.', ';', ':')) === true) {
        $ret = substr($url, -1);
        $url = substr($url, 0, strlen($url) - 1);
    }
    return $matches[1] . "<a href=\"$url\" rel=\"nofollow\">$url</a>" . $ret;
}

function _make_web_ftp_clickable_cb($matches) {
    $ret = '';
    $dest = $matches[2];
    // Check for an empty match before the scheme is prepended
    if (empty($dest))
        return $matches[0];
    $dest = 'http://' . $dest;
    if (in_array(substr($dest, -1), array('.', ';', ':')) === true) {
        $ret = substr($dest, -1);
        $dest = substr($dest, 0, strlen($dest) - 1);
    }
    return $matches[1] . "<a href=\"$dest\" rel=\"nofollow\">$dest</a>" . $ret;
}

function _make_email_clickable_cb($matches) {
    $email = $matches[2] . '@' . $matches[3];
    return $matches[1] . "<a href=\"mailto:$email\">$email</a>";
}

function make_clickable($ret) {
    // Pad with a space so a URL at the very start of the text still matches [\s>]
    $ret = ' ' . $ret;
    // URLs with an explicit scheme, e.g. http://...
    $ret = preg_replace_callback('#([\s>])([\w]+?://[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_url_clickable_cb', $ret);
    // Bare www. and ftp. addresses
    $ret = preg_replace_callback('#([\s>])((www|ftp)\.[\w\\x80-\\xff\#$%&~/.\-;:=,?@\[\]+]*)#is', '_make_web_ftp_clickable_cb', $ret);
    // E-mail addresses
    $ret = preg_replace_callback('#([\s>])([.0-9a-z_+-]+)@(([0-9a-z-]+\.)+[0-9a-z]{2,})#i', '_make_email_clickable_cb', $ret);
    // Clean up links that were accidentally nested inside existing links
    $ret = preg_replace('#(<a( [^>]+?>|>))<a [^>]+?>([^>]+?)</a></a>#i', '$1$3</a>', $ret);
    $ret = trim($ret);
    return $ret;
}
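To see the "match the URL with a regular expression, then output a hyperlink" idea in isolation, here is a much smaller sketch. It is not a replacement for the code above: the name `linkify_simple` is hypothetical, it only handles `http://` and `https://` URLs, and it assumes plain-text input (text that already contains `<a>` tags would have the URLs inside them linked again).

```php
<?php
// Minimal sketch: turn http:// and https:// URLs in plain text into links.
function linkify_simple($text)
{
    return preg_replace_callback(
        '#\bhttps?://[^\s<>"\']+#i',
        function ($m) {
            $url = $m[0];
            // Move trailing punctuation out of the link
            $trail = '';
            while (in_array(substr($url, -1), array('.', ',', ';', ':'), true)) {
                $trail = substr($url, -1) . $trail;
                $url = substr($url, 0, -1);
            }
            // Escape the URL before placing it in the markup
            $safe = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
            return '<a href="' . $safe . '" rel="nofollow">' . $safe . '</a>' . $trail;
        },
        $text
    );
}
```

Using an anonymous function as the callback keeps the helper in one place; the named callbacks in the full version do the same job for each URL type.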
Generate a Data URI with PHP
Images are often encoded as Data URIs and embedded directly in a page, which reduces the number of HTTP requests and improves front-end performance. Data URIs have other uses as well. The following code encodes a file as a Data URI.
The code is as follows:
function data_uri($file, $mime) {
    // Read the file and Base64-encode its contents
    $contents = file_get_contents($file);
    $base64 = base64_encode($contents);
    echo "data:$mime;base64,$base64";
}
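Because `data_uri()` echoes its result, it can only be used at the point of output. A variant that returns the string is often easier to embed in markup. The sketch below is an assumption about how that might look; the name `data_uri_string` and the `false` return value for unreadable files are not from the original.

```php
<?php
// Hypothetical variant of data_uri() that returns the string instead of echoing it.
function data_uri_string($file, $mime)
{
    if (!is_readable($file)) {
        return false; // file missing or unreadable
    }
    $contents = file_get_contents($file);
    if ($contents === false) {
        return false;
    }
    return 'data:' . $mime . ';base64,' . base64_encode($contents);
}
```

It could then be used as `echo '<img src="' . data_uri_string('logo.png', 'image/png') . '" alt="logo">';`, where `logo.png` is a placeholder file name.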
Download remote images to a local server
This is especially useful for reposted articles. To avoid losing images when the original site goes offline, images on the remote server are usually downloaded to the local server when the article is published. The following code does this. You will need to add your own handling for storage locations and for looping over the image links:
The code is as follows:
// Fetch the remote image
$image = file_get_contents('yun_qi_img/logo.gif');
// Save it on the local server
file_put_contents('/images/logo.gif', $image);
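The "link traversal" mentioned above could be handled with a loop over a list of image URLs. The sketch below is one way to do it under some assumptions: the function name `download_images` is hypothetical, the local file name is taken from the last path segment of each URL, and images that cannot be fetched are silently skipped.

```php
<?php
// Hypothetical helper: downloads each image in $urls into $dir and returns
// a map of source URL => local path for the downloads that succeeded.
function download_images(array $urls, $dir)
{
    $saved = array();
    foreach ($urls as $url) {
        // Derive the local file name from the path part of the URL
        $path = parse_url($url, PHP_URL_PATH);
        $name = is_string($path) ? basename($path) : '';
        if ($name === '') {
            continue; // nothing usable as a file name
        }
        $image = @file_get_contents($url);
        if ($image === false) {
            continue; // source unavailable, skip it
        }
        $target = rtrim($dir, '/') . '/' . $name;
        if (file_put_contents($target, $image) !== false) {
            $saved[$url] = $target;
        }
    }
    return $saved;
}
```

Two images with the same file name would overwrite each other in this sketch, so a real implementation would probably add a prefix or hash to the local name.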
Remove unnecessary tags from text
When you copy text from a word processor (such as Word) into a Web editor, it often brings along useless markup, such as tags and attributes that only set text styles. The following code uses regular expressions to remove this markup and clean up the text:
The code is as follows:
function cleanHTML($html) {
    // Remove useless tags first (you can add more tags to the list).
    // preg_replace is used here because ereg_replace was removed in PHP 7.
    $html = preg_replace('#<(/)?(font|span|del|ins)[^>]*>#', '', $html);
    // Then remove useless attributes. Each pass removes the last matching
    // attribute in a tag (and anything after it), so the pattern is run twice.
    $attr = '#<([^>]*)(class|lang|style|size|face)=("[^"]*"|\'[^\']*\'|[^>]+)([^>]*)>#';
    $html = preg_replace($attr, '<$1>', $html);
    $html = preg_replace($attr, '<$1>', $html);
    return $html;
}
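One limitation of `cleanHTML()` is that its replacement keeps only the part of the tag before the removed attribute, and two passes may not be enough for tags with many style attributes. The sketch below shows a possible variant that keeps the other attributes and repeats until nothing is left to remove. The name `strip_style_attributes` is hypothetical, and the pattern is an assumption, not the original author's.

```php
<?php
// Hypothetical variant: strips the listed attributes but keeps the rest of the tag.
function strip_style_attributes($html)
{
    $pattern = '#<([^>]*?)\s+(?:class|lang|style|size|face)=("[^"]*"|\'[^\']*\'|[^\s>]+)([^>]*)>#i';
    do {
        // Each pass removes at most one attribute per tag, so repeat until none is left
        $html = preg_replace($pattern, '<$1$3>', $html, -1, $count);
    } while ($count > 0);
    return $html;
}
```

Like the original, this is regex-based and can be confused by a `>` inside an attribute value; a DOM parser is the safer choice for untrusted input.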
If you have added some useful PHP code to your favorites