My work has been leaning more and more toward product operations lately, and along the way I have gained some understanding of online operations.
Having some technical skills makes many difficult operations tasks much easier, and it helps you improve quickly and build up experience and resources.
Take my recent landing page (LP) work as an example. With no prior experience, I had to work with the designers to produce a landing page quickly. Where my own experience fell short, I studied other companies' landing pages for their ideas and techniques and built up a collection of campaign landing pages, which has been a great help in my work. The more you study what others have done, the better your own work gets, and the closer you can push everything toward its best.
So I spent a whole day building a browser-based site-ripping tool that saves the better landing pages I come across to local disk, where I can browse and study them quickly.
Plenty of such tools already exist, from the original Teleport to the later site-ripping tools, Template Thieves, Web thieves, Web page extraction Assistant and so on. I have tried most of them, and almost none could reliably download the CSS, scripts and images and store them in the layout I want.
Take the landing page of a lazy net that I found on Sina, for example:
This is the result I want:
The landing page is saved as index.html, the CSS, images and JS are sorted into their designated directories, and a txt file in the root directory records the URL this landing page was downloaded from.
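The layout described above could be prepared along these lines. This is only a minimal sketch: the class name `LandingPageLayout`, the folder names `css`, `images` and `js`, and the file name `source.txt` are assumptions for illustration, not names taken from the tool itself.

```csharp
using System;
using System.IO;

public static class LandingPageLayout
{
    // Creates <root>/<siteFolder> with css, images and js subfolders and
    // writes the source URL into a text file in the site root.
    // Folder and file names are hypothetical.
    public static string Prepare(string root, string siteFolder, string sourceUrl)
    {
        string siteDir = Path.Combine(root, siteFolder);
        foreach (string sub in new[] { "css", "images", "js" })
        {
            // CreateDirectory also creates any missing parent folders.
            Directory.CreateDirectory(Path.Combine(siteDir, sub));
        }
        File.WriteAllText(Path.Combine(siteDir, "source.txt"), sourceUrl);
        return siteDir;
    }
}
```

Calling `LandingPageLayout.Prepare(@"D:\lp", "demo", "http://example.com/lp/")` would leave index.html to be written into the returned folder afterwards.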
The final interface looks like this:
It is very rough: just an analyze-and-grab button and a C# WebBrowser control, but it is usable.
The program automatically downloads the relevant resources, saves them to the corresponding directories, and generates the required files.
Why not simply fetch the HTML source the usual way and download it directly?
1. Parsing relative and absolute paths inside the HTML is very troublesome: you need regular expressions to match them one by one and replace them with local paths.
2. In some cases the browser's results are likely to be more accurate, and the implementation should be simpler.
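To illustrate the path problem from point 1, here is a small sketch of resolving a reference against the URL of the file it appears in, using the standard `System.Uri` class. The class and method names (`UrlResolver.ToAbsolute`) are hypothetical, and the example URLs are made up.

```csharp
using System;

public static class UrlResolver
{
    // Resolves a reference found in a page or CSS file against the URL of
    // that file. Returns an empty string when either input is unusable.
    public static string ToAbsolute(string baseUrl, string reference)
    {
        if (string.IsNullOrWhiteSpace(reference)) return string.Empty;

        Uri baseUri;
        if (!Uri.TryCreate(baseUrl, UriKind.Absolute, out baseUri)) return string.Empty;

        Uri resolved;
        return Uri.TryCreate(baseUri, reference.Trim(), out resolved)
            ? resolved.AbsoluteUri
            : string.Empty;
    }
}
```

For instance, `../images/bg.png` referenced from `http://example.com/lp/css/style.css` resolves to `http://example.com/lp/images/bg.png`, while a reference that is already absolute is returned unchanged.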
Code Flow:
1. The browser loads the landing page and waits for loading to complete.
2. Traverse all nodes.
For images, download, save, and replace the reference with the local path; scripts and CSS files are handled the same way. Then process the style code embedded in the landing page itself.
Finally, open each CSS file, match the image references with a regular expression, download the remote images, and replace the image addresses inside the CSS.
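The "replace with local path" step in the flow above needs a rule for where each resource ends up. The sketch below shows one possible rule; the `ResourceKind` enum, the `ResourceMapper` class and the folder names are all assumptions. In a WebBrowser-based tool the nodes themselves could come from, for example, `Document.Images` and `Document.GetElementsByTagName("script")`, which is left out here so the mapping can be read on its own.

```csharp
using System;
using System.IO;

public enum ResourceKind { Image, Script, Stylesheet }

public static class ResourceMapper
{
    // Folder names are hypothetical; they mirror a css/images/js layout.
    public static string FolderFor(ResourceKind kind)
    {
        switch (kind)
        {
            case ResourceKind.Image: return "images";
            case ResourceKind.Script: return "js";
            default: return "css";
        }
    }

    // Maps an absolute resource URL to a path relative to index.html.
    // The query string is dropped because only the URL path is used.
    public static string LocalPathFor(string absoluteUrl, ResourceKind kind)
    {
        var uri = new Uri(absoluteUrl);
        string fileName = Path.GetFileName(uri.AbsolutePath);
        if (string.IsNullOrEmpty(fileName)) fileName = "unnamed";
        return FolderFor(kind) + "/" + fileName;
    }
}
```

The returned value is what would be written back into the node's `src` or `href` attribute after the file has been downloaded. Two resources with the same file name would collide under this rule, so a real tool would need to handle that case.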
Here is the code that parses the images inside the CSS and downloads them locally. The rest is relatively simple.
    // Requires: using System.IO; using System.Text.RegularExpressions;

    /// <summary>
    /// Parse the images referenced inside CSS source, download them, and convert the links to local format
    /// </summary>
    /// <param name="content">CSS file contents</param>
    /// <param name="cssUrl">CSS file path, used to convert image references to absolute addresses for download</param>
    /// <returns>The CSS contents with image links rewritten to local paths</returns>
    public string ParseImgInCss(string content, string cssUrl = "")
    {
        Regex reg = new Regex(@"url\((.*?)\)", RegexOptions.IgnoreCase);
        content = reg.Replace(content, match =>
        {
            string imgUrl = match.Groups[1].Value;
            // Strip single and double quotes, because the matched URL may be quoted
            imgUrl = imgUrl.Replace("'", "").Replace("\"", "");
            // Fix the path up to an absolute address
            imgUrl = HtmlHelper.GetUrlRelative(cssUrl, imgUrl);
            if (!string.IsNullOrEmpty(imgUrl))
            {
                // Download the remote image and save it locally
                var localImage = Path.Combine("Images", HtmlHelper.GetFileNameInUrl(imgUrl));
                DownloadHelper.DownloadFile(imgUrl, Path.Combine(SaveFolder, SiteFolder, localImage));
                if (cssUrl.ToLower().IndexOf(".css") > -1)
                {
                    // A CSS file is stored in the CSS directory, so it needs a relative address
                    return @"url('../" + localImage.Replace("\\", "/") + @"')";
                }
                else
                {
                    // CSS embedded in the page does not need the relative prefix
                    return @"url('" + localImage.Replace("\\", "/") + @"')";
                }
            }
            return match.Value;
        });
        return content;
    }
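To see what the `url(...)` pattern actually picks up, the following sketch runs the same regular expression over a CSS string and only collects the references, without downloading anything. The class name `CssUrlScanner` is hypothetical, and skipping inline `data:` URIs is an added assumption of this sketch rather than something the code above does.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssUrlScanner
{
    // Same pattern as in the code above: a lazy match up to the first ")".
    private static readonly Regex UrlPattern =
        new Regex(@"url\((.*?)\)", RegexOptions.IgnoreCase);

    // Returns the references found in url(...) with whitespace and quotes
    // trimmed. Inline data: URIs are skipped (an assumption of this sketch).
    public static List<string> Extract(string css)
    {
        var urls = new List<string>();
        foreach (Match m in UrlPattern.Matches(css))
        {
            string url = m.Groups[1].Value.Trim().Trim('\'', '"');
            if (url.Length > 0 && !url.StartsWith("data:", StringComparison.OrdinalIgnoreCase))
            {
                urls.Add(url);
            }
        }
        return urls;
    }
}
```

Because the match is lazy and stops at the first closing parenthesis, a URL that itself contains `)` would be cut short; for typical landing page stylesheets this is unlikely to matter.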
In the end, the whole site-ripping tool comes to about 100 lines of code, comments included.
There is plenty left to polish, but it is already usable. Ripping several landing pages in a minute is no problem, and I plan to take some time to go through the pages that competitors promote on Baidu ...
I have ripped 10 LPs so far and found no problems with the directories. If I get the chance, I may package it up for sale.
Those who know how to build this can make their own; the idea is all here.
For more, visit Little Five's blog: http://www.lingdonge.com
C#: an approach to quickly ripping sites with WebBrowser to build up a large collection of landing pages