Analysis of the core technology of web scraping software series (4)---How to convert HTML pages into PDF (html2pdf) using C#

Last Update:2014-11-28 Source: Internet

Author: User

Tags pdflib svn client wkhtmltopdf

Overview of this series and its background

The opening articles of this series were warmly received, which is a great encouragement to me. This is the fourth article in the series; I hope you will continue to support it and give me the motivation to keep writing.

The blog backup tool I developed, Bean John Blog Backup Expert, has been available for more than 3 years and is well liked by bloggers and avid readers. Some technology enthusiasts have also asked me how its various practical features are implemented.

The software is developed with .NET. To give back to the community, I am starting a column of articles on the core technologies used in the software, for the benefit of technology enthusiasts.

Besides explaining web scraping and a variety of important related techniques, this series also offers solutions to common problems and practical experience in UI development. It is well suited to beginner and intermediate .NET developers, and I hope you will support it.

Many beginners are confused by the same question: "I have read books covering every aspect of C#, so why can't I write a decent application?"

The reason is that they have not yet learned to apply that knowledge as a whole, developed a programming mindset, or built up an interest in learning. I hope this series of articles can help with that.

Development environment: VS2008

Source code for this section: https://github.com/songboriceboy/csharphtml2pdf

How to download the source code: install an SVN client (provided at the end of this article), then check out the following address: https://github.com/songboriceboy/csharphtml2pdf

The outline of the article series is as follows:

1. How to use C# to obtain the links and titles of all of a blogger's posts
2. How to use C# to obtain the text and title of a blog post
3. How to convert HTML pages to PDF (html2pdf) using C#
4. How to use C# to download all the pictures in a blog post to the local disk for offline browsing
5. How to use C# to merge multiple individual PDF files into one PDF and generate a table of contents
6. How to use C# to obtain NetEase blog links, and what is special about NetEase blogs
7. How to use C# to download WeChat official account articles
8. How to obtain the full text of any article
9. How to use C# to strip all the tags from HTML and obtain plain text (html2txt)
10. How to compile multiple HTML files into a CHM (html2chm) using C#
11. How to use C# to publish articles remotely to Sina Blog
12. How to develop a static site builder using C#
13. How to build a program framework using C# (classic WinForms interface: top menu bar, toolbars, tree list on the left, multi-tab area on the right)
14. How to implement a web page editor (WinForms) using C# ...

Section 4: Introduction to the main content (how to convert HTML pages to PDF using C#)

The sample program for this section has a Generate PDF button. After the button is clicked, the program does 3 things:

(1) Downloads the text of the blog post at the given web address;

(2) Downloads all the pictures in the blog post to the local disk;

(3) Turns the text and pictures into a PDF document with the tool described later.

After the Generate PDF button is clicked, the program automatically downloads the page at the URL and creates a folder named after the page title in the directory where the executable is located.

The folder contains the HTML document (index.html) for the body of the web page, all the pictures in the body, and the resulting PDF document.
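
The folder-naming step can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the code from the sample project: the class name OutputFolder and its methods SanitizeTitle and Create are hypothetical, and replacing invalid characters with an underscore is just one possible policy.

```csharp
using System;
using System.IO;
using System.Text;

public static class OutputFolder
{
    // Replace characters that are not allowed in file names with '_'.
    // (Hypothetical helper; the replacement policy is an assumption.)
    public static string SanitizeTitle(string title)
    {
        char[] invalid = Path.GetInvalidFileNameChars();
        StringBuilder sb = new StringBuilder(title.Length);
        foreach (char c in title)
        {
            sb.Append(Array.IndexOf(invalid, c) >= 0 ? '_' : c);
        }
        string cleaned = sb.ToString().Trim();
        return cleaned.Length == 0 ? "untitled" : cleaned;
    }

    // Create (if needed) a folder named after the page title under baseDir
    // and return its full path.
    public static string Create(string baseDir, string title)
    {
        string folder = Path.Combine(baseDir, SanitizeTitle(title));
        Directory.CreateDirectory(folder); // does nothing if it already exists
        return folder;
    }
}
```

In a WinForms program, baseDir would typically be Application.StartupPath, the same property the article's code uses to locate wkhtmltopdf.exe.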

Converting an HTML file into a PDF file is technically demanding: we would need to write a tool that can both parse HTML documents (as a browser does) and generate PDF documents (which requires detailed knowledge of the PDF file structure). Fortunately, there is a handy, free tool that does exactly this: wkhtmltopdf (http://www.wkhtmltopdf.org/).

First, let's look at how to use this tool to turn an HTML document into a PDF document. Download the executable for your platform (the Windows version is available in the pdflib folder at the GitHub address above). The folder contains 4 files.

Open a command prompt and run the command wkhtmltopdf.exe www.cnblogs.com cnblogs.pdf. The first parameter is the web address to convert (www.cnblogs.com), and the second parameter is the name of the PDF file to save (cnblogs.pdf). When the command succeeds, a cnblogs.pdf file appears in the current directory (here, d:/pdflib/).

What we need to do next is integrate this conversion step into our own software.

It follows from the approach above that we need a simple form of interprocess communication in C#: our own code starts the wkhtmltopdf.exe process and passes it the required parameters.

In C#, a process can be started with the Process class. The core code is as follows:

    // Requires: using System.Diagnostics; using System.IO; using System.Text; using System.Windows.Forms;
    public bool _Html2Pdf(string fileName)
    {
        string strPdfSavedPath = m_strPath;
        if (!Directory.Exists(strPdfSavedPath)) // check whether the directory exists
        {
            Directory.CreateDirectory(strPdfSavedPath); // create the directory
        }
        if (!File.Exists(strPdfSavedPath + fileName + ".pdf"))
        {
            string strHtmlSavedPath = m_strPath;
            string file_flvbind = Application.StartupPath + @"\pdflib\wkhtmltopdf.exe";
            // MoveFolderTo(fileName, Application.StartupPath + @"\pdflib\");
            // Create the ProcessStartInfo
            ProcessStartInfo pinfo = new ProcessStartInfo(file_flvbind);
            // pinfo.WorkingDirectory = Application.StartupPath + @"\pdflib\";
            pinfo.WorkingDirectory = strHtmlSavedPath;
            // Set the arguments (each one is separated from the next by a space)
            StringBuilder sb = new StringBuilder();
            sb.Append("--footer-line ");
            sb.Append("--footer-center \"Powered by the software firm (http://www.cnblogs.com/ice-river)\" ");
            sb.Append("\"" + "index.html\"");
            sb.Append(" \"" + strPdfSavedPath + fileName + ".pdf" + "\"");
            pinfo.Arguments = sb.ToString();
            // Hide the window
            pinfo.WindowStyle = ProcessWindowStyle.Hidden;
            // Start the program and wait for it to finish
            Process p = Process.Start(pinfo);
            p.WaitForExit();
            // DeleteFiles(Application.StartupPath + @"\pdflib\");
            if (p.ExitCode == 0)
            {
                DelegatePara dp = new DelegatePara();
                dp.strLog = "Build [" + fileName + ".pdf] Success!\n";
                m_delesPdf.Refresh(dp);
                return true;
            }
            else
            {
                DelegatePara dp = new DelegatePara();
                dp.strLog = "Build [" + fileName + ".pdf] Failed!\n";
                m_delesPdf.Refresh(dp);
                return false;
            }
        }
        else
        {
            // The PDF already exists, so report success without regenerating it
            DelegatePara dp = new DelegatePara();
            dp.strLog = "Build [" + fileName + ".pdf] Success!\n";
            m_delesPdf.Refresh(dp);
            return true;
        }
    }

The core lines of the code snippet above are explained briefly here:

ProcessStartInfo pinfo = new ProcessStartInfo(file_flvbind);

The constructor argument in this line is important: it specifies the full path of the executable that the process will run.

pinfo.WorkingDirectory = strHtmlSavedPath;

This line specifies where the raw material for the PDF is located, i.e. the directory containing the HTML file and all its images.

pinfo.Arguments holds the command-line arguments that the wkhtmltopdf.exe process needs. In the command-line example given earlier, these were:

www.cnblogs.com cnblogs.pdf

"--footer-center \"Powered by the software firm (http://www.cnblogs.com/ice-river)\"" is an additional argument passed to the wkhtmltopdf.exe process; it adds a centered footer to the resulting PDF document.
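
Building the argument string by hand makes it easy to drop a space or a quote, so the same logic can be factored into a small helper. The following is only a sketch: the class WkArgs and its methods Quote and Build are hypothetical names, and it assumes that none of the values contain double quotes. The --footer-line and --footer-center options are the ones used in the article's code.

```csharp
using System;
using System.Text;

public static class WkArgs
{
    // Wrap a value in double quotes so that embedded spaces survive
    // command-line parsing. Assumes the value contains no double quotes.
    public static string Quote(string value)
    {
        return "\"" + value + "\"";
    }

    // Builds: --footer-line --footer-center "<footer>" "<input>" "<output>"
    public static string Build(string footerText, string inputHtml, string outputPdf)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("--footer-line ");
        sb.Append("--footer-center ").Append(Quote(footerText)).Append(' ');
        sb.Append(Quote(inputHtml)).Append(' ');
        sb.Append(Quote(outputPdf));
        return sb.ToString();
    }
}
```

For example, WkArgs.Build("Powered by ...", "index.html", pdfPath) could be assigned directly to pinfo.Arguments. One caveat on Windows: a quoted path that ends in a backslash places the backslash directly before the closing quote, which escapes the quote, so it is safer to pass file paths rather than directory paths ending in a separator.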

pinfo.Arguments = sb.ToString(); This line passes the argument string to the process.

Process p = Process.Start(pinfo); This line starts the wkhtmltopdf.exe process.

p.WaitForExit(); This line waits for the wkhtmltopdf.exe process to end.
if (p.ExitCode == 0)
If the exit code of the process is 0, the process succeeded; otherwise it failed.

That concludes the explanation of the code; readers who want more detail can download the full source and study it.

Note that the code for running another process is generic: you can use it in your own programs to run other executables.
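
To illustrate how generic this pattern is, the process-launching part can be pulled out into a reusable helper. The sketch below is not taken from the sample project: the class ProcessRunner, the method Run, the timeout parameter, and the convention of returning -1 on timeout are all assumptions made for illustration.

```csharp
using System;
using System.Diagnostics;

public static class ProcessRunner
{
    // Run an executable, wait up to timeoutMs milliseconds, and return its exit code.
    // Returns -1 if the process had to be killed after the timeout
    // (a convention of this sketch, not of the Process class).
    public static int Run(string exePath, string arguments, string workingDirectory, int timeoutMs)
    {
        ProcessStartInfo info = new ProcessStartInfo(exePath);
        info.Arguments = arguments;
        info.WorkingDirectory = workingDirectory;
        info.UseShellExecute = false; // start the executable directly, not through the shell
        info.CreateNoWindow = true;   // do not show a console window

        using (Process p = Process.Start(info))
        {
            if (!p.WaitForExit(timeoutMs))
            {
                p.Kill();        // the process did not finish in time
                p.WaitForExit(); // wait until it has actually exited
                return -1;
            }
            return p.ExitCode;
        }
    }
}
```

With such a helper, the conversion step reduces to checking whether ProcessRunner.Run(exePath, arguments, htmlFolder, 60000) returns 0. The timeout guards against a converter that hangs on a page that never finishes loading; 60000 ms here is an arbitrary illustrative value.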

Preview of the next section

How to convert HTML pages into plain text (html2txt) using C#.

Song Bo
Source: http://www.cnblogs.com/ice-river/
The copyright of this article is shared by the author and the blog park (cnblogs). Reprinting is welcome, but unless the author agrees otherwise, this notice must be retained and a link to the original article must be placed in a prominent position on the page.
To the reader looking at my blog right now: I can see you have an impressive bearing and a faint air of royalty about you, so a bright future surely awaits! There is a "Recommend" button nearby; feel free to click it. If my reading proves accurate, I won't charge a penny, and you will know where to find me again!
