Recently I had to convert HTML to PDF. It is a useful feature, but unfortunately I could not find a ready-to-use solution on the Internet (one that is open-source or free, easy to use, and maintainable). Since there was no ready-made solution, I had to build one myself.
Generating a PDF file from HTML can be done in two steps. The first step is to parse the HTML file, that is, to turn the text of the HTML source into the graphical result the browser finally shows us. This is still an unsolved problem: so far no software giant has done HTML rendering really well, as you can see by comparing how IE, Firefox, and other browsers display the same page. Since the whole industry finds this hard, I will not try to tackle it myself. I will skip this step for now and look at the next one.
Step 2: draw the PDF file. This part is simple, and there is plenty of material on the Internet. If you are interested, you can study the PDF file format and assemble the PDF file byte by byte. I am interested, but I don't have the time. I think software practitioners should always focus on what is most valuable, and the first way to improve efficiency is reuse. There is a free, open-source library called iTextSharp that can be used to draw PDF files.
Download iTextSharp and try feeding it the HTML to see what comes out. As expected, the PDF shows the HTML source code rather than the rendered page. That is because we skipped the first step, so let's go back and solve it.
I remember seeing a web page snapshot tool written in .NET a long time ago. The general idea is to use the DrawToBitmap method of the WebBrowser control to output what IE displays to a System.Drawing.Bitmap object. The code is roughly as follows:

    // WebBrowser wb = null;
    System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(w, h);
    wb.DrawToBitmap(bmp, new System.Drawing.Rectangle(0, 0, w, h));
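DrawToBitmap only gives a useful image once the page has finished loading, and WebBrowser has to live on an STA thread with a message loop. Below is a minimal sketch of that wiring, assuming a reference to System.Windows.Forms. The class name PageSnapshot, the method Capture, and the fixed width and height are hypothetical names for illustration, not the code of the tool described here.

```csharp
using System;
using System.Drawing;
using System.Threading;
using System.Windows.Forms;

public static class PageSnapshot
{
    // Hypothetical helper: loads url in an off-screen WebBrowser and
    // returns the rendered page as a bitmap of the requested size.
    public static Bitmap Capture(string url, int width, int height)
    {
        Bitmap result = null;
        Thread worker = new Thread(() =>
        {
            using (WebBrowser wb = new WebBrowser())
            {
                wb.ScrollBarsEnabled = false;
                wb.ScriptErrorsSuppressed = true;
                wb.Size = new Size(width, height);
                wb.DocumentCompleted += (sender, e) =>
                {
                    // The event also fires for frames; wait until the whole page is ready.
                    if (wb.ReadyState != WebBrowserReadyState.Complete) return;
                    Bitmap bmp = new Bitmap(width, height);
                    wb.DrawToBitmap(bmp, new Rectangle(0, 0, width, height));
                    result = bmp;
                    Application.ExitThread(); // ends the message loop started below
                };
                wb.Navigate(url);
                Application.Run(); // pump messages until the page has been drawn
            }
        });
        worker.SetApartmentState(ApartmentState.STA); // WebBrowser requires STA
        worker.Start();
        worker.Join();
        return result;
    }
}
```

This sketch has no timeout, so a page that never finishes loading would block the caller; a real tool would probably add one. It also captures only the fixed width x height area, not the full scroll height of the page.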
OK, so the HTML can be rendered. Now let's restructure the approach from before:
Use WebBrowser to render the HTML into an image, then use iTextSharp to draw the image into a PDF.
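The second half of that idea, putting the image into a PDF, might look like the sketch below. It assumes the iTextSharp 5.x assembly is referenced; the class name ImagePdf and the method Save are hypothetical, and mapping one pixel to one PDF point is my own simplification rather than something the tool is known to do.

```csharp
using System.Drawing;
using System.Drawing.Imaging;
using System.IO;
using iTextSharp.text.pdf;

public static class ImagePdf
{
    // Hypothetical helper: writes bmp as a single-page PDF whose page
    // size equals the image size (1 pixel = 1 point, an assumption).
    public static void Save(Bitmap bmp, string pdfPath)
    {
        // iTextSharp types are fully qualified to avoid clashing with System.Drawing.
        iTextSharp.text.Rectangle page = new iTextSharp.text.Rectangle(bmp.Width, bmp.Height);
        iTextSharp.text.Document doc = new iTextSharp.text.Document(page, 0, 0, 0, 0);
        using (FileStream fs = new FileStream(pdfPath, FileMode.Create))
        {
            PdfWriter.GetInstance(doc, fs);
            doc.Open();
            iTextSharp.text.Image img = iTextSharp.text.Image.GetInstance(bmp, ImageFormat.Png);
            img.ScaleAbsolute(bmp.Width, bmp.Height);
            img.SetAbsolutePosition(0, 0); // bottom-left corner of the page
            doc.Add(img);
            doc.Close(); // finishes the PDF and closes the stream
        }
    }
}
```

Because the whole page is one bitmap, the text in the resulting PDF cannot be selected or searched. That is the price of letting the browser do all the rendering.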
This was developed for my company, so it is inconvenient to publish the source code for the moment. A compiled tool is provided for download, and you can also build your own based on the ideas above:
Usage:
1. Convert a single URL to PDF: pagetopdf.exe "http://www.g.cn/" "google.jpg"
2. Convert multiple URLs to PDF: pagetopdf.exe task.txt "C:\20.dir\"
task.txt is the task list: it holds one URL per line, and each URL is suffixed with # and a file name, for example http://www.baidu.com/#B (the tool appends the extension itself).
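If you build your own version, reading such a task file could be sketched as follows. The names TaskFile, TryParseLine, and Load are hypothetical, and splitting at the last '#' is my assumption about how the suffix is meant to be read; a URL that carries its own fragment would need a different rule.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class TaskFile
{
    // Splits "url#name" at the last '#'. Returns false for blank lines
    // and for lines that have no URL part or no file name part.
    public static bool TryParseLine(string line, out string url, out string name)
    {
        url = null;
        name = null;
        if (string.IsNullOrWhiteSpace(line)) return false;
        string trimmed = line.Trim();
        int hash = trimmed.LastIndexOf('#');
        if (hash <= 0 || hash == trimmed.Length - 1) return false;
        url = trimmed.Substring(0, hash);
        name = trimmed.Substring(hash + 1);
        return true;
    }

    // Reads every usable line of the task file as a (url, file name) pair.
    public static List<KeyValuePair<string, string>> Load(string taskPath)
    {
        List<KeyValuePair<string, string>> tasks = new List<KeyValuePair<string, string>>();
        foreach (string line in File.ReadAllLines(taskPath))
        {
            string url, name;
            if (TryParseLine(line, out url, out name))
                tasks.Add(new KeyValuePair<string, string>(url, name));
        }
        return tasks;
    }
}
```

Lines that do not match the format are skipped silently here; logging them would make a bad task file easier to debug.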
Use in an ASP.NET environment
Upload pagetopdf to the website and set the directory permissions. Sample code:
    public static bool Createppdf(string url, string path)
    {
        try
        {
            if (string.IsNullOrEmpty(url) || string.IsNullOrEmpty(path))
                return false;
            // Locate the converter under the site root.
            string str = System.Web.HttpContext.Current.Server.MapPath("~/Afafasf/pagetopdf.exe");
            if (!System.IO.File.Exists(str))
                return false;
            System.Diagnostics.Process p = new System.Diagnostics.Process();
            p.StartInfo.FileName = str;
            // The URL is quoted; the output path is passed as given.
            p.StartInfo.Arguments = "\"" + url + "\" " + path;
            p.StartInfo.UseShellExecute = false;
            p.StartInfo.RedirectStandardInput = true;
            p.StartInfo.RedirectStandardOutput = true;
            p.StartInfo.RedirectStandardError = true;
            p.StartInfo.CreateNoWindow = true;
            p.Start();
            // Only gives the converter time to start; it does not wait for the PDF to be finished.
            System.Threading.Thread.Sleep(500);
            return true;
        }
        catch (Exception ex)
        {
            // SYS.log is the project's own logger.
            SYS.log.Error("PDF create err.", ex);
        }
        return false;
    }
Features
The tool uses worker processes, which are started by a scheduler program, to speed up task processing. The number of processes is controlled by the scheduler and cannot exceed 10.
pagetopdf