Generating a PDF from HTML can be done in roughly two steps. The first step is to parse the HTML, that is, to turn the HTML source text into the rendered result that a browser finally shows us. This is still an unsolved problem: so far even the industry's software giants have not done HTML parsing perfectly, as you can see by comparing how IE, Firefox, and other browsers display the same page. Since this is an industry-wide problem, I will not sink time into researching it myself. I'll skip this step for now and consider the next one.
The second step is to draw the PDF. This part is simple. There is plenty of material online, and interested readers can study the PDF file format and assemble a PDF byte by byte. I'm interested, but I don't have the time, and I think software practitioners should always focus on the most valuable things. The first way to improve a practitioner's efficiency is reuse, and there is a library called iTextSharp available online for drawing PDFs. It is free to use and open source.
Download iTextSharp and try feeding it the HTML directly to see the effect: as you would expect, what gets drawn is the HTML source code. That is because the first step is still unsolved, so let's go back and solve it.
I remember seeing, a long time ago, a web snapshot tool written in .NET. The idea was presumably to use the WebBrowser control's DrawToBitmap method to output what IE renders into a System.Drawing.Bitmap object. The approximate code is as follows:
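// Placeholder: assign a WebBrowser control that has finished loading the page
WebBrowser wb = null;
// w and h are the width and height of the snapshot in pixels
System.Drawing.Bitmap bmp = new System.Drawing.Bitmap(w, h);
wb.DrawToBitmap(bmp, new System.Drawing.Rectangle(0, 0, w, h));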
OK, the HTML can now be parsed. Reorganizing the code above, the plan is as follows:
Use WebBrowser to parse the HTML and convert it to an image, then use iTextSharp to draw that image into a PDF.
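Since the tool's source is not published, here is only a minimal sketch of that plan, not the author's actual implementation. It assumes a Windows Forms environment and iTextSharp 5.x. The class name PageToPdfSketch, the method names, and the fixed snapshot size are all made up for illustration.

```csharp
using System;
using System.Drawing;
using System.IO;
using System.Threading;
using System.Windows.Forms;

static class PageToPdfSketch
{
    // Step 1: let the WebBrowser control (IE engine) render the page into a bitmap.
    // WebBrowser needs an STA thread with a message loop, so one is started here.
    static Bitmap Capture(string url, int w, int h)
    {
        Bitmap bmp = null;
        var t = new Thread(() =>
        {
            using (var wb = new WebBrowser())
            {
                wb.ScrollBarsEnabled = false;
                wb.ScriptErrorsSuppressed = true;
                wb.Width = w;
                wb.Height = h;
                wb.DocumentCompleted += (s, e) =>
                {
                    // The event also fires for frames; wait for the whole page.
                    if (wb.ReadyState != WebBrowserReadyState.Complete) return;
                    bmp = new Bitmap(w, h);
                    wb.DrawToBitmap(bmp, new Rectangle(0, 0, w, h));
                    Application.ExitThread();
                };
                wb.Navigate(url);
                Application.Run();
            }
        });
        t.SetApartmentState(ApartmentState.STA);
        t.Start();
        t.Join();
        return bmp;
    }

    // Step 2: draw the bitmap onto a single PDF page of the same size.
    static void WritePdf(Bitmap bmp, string pdfPath)
    {
        var page = new iTextSharp.text.Rectangle(bmp.Width, bmp.Height);
        var doc = new iTextSharp.text.Document(page, 0, 0, 0, 0);
        using (var fs = new FileStream(pdfPath, FileMode.Create))
        {
            iTextSharp.text.pdf.PdfWriter.GetInstance(doc, fs);
            doc.Open();
            var img = iTextSharp.text.Image.GetInstance(
                bmp, System.Drawing.Imaging.ImageFormat.Png);
            img.ScaleAbsolute(page.Width, page.Height);
            img.SetAbsolutePosition(0, 0);
            doc.Add(img);
            doc.Close();
        }
    }

    public static void Convert(string url, string pdfPath)
    {
        // 1024x768 is an arbitrary snapshot size chosen for this sketch.
        using (Bitmap bmp = Capture(url, 1024, 768))
        {
            WritePdf(bmp, pdfPath);
        }
    }
}
```

Note that the result is a picture of the page embedded in a PDF, so the text in it cannot be selected or searched.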
Because this was developed as a feature for my company, it is inconvenient to open the source code for now. I am providing my compiled tool for download instead, and you can also build your own following the ideas above:
Usage:
1. Convert a single URL to PDF: PageToPDF.exe "http://www.g.cn/" "Google.jpg"
2. Convert multiple URLs to PDF: PageToPDF.exe task.txt "C:\pdfdir\"
task.txt is the task list. It contains one URL per line, and each URL is followed by a # and the output file name. For example, http://www.baidu.com/#b means: convert http://www.baidu.com/ to a PDF file named b (the tool appends the extension itself).
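To make the task line format concrete, here is a small sketch of how such a line could be split. The helper name TaskLine.TryParse is hypothetical, and splitting at the last # is my own assumption (it lets a URL that already contains a # still parse); the tool's real parsing rules are not published.

```csharp
using System;

static class TaskLine
{
    // Splits "url#name" at the LAST '#'.
    // Returns false for blank lines or lines without a usable name part.
    public static bool TryParse(string line, out string url, out string name)
    {
        url = null;
        name = null;
        if (string.IsNullOrWhiteSpace(line)) return false;

        line = line.Trim();
        int i = line.LastIndexOf('#');
        if (i <= 0 || i == line.Length - 1) return false;

        url = line.Substring(0, i);
        name = line.Substring(i + 1);
        return true;
    }
}
```

For the line http://www.baidu.com/#b this yields the URL http://www.baidu.com/ and the file name b.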
Using it in an ASP.NET environment
Upload PageToPDF to the site and set the directory permissions. Sample code:
// Requires: using System; using System.Diagnostics;
public static bool Createppdf(string url, string path)
{
    try
    {
        if (string.IsNullOrEmpty(url) || string.IsNullOrEmpty(path))
            return false;
        Process p = new Process();
        // Physical path of the uploaded converter on the server
        string str = System.Web.HttpContext.Current.Server.MapPath("~/afafafasf/pagetopdf.exe");
        if (!System.IO.File.Exists(str))
            return false;
        p.StartInfo.FileName = str;
        // The URL is quoted; the path is passed as-is, so it must not contain spaces
        p.StartInfo.Arguments = "\"" + url + "\" " + path;
        p.StartInfo.UseShellExecute = false;
        p.StartInfo.RedirectStandardInput = true;
        p.StartInfo.RedirectStandardOutput = true;
        p.StartInfo.RedirectStandardError = true;
        p.StartInfo.CreateNoWindow = true;
        p.Start();
        // Gives the process time to start; it does not wait for the conversion to finish
        System.Threading.Thread.Sleep(500);
        return true;
    }
    catch (Exception ex)
    {
        Sys.Log.error("PDF Create err.", ex);
    }
    return false;
}
Features
When working through a task list, the tool starts multiple processes, so you will see several PageToPDF.exe processes in Task Manager. They are launched by the tool's own scheduler to speed up the task. The number of processes is controlled by the scheduler itself and never exceeds 10.
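How the tool's scheduler works internally is not published. As an illustration of the general idea only, the following sketch caps the number of jobs running at the same time with a SemaphoreSlim. The class name BoundedScheduler and its signature are hypothetical; in a real tool each job would presumably start one PageToPDF.exe process and wait for it to exit.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

static class BoundedScheduler
{
    // Runs every job, but never more than maxParallel of them at the same time.
    public static async Task RunAll(IEnumerable<Func<Task>> jobs, int maxParallel = 10)
    {
        using (var gate = new SemaphoreSlim(maxParallel))
        {
            // ToList() starts all wrappers; each one waits at the gate for a free slot.
            List<Task> running = jobs.Select(async job =>
            {
                await gate.WaitAsync();
                try { await job(); }
                finally { gate.Release(); }
            }).ToList();

            await Task.WhenAll(running);
        }
    }
}
```

A semaphore keeps the cap in one place: a job takes a slot before it starts and returns it when it finishes, even if it fails.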