A Solution for Generating a Bookmarked PDF Document from a Batch of URLs


First, the Origin

When we came across a good article or a great blog post, the earliest options were two: add it to Favorites, or use Save As. Later a new option appeared: repost it on our own blog or on a collection site (such as 360doc). More recently, generating PDF documents has appeared: some sites produce the corresponding PDF for a URL you submit, and JavaEye's e-book feature is also very good. One could even predict that the browser's "Save As" dialog may someday offer a *.pdf option, because PDF is a good format: small, yet rich in expressiveness.

However, when there are many good articles, such as an excellent series (a tutorial or development notes written by an expert), all we can do is add them to Favorites or link to them from our blog. I remember the years when I had no money to buy books and read tutorials found on the Internet. The better-made ones had a page listing all the links; in that case I usually used Thunder to download all the links in bulk (then removed the unrelated ones), and the pages themselves offered little more than simple "previous/next article" links. It is still like this today. (Ease of use may look like a small problem, but to a large extent it is tied to the application model and the business model; at the small end, it is simply about being attractive and convenient.) JavaEye's quick-reading view shows the table of contents as a tree menu on the left and the content on the right, which is a good form of interface design.

This article describes a solution that generates a bookmarked PDF document from a batch of URLs: given the URLs of good articles, it merges them into one PDF document with bookmarks (the tree menu on the left), and the bookmarks are a must. Many people have surely read the book "Java and Patterns", a thick old book. I had no money to buy it, so I read a downloaded PDF, and that PDF left a poor impression: it had no bookmarks, so finding anything meant dragging the scroll bar. I read it anyway; the book is well written, and thanks are still due to whoever made the PDF.

Second, the Approach

The goal is to generate a bookmarked PDF document from a batch of URLs. This is done in two steps: first, solve the problem of creating a PDF document from a single URL; then, merge the resulting PDFs and create the bookmarks.

(1) Generate a PDF document based on a URL

Generating a PDF document from a URL seems easy, since open-source frameworks such as iText and PDFBox exist, but it is not simple. The generated PDF must look the same as the page does in a browser, which amounts to writing a browser, and browsers themselves still have compatibility problems; so writing my own HTML-to-PDF converter was hard. The other idea was to use existing web sites. After trying several: some require both a URL and an email address and send the finished PDF to your mailbox, which cannot be driven from code and therefore cannot be batch-processed; others need only the URL and return the generated PDF in the response, which a program can batch-process, but the result looks far from what the browser shows, and some do not support Chinese at all. After some exploring, I finally found a web site that provides a C# DLL meeting this requirement. With this DLL, a simple C# program can generate PDFs in batches, and the output is nearly perfect; the drawback is that the generated PDFs carry someone else's watermark.

(2) Merge multiple PDFs and generate bookmarks

Merging multiple PDFs and creating bookmarks is easy with iText. The merge follows a definite order, and the bookmarks form a tree, so both the merge order and the bookmark hierarchy must be decided in advance. The batch of URLs therefore needs some form of description, and XML is the natural choice.
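Deciding the order and hierarchy in advance also fixes where each bookmark points. A minimal sketch, using only the JDK and no PDF library: walking the chapter tree depth-first (pre-order) yields the merge order, and the 1-based start page of each chapter in the merged file is the running total of the page counts merged before it. Chapter, Bookmark and MergePlan are hypothetical names, and the page counts are assumed inputs; in practice they would be read from each generated PDF with the merging library (the article uses iText).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical chapter node: one source PDF plus its nested sub-chapters.
record Chapter(String name, int pageCount, List<Chapter> children) {}

// One bookmark in the merged document: title, nesting depth, first page.
record Bookmark(String name, int level, int startPage) {}

class MergePlan {
    // Walks the tree in pre-order (which is also the merge order) and assigns
    // each chapter the 1-based page on which its PDF starts in the merged file.
    static List<Bookmark> plan(List<Chapter> roots) {
        List<Bookmark> out = new ArrayList<>();
        int[] nextPage = {1}; // mutable counter shared by the recursive calls
        for (Chapter c : roots) {
            visit(c, 1, nextPage, out);
        }
        return out;
    }

    private static void visit(Chapter c, int level, int[] nextPage, List<Bookmark> out) {
        out.add(new Bookmark(c.name(), level, nextPage[0]));
        nextPage[0] += c.pageCount(); // a parent's own pages come before its children's
        for (Chapter child : c.children()) {
            visit(child, level + 1, nextPage, out);
        }
    }
}
```

With a 3-page "163" chapter whose children have 2 and 4 pages, the children's bookmarks start on pages 4 and 6, and the next top-level chapter starts on page 10.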

Third, the Implementation

My feeling is that, unless it is infrastructure work, the technology is simple; the key is having the idea. This implementation begins with the XML description.

The XML description has two parts: first, a description of the batch of URLs (called Href.h2p.xml), then a description of their hierarchy (called outline.h2p.xml). "h2p" stands for HTML to PDF.

First, look at Href.h2p.xml:

    <value><![CDATA[http://www.163.com]]></value>
    <value><![CDATA[http://www.sohu.com]]></value>
    <value><![CDATA[http://news.163.com]]></value>
    <value><![CDATA[http://sports.163.com]]></value>
    <value><![CDATA[http://news.sohu.com]]></value>

This XML is simple. URLs often contain the & character, which cannot appear unescaped in XML, and a CDATA section (<![CDATA[...]]>) cannot be used inside an attribute value, so each URL is stored as the text of a node wrapped in CDATA rather than as an attribute.
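The reasoning can be checked with the JDK's built-in DOM parser. A small sketch (the URL is a made-up example containing &): the raw URL as element text is rejected as malformed, while the same URL inside a CDATA section parses and comes back unchanged. CdataCheck is an illustrative name, not part of the original tool.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

class CdataCheck {
    // Parses xml with the JDK's DOM parser and returns the root element's text.
    // Throws IllegalArgumentException if the XML is not well-formed.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("not well-formed XML", e);
        }
    }

    // True if the parser accepts xml, false if it reports a fatal error.
    static boolean isWellFormed(String xml) {
        try {
            rootText(xml);
            return true;
        } catch (IllegalArgumentException e) {
            return false;
        }
    }
}
```

For "http://example.com/list?page=1&size=20", isWellFormed on the bare <value> element returns false (the parser reads "&size" as the start of an entity reference), while rootText on the CDATA version returns the URL as written. Escaping & as &amp; would also produce valid XML; CDATA simply avoids escaping every URL by hand.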

Each URL entry in this XML carries an ID, and the PDF file generated from it is named after that ID with a .pdf suffix.
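The batch step that turns these entries into ID-named files can be sketched as a plain loop. This is a minimal sketch under stated assumptions: the article's converter was an unnamed C# DLL, so HtmlToPdfConverter below is a hypothetical interface standing in for it, BatchConvert is an illustrative name, and the ID-to-URL map is assumed to have been read from Href.h2p.xml already.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the HTML-to-PDF component; the real one in the
// article was a third-party C# DLL that is not named.
interface HtmlToPdfConverter {
    byte[] convert(String url);
}

class BatchConvert {
    // Converts each URL and writes the bytes to <id>.pdf inside outDir,
    // returning the written paths in the map's iteration order.
    static List<Path> run(Map<String, String> urlsById, HtmlToPdfConverter converter, Path outDir) {
        try {
            Files.createDirectories(outDir);
            List<Path> written = new ArrayList<>();
            for (Map.Entry<String, String> e : urlsById.entrySet()) {
                Path target = outDir.resolve(e.getKey() + ".pdf");
                Files.write(target, converter.convert(e.getValue()));
                written.add(target);
            }
            return written;
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }
}
```

Keeping the converter behind an interface means a different component or web service can be swapped in without touching the loop; an ID such as KxgYaRxG would produce KxgYaRxG.pdf.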

The contents of outline.h2p.xml are as follows:

<book name="My PDF Book">
    <chapter name="163" href="KxgYaRxG">
        <chapter name="163 News" href="eyEis6ra" />
        <chapter name="163 Sports" href="DMQoSN2t" />
    </chapter>
    <chapter name="sohu" href="53Bw5A32">
        <chapter name="sohu News" href="5vaf3LN7" />
    </chapter>
</book>

This XML describes the order in which the PDFs are merged: each href value corresponds to an ID in the previous XML, the nesting level of the chapter tags is the level of the bookmark, and the name value is the bookmark title. Following this XML, iText merges the individual PDFs into one and generates the bookmarks.
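Reading outline.h2p.xml needs nothing beyond the JDK. A minimal sketch: visiting the chapter elements in document order gives the merge order directly, and the nesting depth becomes the bookmark level. OutlineEntry and OutlineReader are illustrative names; turning the entries into actual iText bookmarks is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

// One outline entry: nesting depth, bookmark title, and the ID of the PDF
// (from Href.h2p.xml) that is merged at this position.
record OutlineEntry(int level, String name, String href) {}

class OutlineReader {
    // Parses outline.h2p.xml and returns its chapters in document (pre-)order,
    // which is the order in which the PDFs are merged.
    static List<OutlineEntry> read(String xml) {
        try {
            Element book = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            List<OutlineEntry> out = new ArrayList<>();
            collect(book, 1, out);
            return out;
        } catch (Exception ex) {
            throw new IllegalArgumentException("cannot parse outline", ex);
        }
    }

    // Adds each <chapter> child of parent, then recurses into it one level deeper.
    private static void collect(Element parent, int level, List<OutlineEntry> out) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element e && e.getTagName().equals("chapter")) {
                out.add(new OutlineEntry(level, e.getAttribute("name"), e.getAttribute("href")));
                collect(e, level + 1, out);
            }
        }
    }
}
```

For the outline above, the result is 163 (level 1), 163 News and 163 Sports (level 2), sohu (level 1), sohu News (level 2), which is exactly the merge order.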

I refer to these two XML files as h2p files.

Fourth, h2p Files

At this point the solution itself is complete, but as the saying goes, to do a good job one must first sharpen one's tools. First of all we need the two XML files above. Editing them by hand is fine for a small number of URLs but inconvenient for many, so there should be a tool for editing h2p files.
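The core of such a tool is small. A minimal sketch of one piece of it: assign each URL a random 8-character alphanumeric ID (the same shape as the IDs in the outline example) and write Href.h2p.xml with each URL in a CDATA value node. The hrefs and href id="..." wrapper elements are assumptions, since only the value nodes appear in the example above, and HrefFileWriter is a hypothetical name.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

class HrefFileWriter {
    private static final String ALPHABET =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final int ID_LENGTH = 8;

    // Gives each URL a random alphanumeric ID, drawing again on a collision.
    static Map<String, String> assignIds(List<String> urls, Random rnd) {
        Map<String, String> byId = new LinkedHashMap<>();
        for (String url : urls) {
            String id;
            do {
                StringBuilder sb = new StringBuilder(ID_LENGTH);
                for (int i = 0; i < ID_LENGTH; i++) {
                    sb.append(ALPHABET.charAt(rnd.nextInt(ALPHABET.length())));
                }
                id = sb.toString();
            } while (byId.containsKey(id));
            byId.put(id, url);
        }
        return byId;
    }

    // Wraps text in a CDATA section; "]]>" would end the section early,
    // so it is split across two adjacent sections.
    static String cdata(String text) {
        return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>";
    }

    // Serializes the ID->URL map. The <hrefs>/<href id="..."> wrappers are
    // hypothetical; IDs are alphanumeric, so the attribute needs no escaping.
    static String toXml(Map<String, String> byId) {
        StringBuilder sb = new StringBuilder("<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<hrefs>\n");
        for (Map.Entry<String, String> e : byId.entrySet()) {
            sb.append("    <href id=\"").append(e.getKey()).append("\"><value>")
              .append(cdata(e.getValue())).append("</value></href>\n");
        }
        return sb.append("</hrefs>\n").toString();
    }
}
```

A LinkedHashMap keeps the URLs in the order they were entered, so the generated file lists them in that order as well.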
