Android Development Note (141): Reading PPT and PDF Files


Reading PPT files

Reading plain text

The previous post covered how to read Word files on Android. The last of the Office "three musketeers" is the PPT file. Word and Excel files were parsed with the POI library, and POI can also read the text on PPT slides. HSLFSlideShow is POI's class for parsing slideshows. Each slide is represented by its own HSLFSlide object. The text on a slide is organized into HSLFTextParagraph objects, and each of those is made up of HSLFTextRun objects.

Here is a PPT file (2003 format) parsed with POI:


Different versions of the POI library use slightly different code to parse PPT. The following code reads a PPT file with POI 3.15:
public static ArrayList<String> readPpt(String path) {
    ArrayList<String> contentArray = new ArrayList<String>();
    try {
        FileInputStream fis = new FileInputStream(new File(path));
        HSLFSlideShow hslf = new HSLFSlideShow(fis);
        // List... (the rest of this listing is cut off in the source)


Reading images and text styles

The POI approach can only reliably read the text in a PPT. It cannot handle slides that mix pictures and styled text. In "Android Development Note (140): Reading and Displaying Word Files" I described the approach for docx:

1. Parse the document.xml file inside the docx.
2. Extract picture and style information from its XML tags.
3. Assemble the content into an HTML file.
4. Display that HTML in a WebView.

A pptx file can be handled the same way. Parse the slide*.xml files inside the pptx as was done for docx, then write the extracted pictures and styles into HTML files. This reads the pptx file indirectly.
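A pptx file is simply a ZIP package, so the first step of this approach can also be tried outside Android. That step is locating each slide*.xml file and pulling out its text. The sketch below is a minimal illustration, not the author's code, and rests on a few assumptions:

- The class name PptxText is hypothetical.
- It uses the standard DOM parser (javax.xml.parsers, available on both desktop Java and Android) instead of the XmlPullParser used later.
- Like the original, it assumes slide files are numbered slide1.xml, slide2.xml and so on in presentation order. This is typical but not guaranteed; the authoritative order is listed in ppt/presentation.xml.

```java
import java.io.File;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class PptxText {
    // DrawingML namespace of the <a:p> (paragraph) and <a:t> (text) elements in slide XML
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Returns one string per slide; paragraphs within a slide are separated by newlines. */
    public static List<String> readSlides(File pptx) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true); // required for getElementsByTagNameNS
        DocumentBuilder builder = factory.newDocumentBuilder();
        List<String> slides = new ArrayList<>();
        try (ZipFile zip = new ZipFile(pptx)) {
            // Slides are stored as ppt/slides/slide1.xml, slide2.xml, ...; stop at the first gap
            for (int i = 1; ; i++) {
                ZipEntry entry = zip.getEntry("ppt/slides/slide" + i + ".xml");
                if (entry == null) {
                    break;
                }
                try (InputStream in = zip.getInputStream(entry)) {
                    slides.add(slideText(builder.parse(in)));
                }
            }
        }
        return slides;
    }

    /** Concatenates the text runs of each non-empty paragraph and joins paragraphs with newlines. */
    static String slideText(Document doc) {
        List<String> lines = new ArrayList<>();
        NodeList paras = doc.getElementsByTagNameNS(A_NS, "p");
        for (int p = 0; p < paras.getLength(); p++) {
            NodeList runs = ((Element) paras.item(p)).getElementsByTagNameNS(A_NS, "t");
            StringBuilder sb = new StringBuilder();
            for (int k = 0; k < runs.getLength(); k++) {
                sb.append(runs.item(k).getTextContent());
            }
            if (sb.length() > 0) {
                lines.add(sb.toString());
            }
        }
        return String.join("\n", lines);
    }
}
```

Runs within a paragraph are concatenated without separators because PowerPoint often splits a single word across several runs. Paragraphs become separate lines.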

The following shows a pptx file displayed as HTML:


Here is the main code that parses a pptx file and generates the HTML files:
private void readPptx(String pptPath) {
    // Fields (output, presentPicture, htmlArray, htmlBegin/htmlEnd and the other tag strings)
    // and helpers (getAttrValue, getSize, writeDocumentPicture, FileUtil) are defined
    // elsewhere in the original class.
    try {
        ZipFile pptxFile = new ZipFile(new File(pptPath));
        int picIndex = 1; // pictures in a pptx are named from image1, so the index starts at 1
        for (int i = 1; i <= 100; i++) { // up to 100 slides are supported
            ZipEntry slideEntry = pptxFile.getEntry("ppt/slides/slide" + i + ".xml"); // get each slide
            if (slideEntry == null) { // no more slides: stop instead of failing on a null entry
                break;
            }
            String filePath = String.format("%s%d.html", FileUtil.getFileName(pptPath), i);
            String htmlPath = FileUtil.createFile("html", filePath);
            Log.d(TAG, "i=" + i + ", htmlPath=" + htmlPath);
            output = new FileOutputStream(new File(htmlPath));
            presentPicture = 0;
            output.write(htmlBegin.getBytes());
            InputStream inputStream = pptxFile.getInputStream(slideEntry);
            XmlPullParser xmlParser = Xml.newPullParser();
            xmlParser.setInput(inputStream, "utf-8");
            boolean isTitle = false;     // title
            boolean isTable = false;     // table
            boolean isSize = false;      // font size
            boolean isColor = false;     // text color
            boolean isCenter = false;    // centered
            boolean isRight = false;     // right-aligned
            boolean isItalic = false;    // italic
            boolean isUnderline = false; // underline
            boolean isBold = false;      // bold
            int eventType = xmlParser.getEventType(); // type of the current parser event
            while (eventType != XmlPullParser.END_DOCUMENT) { // read the stream in a loop
                switch (eventType) {
                case XmlPullParser.START_TAG: // start tag
                    String tagBegin = xmlParser.getName();
                    if (tagBegin.equalsIgnoreCase("ph")) { // is this a title placeholder?
                        String titleType = getAttrValue(xmlParser, "type", "text");
                        if (titleType.equals("text")) {
                            isTitle = false;
                        } else {
                            isTitle = true;
                            isSize = true;
                            if (titleType.equals("ctrTitle")) {
                                output.write(centerBegin.getBytes());
                                isCenter = true;
                                output.write(String.format(fontSizeTag, getSize()).getBytes());
                            } else if (titleType.equals("subTitle")) {
                                output.write(centerBegin.getBytes());
                                isCenter = true;
                                output.write(String.format(fontSizeTag, getSize()).getBytes());
                            } else if (titleType.equals("title")) {
                                output.write(String.format(fontSizeTag, getSize()).getBytes());
                            }
                        }
                    }
                    if (tagBegin.equalsIgnoreCase("pPr") && !isTitle) { // paragraph alignment
                        String align = getAttrValue(xmlParser, "algn", "l"); // xmlParser.getAttributeValue(0);
                        if (align.equals("ctr")) {
                            output.write(centerBegin.getBytes());
                            isCenter = true;
                        }
                        if (align.equals("r")) {
                            output.write(divRight.getBytes());
                            isRight = true;
                        }
                    }
                    if (tagBegin.equalsIgnoreCase("srgbClr")) { // text color
                        String color = xmlParser.getAttributeValue(0);
                        output.write(String.format(spanColor, color).getBytes());
                        isColor = true;
                    }
                    if (tagBegin.equalsIgnoreCase("rPr")) {
                        if (!isTitle) { // font size; sz is in hundredths of a point
                            String sizeStr = getAttrValue(xmlParser, "sz", "2800");
                            int size = getSize(Integer.valueOf(sizeStr) / 100);
                            output.write(String.format(fontSizeTag, size).getBytes());
                            isSize = true;
                        }
                        String bStr = getAttrValue(xmlParser, "b", ""); // bold detected
                        if (bStr.equals("1")) { isBold = true; }
                        String iStr = getAttrValue(xmlParser, "i", ""); // italic detected
                        if (iStr.equals("1")) { isItalic = true; }
                        String uStr = getAttrValue(xmlParser, "u", ""); // underline detected
                        if (uStr.equals("sng")) { isUnderline = true; }
                    }
                    if (tagBegin.equalsIgnoreCase("tbl")) { // table
                        output.write(tableBegin.getBytes());
                        isTable = true;
                    } else if (tagBegin.equalsIgnoreCase("tr")) { // table row
                        output.write(rowBegin.getBytes());
                    } else if (tagBegin.equalsIgnoreCase("tc")) { // table cell
                        output.write(columnBegin.getBytes());
                    }
                    if (tagBegin.equalsIgnoreCase("pic")) { // picture
                        ZipEntry picEntry = FileUtil.getPicEntry(pptxFile, "ppt", picIndex);
                        if (picEntry != null) {
                            byte[] pictureBytes = FileUtil.getPictureBytes(pptxFile, picEntry);
                            writeDocumentPicture(i, pictureBytes);
                        }
                        picIndex++; // after converting one picture, move to the next index
                    }
                    if (tagBegin.equalsIgnoreCase("p") && !isTable) { // paragraph; ignored inside tables
                        output.write(lineBegin.getBytes());
                    }
                    if (tagBegin.equalsIgnoreCase("t")) { // text
                        if (isBold) { output.write(boldBegin.getBytes()); }           // <b>
                        if (isUnderline) { output.write(underlineBegin.getBytes()); } // <u>
                        if (isItalic) { output.write(italicBegin.getBytes()); }       // <i>
                        String text = xmlParser.nextText();
                        output.write(text.getBytes()); // write the text
                        if (isItalic) { output.write(italicEnd.getBytes()); isItalic = false; }          // </i>
                        if (isUnderline) { output.write(underlineEnd.getBytes()); isUnderline = false; } // </u>
                        if (isBold) { output.write(boldEnd.getBytes()); isBold = false; }                // </b>
                        if (isSize) { output.write(fontEnd.getBytes()); isSize = false; }                // </font>
                        if (isColor) { output.write(spanEnd.getBytes()); isColor = false; }              // </span>
                        // </center> is written at the end of the paragraph instead,
                        // because the tag forces a line break:
                        // if (isCenter) { output.write(centerEnd.getBytes()); isCenter = false; }
                        if (isRight) { output.write(divEnd.getBytes()); isRight = false; }               // </div>
                    }
                    break;
                case XmlPullParser.END_TAG: // end tag
                    String tagEnd = xmlParser.getName();
                    if (tagEnd.equalsIgnoreCase("tbl")) { // </table>
                        output.write(tableEnd.getBytes());
                        isTable = false;
                    }
                    if (tagEnd.equalsIgnoreCase("tr")) { output.write(rowEnd.getBytes()); }    // </tr>
                    if (tagEnd.equalsIgnoreCase("tc")) { output.write(columnEnd.getBytes()); } // </td>
                    if (tagEnd.equalsIgnoreCase("p") && !isTable) { // </p>, ignored inside tables
                        if (isCenter) { output.write(centerEnd.getBytes()); isCenter = false; } // </center>
                        output.write(lineEnd.getBytes());
                    }
                    break;
                default:
                    break;
                }
                eventType = xmlParser.next(); // read the next event
            }
            output.write(htmlEnd.getBytes());
            output.close();
            htmlArray.add(htmlPath);
        }
    } catch (Exception e) {
        e.printStackTrace();
    }
}
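The flag juggling in readPptx boils down to one rule: whatever tags are opened for a text run must be closed in reverse order. The hypothetical helper below shows that rule on a single DrawingML paragraph. It is a sketch under stated assumptions, not the author's method:

- It emits CSS (font-size in points, color) instead of the original `<font>`/`<span>` tag strings.
- It reads the same attributes the original checks: sz in hundredths of a point, b="1", i="1", u="sng", and srgbClr.
- It escapes &, < and >, which the original code writes out unescaped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class RunToHtml {
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";

    /** Escapes the characters that would otherwise break the generated HTML. */
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    /** Converts one <a:r> run to HTML, building it from the inside out so tags nest correctly. */
    static String runToHtml(Element run) {
        NodeList props = run.getElementsByTagNameNS(A_NS, "rPr");
        Element rPr = props.getLength() > 0 ? (Element) props.item(0) : null;
        StringBuilder style = new StringBuilder();
        boolean bold = false, italic = false, underline = false;
        if (rPr != null) {
            bold = "1".equals(rPr.getAttribute("b"));
            italic = "1".equals(rPr.getAttribute("i"));
            underline = "sng".equals(rPr.getAttribute("u"));
            String sz = rPr.getAttribute("sz"); // hundredths of a point, e.g. 2800 = 28pt
            if (!sz.isEmpty()) {
                style.append("font-size:").append(Integer.parseInt(sz) / 100).append("pt;");
            }
            NodeList colors = rPr.getElementsByTagNameNS(A_NS, "srgbClr");
            if (colors.getLength() > 0) {
                style.append("color:#").append(((Element) colors.item(0)).getAttribute("val")).append(';');
            }
        }
        StringBuilder text = new StringBuilder();
        NodeList ts = run.getElementsByTagNameNS(A_NS, "t");
        for (int k = 0; k < ts.getLength(); k++) {
            text.append(escape(ts.item(k).getTextContent()));
        }
        String html = text.toString();
        if (italic) html = "<i>" + html + "</i>";       // innermost
        if (underline) html = "<u>" + html + "</u>";
        if (bold) html = "<b>" + html + "</b>";
        if (style.length() > 0) html = "<span style=\"" + style + "\">" + html + "</span>"; // outermost
        return html;
    }

    /** Converts every run of a DrawingML paragraph (<a:p>) into one HTML <p>. */
    public static String paragraphToHtml(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        Document doc = f.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList runs = doc.getElementsByTagNameNS(A_NS, "r");
        StringBuilder out = new StringBuilder("<p>");
        for (int k = 0; k < runs.getLength(); k++) {
            out.append(runToHtml((Element) runs.item(k)));
        }
        return out.append("</p>").toString();
    }
}
```

Each run's HTML is built from the inside out, so the nesting is correct by construction. No boolean state has to be carried across parser events, unlike the isBold/isItalic/isUnderline flags above.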


Reading PDF files

Vudroid approach

Displaying a pptx file as HTML, as described above, preserves pictures and text styles, but the result still differs considerably from the original slides. The main problems are:
1, Text in a PPT is not laid out only from top to bottom as in Word. It can be arranged both vertically and horizontally and is positioned relative to other elements. Simple HTML flows content only from top to bottom, so it struggles to reproduce layouts in other directions.
2, A PPT usually has a slide background, meaning each slide has a background image. Parsing slide*.xml does not pick up that background image. Because the background images exist, the picture numbering no longer matches the pictures on each slide, so pictures end up in the wrong places on the slide pages.
3, Every PPT has a fixed page size with a constant width-to-height ratio. Once it is converted to HTML, the page's aspect ratio is lost and the original PPT layout is not preserved.
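Problem 2 comes from matching pictures by a running counter (image1, image2 and so on). In the pptx package, each `<p:pic>` refers to its image through an r:embed relationship id. The slide's relationships part (ppt/slides/_rels/slideN.xml.rels) maps that id to a file under ppt/media.

The sketch below resolves pictures this way instead of by counter. Note that:

- This is a different technique from the original code.
- The class and method names are hypothetical.
- It takes the two XML documents as strings so it can be tried without a real pptx.
- Only blips inside `<p:pic>` are considered, so a slide background fill is skipped.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SlidePictures {
    static final String P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main";
    static final String A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main";
    static final String R_NS = "http://schemas.openxmlformats.org/officeDocument/2006/relationships";
    static final String PKG_NS = "http://schemas.openxmlformats.org/package/2006/relationships";

    static Document parse(String xml) throws Exception {
        DocumentBuilderFactory f = DocumentBuilderFactory.newInstance();
        f.setNamespaceAware(true);
        return f.newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }

    /** Resolves a relationship Target such as "../media/image1.png" relative to ppt/slides/. */
    static String resolve(String target) {
        List<String> parts = new ArrayList<>();
        if (!target.startsWith("/")) { // relative targets start from the slide's folder
            parts.add("ppt");
            parts.add("slides");
        }
        for (String seg : target.split("/")) {
            if (seg.equals("..")) {
                if (!parts.isEmpty()) parts.remove(parts.size() - 1);
            } else if (!seg.isEmpty() && !seg.equals(".")) {
                parts.add(seg);
            }
        }
        return String.join("/", parts);
    }

    /** Returns the zip entry names of the pictures placed on one slide, in document order. */
    public static List<String> pictures(String slideXml, String relsXml) throws Exception {
        Map<String, String> targets = new HashMap<>(); // relationship id -> zip entry name
        NodeList rels = parse(relsXml).getElementsByTagNameNS(PKG_NS, "Relationship");
        for (int k = 0; k < rels.getLength(); k++) {
            Element rel = (Element) rels.item(k);
            if (!"External".equals(rel.getAttribute("TargetMode"))) { // skip linked (external) images
                targets.put(rel.getAttribute("Id"), resolve(rel.getAttribute("Target")));
            }
        }
        List<String> result = new ArrayList<>();
        NodeList pics = parse(slideXml).getElementsByTagNameNS(P_NS, "pic");
        for (int k = 0; k < pics.getLength(); k++) {
            NodeList blips = ((Element) pics.item(k)).getElementsByTagNameNS(A_NS, "blip");
            if (blips.getLength() == 0) continue;
            String path = targets.get(((Element) blips.item(0)).getAttributeNS(R_NS, "embed"));
            if (path != null) result.add(path);
        }
        return result;
    }
}
```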

On a Java server you can call the draw method of the HSLFSlide class and render each slide directly into a temporary image file. On mobile, however, draw cannot be called because it relies on the Java AWT graphics classes, which Android does not provide. As a result, POI cannot render the original PPT pages on Android.

Since rendering the slides directly is hard, another approach is to convert the PPT file to PDF on the server and then read the PDF on the phone. Among the many PDF parsers available on Android is the open-source framework Vudroid, which reads a PDF file and displays its pages as a scrolling list on screen. Here is a PDF file parsed with Vudroid:


To integrate the Vudroid framework into an Android project, follow these steps:
1, add the SD card access permissions to AndroidManifest.xml;
2, import the Vudroid native library libvudroid.so into the libs directory (when developing with ADT);
3, import all the source code under the org.vudroid.pdfdroid package into the project.

Here's the code to parse the PDF file using the Vudroid framework:
public class VudroidActivity extends Activity implements OnClickListener, FileSelectCallbacks {
    private final static String TAG = "VudroidActivity";
    private FrameLayout fr_content;
    private DecodeService decodeService;

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        setContentView(R.layout.activity_pdf_vudroid);
        decodeService = new DecodeServiceBase(new PdfContext());
        findViewById(R.id.btn_open).setOnClickListener(this);
        fr_content = (FrameLayout) findViewById(R.id.fr_content);
    }

    @Override
    protected void onDestroy() {
        decodeService.recycle();
        decodeService = null;
        super.onDestroy();
    }

    @Override
    public void onClick(View v) {
        if (v.getId() == R.id.btn_open) {
            // FileSelectFragment is the author's own file picker, restricted here to PDF files
            FileSelectFragment.show(this, new String[] {"pdf"}, null);
        }
    }

    @Override
    public void onConfirmSelect(String absolutePath, String fileName, Map<String, Object> map_param) {
        String path = String.format("%s/%s", absolutePath, fileName);
        Log.d(TAG, "path=" + path);
        DocumentView documentView = new DocumentView(this);
        documentView.setLayoutParams(new ViewGroup.LayoutParams(
                ViewGroup.LayoutParams.MATCH_PARENT, ViewGroup.LayoutParams.MATCH_PARENT));
        decodeService.setContentResolver(getContentResolver());
        decodeService.setContainerView(documentView);
        documentView.setDecodeService(decodeService);
        decodeService.open(Uri.fromFile(new File(path))); // Uri.fromFile expects a File, not a String
        fr_content.addView(documentView);
        documentView.showDocument();
    }

    @Override
    public boolean isFileValid(String absolutePath, String fileName, Map<String, Object> map_param) {
        return true;
    }
}


MuPDF approach

Although the Vudroid framework can parse and display PDF content correctly, it has several drawbacks:
1, Vudroid renders slowly;
2, pages are displayed tile by tile, like a mosaic, while they render, which is not user friendly;
3, the whole PDF is drawn through a single draw method, which makes it hard to switch to page-by-page navigation.

For these reasons I tried other PDF parsing frameworks and found MuPDF to be the best fit. MuPDF's code is relatively small and easy to call. It also supports rendering only a specified page, so a PDF can be browsed by flipping pages, which better matches how users normally read. Here is a PDF file parsed with MuPDF and displayed as a list:


And here is a PDF file parsed with MuPDF and displayed page by page:


To integrate the MuPDF framework into an Android project, follow these steps:
1, add the SD card access permissions to AndroidManifest.xml;
2, import the MuPDF native library libmupdf.so into the libs directory (when developing with ADT);
3, import all the source code under the com.artifex.mupdf package into the project.

Here's the code to parse the PDF file using the Mupdf framework:
public class PdfFragment extends Fragment implements OnPdfListener {
    private static final String TAG = "PdfFragment";
    protected View mView;
    protected Context mContext;
    private int position;

    public static PdfFragment newInstance(int position) {
        PdfFragment fragment = new PdfFragment();
        Bundle bundle = new Bundle();
        bundle.putInt("position", position);
        fragment.setArguments(bundle);
        return fragment;
    }

    @Override
    public View onCreateView(LayoutInflater inflater, ViewGroup container, Bundle savedInstanceState) {
        Log.d(TAG, "width=" + container.getMeasuredWidth() + ", height=" + container.getMeasuredHeight());
        mContext = getActivity();
        if (getArguments() != null) {
            position = getArguments().getInt("position");
        }
        MuPDFPageView pageView = new MuPDFPageView(mContext, MainApplication.getInstance().pdf_core,
                new Point(container.getMeasuredWidth(), container.getMeasuredHeight()));
        // page sizes are cached in the application object once they are known
        PointF pageSize = MainApplication.getInstance().page_sizes.get(position);
        if (pageSize != null) {
            pageView.setPage(position, pageSize);
        } else {
            // size not known yet: show a blank page and let the task report the size via onRead
            pageView.blank(position);
            MuPDFPageTask task = new MuPDFPageTask(MainApplication.getInstance().pdf_core, pageView, position);
            task.setPdfListener(this);
            task.execute();
        }
        return pageView;
    }

    @Override
    public void onRead(MuPDFPageView pageView, int position, PointF result) {
        MainApplication.getInstance().page_sizes.put(position, result);
        // only update the view if it still shows the same page
        if (pageView.getPage() == position) {
            pageView.setPage(position, result);
        }
    }
}
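MuPDFPageView receives both the container size (the Point) and the page size (the PointF cached in page_sizes), and it scales the page internally. For readers who want to see the arithmetic, here is the usual "fit inside without distortion" calculation. It is also what keeps a slide's fixed aspect ratio (problem 3 above) intact. This is a generic sketch with a hypothetical class name; MuPDF's own view code may differ in its details.

```java
public class PageFit {
    /**
     * Scales a page of size pageW x pageH so it fits inside a containerW x containerH view
     * while keeping its aspect ratio; returns {width, height} in pixels.
     */
    public static int[] fit(int containerW, int containerH, float pageW, float pageH) {
        // The smaller of the two ratios is the largest scale at which both dimensions still fit
        float scale = Math.min(containerW / pageW, containerH / pageH);
        return new int[] { Math.round(pageW * scale), Math.round(pageH * scale) };
    }
}
```

For example, a US Letter page (612 x 792 points) in a 1080 x 1920 portrait container is limited by the width, giving 1080 x 1398. A 16:9 slide (960 x 540) in the same container becomes 1080 x 608.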


Click here to download the project code for this article, together with the PPT and PDF files it uses


Click here to view the full list of Android development notes
