Copyright notice: This is the blogger's original article and may not be reproduced without the blogger's permission.
There are many ways to preview Office documents online:
- Convert the document on the server to PDF, then to SWF, and finally load the SWF in a Flash viewer on a web page, e.g. FlexPaper
- Convert Office documents directly to SWF and load them in a Flash viewer on a web page
- Microsoft's Office 365
- Open the document directly in the browser
- Convert to HTML
This article takes the last approach: convert the document to HTML and preview that.
Technical approach:
- Office documents to PDF: LibreOffice
- PDF to HTML: pdf2htmlEX
Test environment:
Operating system: Ubuntu 12.04
1. Office documents to PDF
1.1 Install LibreOffice
apt-get install libreoffice-common
1.2 Start the conversion service
soffice --accept="socket,host=127.0.0.1,port=2002;urp;" --nofirststartwizard
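Before converting anything, it helps to confirm that the service started above is actually listening. The Python sketch below is an illustrative addition, not part of the original setup, and the helper names `accept_string` and `service_is_up` are hypothetical: it rebuilds the `--accept` value from a host and port, then tries a plain TCP connection to 127.0.0.1:2002. A successful connection only shows that something is listening on that port, not that it is LibreOffice.

```python
import socket


def accept_string(host: str = "127.0.0.1", port: int = 2002) -> str:
    """Build the value passed to `soffice --accept=...` (same format as the command above)."""
    return f"socket,host={host},port={port};urp;"


def service_is_up(host: str = "127.0.0.1", port: int = 2002, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds.

    This only shows that *something* is listening on the port; it does not
    check that the listener is LibreOffice speaking the UNO (urp) protocol.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, unreachable, ...
        return False


if __name__ == "__main__":
    print(f'soffice --accept="{accept_string()}" --nofirststartwizard')
    print("conversion service reachable:", service_is_up())
```

If `service_is_up()` returns False, start (or restart) soffice before running the conversion script.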
1.3 Document conversion
Most guides online use JODConverter, which calls the conversion service from Java. An alternative is PyODConverter, a Python version of the conversion script: https://github.com/mirkonasato/pyodconverter/
After downloading it, run a test:
python DocumentConverter.py Website_Information_Form.doc new.pdf
soffice converts the document to PDF correctly, but the Chinese text is garbled.
1.4 Fixing garbled Chinese text
According to Google, garbled text is most likely caused by missing fonts, so I copied the fonts from a Windows machine into /usr/share/fonts and then refreshed the font cache:
sudo fc-cache -fv 2>&1 | grep failed | cut -f1 -d":" | xargs -i sudo touch {} && sudo fc-cache -fv
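The one-liner above is dense. As I read it, it runs `fc-cache -fv`, keeps the output lines that mention `failed`, takes the path before the first `:`, touches each of those paths so fc-cache treats them as modified, and then rebuilds the cache once more. A rough Python equivalent, assuming it runs as root and that fontconfig's `fc-cache` is on the PATH (the function names are hypothetical):

```python
from __future__ import annotations

import subprocess
from pathlib import Path


def failed_paths(fc_cache_output: str) -> list[str]:
    """Equivalent of `grep failed | cut -f1 -d":"`: for every output line that
    mentions "failed", keep the text before the first colon (the path)."""
    return [line.split(":", 1)[0]
            for line in fc_cache_output.splitlines()
            if "failed" in line]


def refresh_font_cache() -> None:
    """Rebuild the font cache, touch the paths fc-cache reported as failed, rebuild again.

    Assumes root privileges (the shell version uses sudo) and fc-cache on PATH.
    """
    first = subprocess.run(["fc-cache", "-fv"], capture_output=True, text=True)
    # The shell version merges stderr into stdout with 2>&1, so scan both.
    for path in failed_paths(first.stdout + "\n" + first.stderr):
        Path(path).touch()  # update the mtime, like `touch {}`
    subprocess.run(["fc-cache", "-fv"], check=True)
```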
Restart the conversion service and test again: the Chinese text now displays correctly.
2. PDF to HTML
For PDF to HTML I use pdf2htmlEX, an open-source tool by a Chinese developer. My first attempt was to compile and install it from source, but it depends on too many components and the installation is very troublesome. If you are interested, you can build it from source; see https://github.com/coolwanglu/pdf2htmlEX/wiki/Building
The simpler installation method is as follows:
2.1 Installation via Apt
sudo add-apt-repository ppa:coolwanglu/pdf2htmlex
sudo apt-get update
sudo apt-get install pdf2htmlex
2.2 Test pdf2htmlEX
1.3 xiaoshujiang.pdf
As you can see, xiaoshujiang.html has been generated in the current directory.
2.3 Conversion script
Write a script that chains the two conversions so they are easy to invoke:
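#!/bin/bash
# convert2html.sh: Office document -> PDF -> HTML
# Usage: ./convert2html.sh input.doc output.html
set -e
temp=$(date +%Y%m%d%H%M%S)
python DocumentConverter.py "$1" "${temp}.pdf"
pdf2htmlEX --zoom 1.3 "${temp}.pdf"
mv "${temp}.html" "$2"
rm "${temp}.pdf"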
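For callers that are already in Python, the same two-step pipeline can be sketched as a function. This is an illustrative port, not part of the original setup: `convert_to_html` is a hypothetical name, and it assumes `DocumentConverter.py` sits in the current directory and that the system `python` is the interpreter with LibreOffice's UNO bindings. It keeps the intermediate PDF in a private temporary directory rather than a timestamp-named file, so two conversions started in the same second cannot overwrite each other. The `run` parameter defaults to `subprocess.run` and can be swapped for a fake in tests.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Callable

# Anything with subprocess.run's calling convention (cmd, check=..., cwd=...).
Runner = Callable[..., object]


def convert_to_html(src: str, dst: str,
                    converter: str = "DocumentConverter.py",
                    zoom: str = "1.3",
                    run: Runner = subprocess.run) -> Path:
    """Office document -> PDF (PyODConverter) -> HTML (pdf2htmlEX), like convert2html.sh."""
    src_path = Path(src).resolve()
    dst_path = Path(dst).resolve()
    converter_path = Path(converter).resolve()
    # A private temp directory replaces the timestamp-named ${temp}.pdf/.html files.
    with tempfile.TemporaryDirectory() as tmp:
        pdf = Path(tmp) / "doc.pdf"
        # Step 1: needs the soffice service from section 1.2 to be running.
        # "python" (not sys.executable) because the converter needs the
        # interpreter that has LibreOffice's uno module installed.
        run(["python", str(converter_path), str(src_path), str(pdf)], check=True)
        # Step 2: pdf2htmlEX writes doc.html into its working directory by default.
        run(["pdf2htmlEX", "--zoom", zoom, pdf.name], check=True, cwd=tmp)
        shutil.move(str(Path(tmp) / "doc.html"), str(dst_path))
    return dst_path
```

With the service running, `convert_to_html("xxx.docx", "xxx.html")` does the same job as `./convert2html.sh xxx.docx xxx.html`.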
3. Testing
3.1 Word (doc/docx) test
./convert2html.sh imo云办公室-私有云用户使用手册V2.0.doc imo-doc.html
3.2 Excel (xls/xlsx) test
./convert2html.sh xxx.xlsx xxx.html
3.3 PowerPoint (ppt/pptx) test
./convert2html.sh xxx.pptx xxx.html
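To convert a whole folder in one go, a small driver can call the script once per file. A sketch under these assumptions: convert2html.sh from section 2.3 is executable and lives in the current directory, only the formats tested above (doc, docx, xls, xlsx, ppt, pptx) are picked up, and `plan_batch`/`run_batch` are hypothetical names. Each output keeps the original extension in its name so that `report.doc` and `report.xls` do not both become `report.html`.

```python
from __future__ import annotations

import subprocess
from pathlib import Path

# The formats tried in section 3; LibreOffice opens more, but these are the tested ones.
OFFICE_SUFFIXES = {".doc", ".docx", ".xls", ".xlsx", ".ppt", ".pptx"}


def plan_batch(src_dir: str, out_dir: str) -> list[tuple[Path, Path]]:
    """Pair each Office file in src_dir (non-recursive) with an .html target in out_dir."""
    pairs = []
    for f in sorted(Path(src_dir).iterdir()):
        if f.is_file() and f.suffix.lower() in OFFICE_SUFFIXES:
            # report.doc -> report.doc.html, so report.doc and report.xls don't collide
            pairs.append((f, Path(out_dir) / (f.name + ".html")))
    return pairs


def run_batch(src_dir: str, out_dir: str, script: str = "./convert2html.sh") -> None:
    """Call convert2html.sh once per file; stop at the first failure (check=True)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_batch(src_dir, out_dir):
        subprocess.run([script, str(src), str(dst)], check=True)
```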
4. Summary
This article describes a way to convert Office documents to HTML on the server side for easy previewing.
In practice, you can place the generated HTML under the website's directory and use an interceptor to enforce access permissions.
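The interceptor will usually live in whatever web framework the site already uses. As a framework-neutral illustration, here is a sketch of the same idea as Python WSGI middleware: `make_preview_app` (a hypothetical name) serves files from the directory holding the generated HTML, and a hypothetical `allowed(environ, name)` callback decides whether each request may see the file. How permissions are actually checked (sessions, tokens, user roles) is left to that callback.

```python
from __future__ import annotations

import mimetypes
from pathlib import Path
from typing import Callable

# Hypothetical permission hook: (WSGI environ, requested file name) -> allowed?
Checker = Callable[[dict, str], bool]


def make_preview_app(html_dir: str, allowed: Checker):
    """WSGI app serving generated HTML from html_dir, gated by `allowed`."""
    root = Path(html_dir).resolve()

    def respond(start_response, status, body: bytes, ctype="text/plain"):
        start_response(status, [("Content-Type", ctype)])
        return [body]

    def app(environ, start_response):
        name = environ.get("PATH_INFO", "").lstrip("/")
        target = (root / name).resolve()
        # Reject paths that escape html_dir, e.g. "/../../etc/passwd".
        if root not in target.parents:
            return respond(start_response, "404 Not Found", b"not found")
        # The "interceptor" step: decide before touching the file.
        if not allowed(environ, name):
            return respond(start_response, "403 Forbidden", b"forbidden")
        if not target.is_file():
            return respond(start_response, "404 Not Found", b"not found")
        ctype = mimetypes.guess_type(target.name)[0] or "application/octet-stream"
        return respond(start_response, "200 OK", target.read_bytes(), ctype)

    return app
```

For a quick local trial, the standard library's `wsgiref.simple_server.make_server("127.0.0.1", 8000, app).serve_forever()` can host it; a callback such as `lambda env, name: env.get("HTTP_X_PREVIEW_TOKEN") == "secret"` is only an example of a permission rule.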
How to convert between Office, HTML, and PDF documents from the shell