Jsoup study notes 9: parse a saz file with Jsoup, read the HTM file into a string, and extract the data from the string into a CSV file

Source: Internet
Author: User
Tags readfile

This note improves on the procedure from the previous note: the HTM file is no longer extracted from the saz file; instead, its data is read directly into a string. The basic idea is as follows:

1. Write a custom class that reads the content of a text file into a string: it opens the HTM document inside the saz file and reads its contents into a string.

2. Write a custom class that uses Jsoup to parse the HTM string: it parses the incoming HTM string with Jsoup and writes the results to a CSV file.

3. To run the parse, specify the file paths and call the two utility classes above.

The sample code is as follows:

package com.daxiang.saztest;

import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

/**
 * Custom class that reads the content of a text file into a string.
 *
 * @author Daxiang
 * @date 2015-7-23
 * @Description: Opens the HTM document inside the saz file and reads its contents into a string
 */
public class ReadSazToString {

    /**
     * Reads the saz file.
     *
     * @param buffer   buffer that receives the content
     * @param filePath path of the saz file
     * @throws IOException if the file cannot be read
     */
    public static void readToBuffer(StringBuffer buffer, String filePath) throws IOException {
        File file = new File(filePath); // locate the compressed file
        ZipFile zipFile = new ZipFile(file); // instantiate the ZipFile object
        System.out.println("saz file name to parse: " + zipFile.getName()); // print the name of the saz file
        ZipEntry ze = zipFile.getEntry("_index.htm"); // get the compressed entry _index.htm in the saz file
        System.out.println("file - " + ze.getName() + " : " + ze.getSize() + " bytes");
        long size = ze.getSize();
        if (size > 0) {
            BufferedReader reader = new BufferedReader(new InputStreamReader(zipFile.getInputStream(ze)));
            String line; // holds the content of each line read
            line = reader.readLine(); // read the first line
            while (line != null) { // a null line means the end of the entry has been reached
                buffer.append(line); // append the content read to the buffer
                buffer.append("\n"); // append a newline character
                line = reader.readLine(); // read the next line
            }
            reader.close();
        }
        zipFile.close();
    }

    /**
     * Reads the contents of the file into a string.
     */
    public static String readFile(String filePath) throws IOException {
        StringBuffer sb = new StringBuffer();
        ReadSazToString.readToBuffer(sb, filePath);
        return sb.toString();
    }
}

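The reading step above opens the saz file as a zip archive and reads one entry line by line. As a rough sketch of the same idea, the variant below uses try-with-resources so that the archive and reader are closed even when an exception is thrown, and it reports a missing entry instead of failing with a NullPointerException. The class name `SazEntryReader` and the method `readEntry` are hypothetical, and decoding the entry as UTF-8 is an assumption (the note's code relies on the platform default charset).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper, not part of the original note.
public class SazEntryReader {

    /**
     * Reads one entry of a zip-format archive into a string,
     * terminating every line with "\n".
     */
    public static String readEntry(String archivePath, String entryName) throws IOException {
        // try-with-resources closes the archive even if reading fails
        try (ZipFile zip = new ZipFile(archivePath)) {
            ZipEntry entry = zip.getEntry(entryName);
            if (entry == null) {
                throw new IOException("Entry not found in archive: " + entryName);
            }
            // ASSUMPTION: the entry is UTF-8 text
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(zip.getInputStream(entry), StandardCharsets.UTF_8))) {
                StringBuilder sb = new StringBuilder();
                String line;
                while ((line = reader.readLine()) != null) {
                    sb.append(line).append('\n');
                }
                return sb.toString();
            }
        }
    }
}
```

With this helper, a call such as `SazEntryReader.readEntry(path, "_index.htm")` would play the role of `readFile` in the class above.
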
package com.daxiang.saztest;

import java.io.FileWriter;
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.select.Elements;

/**
 * Custom class that uses Jsoup to parse an HTM string.
 *
 * @author Daxiang
 * @date 2015-7-23
 * @Description: Parses the incoming HTM string with Jsoup and writes the result to a CSV file
 */
public class JsoupStrToCsv {

    // Index of the last column written to the CSV file.
    // ASSUMPTION: the literal used in the "n <", "n ==" and "j ==" checks was lost from the
    // original listing. 22 is inferred from the surviving "j < 22" check in the row loop;
    // the original comment speaks of columns 2 to 12, so adjust this value if needed.
    private static final int LAST_COLUMN = 22;

    /**
     * Parses the HTM document and writes the parsed data to a CSV file.
     *
     * @param htmStr  incoming HTM string
     * @param csvPath path of the generated CSV file
     * @throws IOException if the CSV file cannot be written
     */
    public static void jsoupStr(String htmStr, String csvPath) throws IOException {
        // parse the incoming HTM string
        Document doc = Jsoup.parse(htmStr);
        // extract the header information
        Elements heads = doc.getElementsByTag("table").select("thead");
        FileWriter fw = new FileWriter(csvPath);
        for (int m = 0; m < heads.size(); m++) {
            Elements head = heads.get(m).select("th");
            for (int n = 0; n < head.size(); n++) {
                // By observation, the table tag in the HTM document holds 23 columns of data.
                // Only some of them are required, so the other columns are skipped.
                if (n == 0) {
                    continue;
                } else if (n < LAST_COLUMN) {
                    // text() and toString() give different results
                    // String h = head.get(n).toString() + "\r\n";
                    String h = head.get(n).text() + ",";
                    fw.write(h);
                    // System.out.print(h);
                } else if (n == LAST_COLUMN) {
                    String h = head.get(n).text() + "\r\n";
                    fw.write(h);
                    // System.out.print(h);
                } else {
                    break;
                }
            }
        }
        // extract the table information
        Elements trs = doc.getElementsByTag("table").select("tr");
        for (int i = 0; i < trs.size(); i++) {
            Elements tds = trs.get(i).select("td");
            for (int j = 0; j < tds.size(); j++) {
                // To drop cells without content instead:
                // if (!"".equals(tds.get(j).text())) {
                //     String str = tds.get(j).text() + "\r\n";
                //     System.out.print(str);
                //     fw.write(str);
                // }
                if (j == 0) {
                    continue;
                } else if (j < LAST_COLUMN) {
                    // The double quotes matter: some cells contain commas, and without
                    // quoting, one cell would be split into several columns at each comma.
                    String str1 = "\"" + tds.get(j).text() + "\"";
                    String str2 = str1 + ",";
                    // System.out.print(str2);
                    fw.write(str2);
                } else if (j == LAST_COLUMN) {
                    String str3 = tds.get(j).text() + "\r\n";
                    // System.out.print(str3);
                    fw.write(str3);
                } else {
                    break;
                }
            }
        }
        fw.flush();
        fw.close();
    }
}

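The comment in the class above notes that cells are wrapped in double quotes because some data contains commas. A cell that itself contains a double quote would still break the row; the usual CSV convention (described in RFC 4180) is to double any embedded quote. The sketch below illustrates that convention only; the class `CsvRow` and its methods are hypothetical names, not part of the original note, and unlike the note's code it quotes every field, including the last one.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the original note.
public class CsvRow {

    /** Wraps a field in double quotes and doubles any embedded double quote. */
    public static String quote(String field) {
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins the quoted fields with commas and ends the record with CRLF. */
    public static String toRow(List<String> fields) {
        return fields.stream()
                .map(CsvRow::quote)
                .collect(Collectors.joining(",")) + "\r\n";
    }
}
```

In the row loop, the cell texts could be collected into a list and written with `fw.write(CsvRow.toRow(cells))` rather than concatenating quotes and commas by hand.
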
package com.daxiang.saztest;

/**
 * @Author: Elephant Jepson
 * @Date: 2015-7-23
 * @Email: [email protected]
 * @Version: Version1.0
 * @CopyRight: Elephant Jepson
 * @Description: Parses the saz file without decompressing it: reads the HTM file directly into a string,
 *               uses Jsoup to parse the HTM string, extracts the data inside the table tag,
 *               and writes the result to a CSV file
 */
public class SazJsoupTest {

    static String URL = "D:\\daxiang\\saztest\\21316.saz";

    public static void main(String[] args) throws Exception {
        // define the path of the generated CSV file
        String csvPath = "D:\\daxiang\\saztest\\21316.csv";
        // call the readFile method of the ReadSazToString class
        String htmStr = ReadSazToString.readFile(URL);
        // call the jsoupStr method of the JsoupStrToCsv class
        JsoupStrToCsv.jsoupStr(htmStr, csvPath);
        System.out.println("Saz file parsing completed!");
    }
}




Copyright notice: This is an original article by the blogger and may not be reproduced without the blogger's permission.
