Super CSV: multithreaded concurrent processing of large volumes of data

Source: Internet
Author: User

Super CSV is an open source Java project for working with CSV files. It is designed around object-oriented principles, so you can use your own objects to make working with CSV files easier. It supports input/output type conversion and data integrity checks, and it can read and write data from any source in any encoding, as long as you supply the corresponding Reader and Writer objects. Delimiters, space characters, and line terminators are all configurable.

First, a look at simple data processing
Add the dependency:

<dependency>
    <groupId>net.sf.supercsv</groupId>
    <artifactId>super-csv</artifactId>
    <version>2.4.0</version>
</dependency>

Here's a look at the code examples in the official documentation.

  1. Read CSV files based on headers
    Read each line of the file into a Java object. Assume you have a UserBean class with the following code:

      public class UserBean {
          int id;
          String username, password, street, town;
          int zip;

          public int getId() { return id; }
          public String getPassword() { return password; }
          public String getStreet() { return street; }
          public String getTown() { return town; }
          public String getUsername() { return username; }
          public int getZip() { return zip; }

          public void setId(int id) { this.id = id; }
          public void setPassword(String password) { this.password = password; }
          public void setStreet(String street) { this.street = street; }
          public void setTown(String town) { this.town = town; }
          public void setUsername(String username) { this.username = username; }
          public void setZip(int zip) { this.zip = zip; }
      }

    Suppose there is also a CSV file that starts with a header line, with data rows like the following:
    1,klaus,qwexykiks,17/1/2007,1111,new York
    2,oufud,bobilop213,10/10/2007,4555,new York
    3,oufud1,bobilop213,10/10/2007,4555,new York
    4,oufud2,bobilop213,10/10/2007,4555,new York
    5,oufud3,bobilop213,10/10/2007,4555,new York
    6,oufud4,bobilop213,10/10/2007,4555,new York
    7,oufud5,bobilop213,10/10/2007,4555,new York
    8,oufud6,bobilop213,10/10/2007,4555,new York
    9,oufud7,bobilop213,10/10/2007,4555,new York
    10,oufud8,bobilop213,10/10/2007,4555,new York
    11,oufud9,bobilop213,10/10/2007,4555,new York
    12,oufud10,bobilop213,10/10/2007,4555,new York
    13,oufud11,bobilop213,10/10/2007,4555,new York
    14,oufud12,bobilop213,10/10/2007,4555,new York
    15,oufud13,bobilop213,10/10/2007,4555,new York

    You can then use the following code to create a UserBean instance for each line and print one of its property values:

    class ReadingObjects {
        public static void main(String[] args) throws Exception {
            ICsvBeanReader inFile = new CsvBeanReader(new FileReader("foo.csv"), CsvPreference.STANDARD_PREFERENCE);
            try {
                // Read the first line of the file as the header
                final String[] header = inFile.getHeader(true);
                UserBean user;
                // read() returns null once the end of the file is reached
                while ((user = inFile.read(UserBean.class, header, processors)) != null) {
                    System.out.println(user.getZip());
                }
            } finally {
                inFile.close();
            }
        }
    }

    The processors array has not been defined yet. As the name suggests, these are cell processors, each of which handles one column of data. You can also pass null for a column, which means that column gets no special processing. Processors can be nested inside one another: for example, new Unique(new StrMinMax(5, 20)) requires the values of the column to be unique and 5 to 20 characters long. We will skip the processing details for now and look at how the processors we need are defined:

    final CellProcessor[] processors = new CellProcessor[] {
        new Unique(new ParseInt()),
        new Unique(new StrMinMax(5, 20)),
        new StrMinMax(8, 35),
        new ParseDate("dd/MM/yyyy"),
        new Optional(new ParseInt()),
        null
    };

    The specific meaning of the above code is:
    The first column is an integer, and its value must be unique
    The second column is a string whose value must be unique, with a length of 5 to 20
    The third column is a string with a length of 8 to 35
    The fourth column is a date, formatted as day/month/year
    The fifth column is an integer, but the ParseInt processor only handles the value when the column has one (in other words, the column may be empty)
    The sixth column is a string (the default) and uses no processor
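To make the nesting idea concrete, here is a minimal plain-Java sketch of how an outer step can validate a value and then hand it to the step it wraps. It does not use Super CSV and is not how the library is implemented; the class ProcessorChainSketch and the helpers unique, lengthBetween, and optional are hypothetical names chosen for illustration only.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.function.Function;

/** Plain-Java sketch of nested column processors; every name here is hypothetical. */
final class ProcessorChainSketch {

    /** Rejects a raw value already seen in this column, then passes it to the next step. */
    static Function<String, Object> unique(Function<String, Object> next) {
        Set<String> seen = new HashSet<>();
        return value -> {
            if (!seen.add(value)) {
                throw new IllegalArgumentException("duplicate value: " + value);
            }
            return next.apply(value);
        };
    }

    /** Rejects a value whose length is outside [min, max], then passes it to the next step. */
    static Function<String, Object> lengthBetween(int min, int max, Function<String, Object> next) {
        return value -> {
            int length = value.length();
            if (length < min || length > max) {
                throw new IllegalArgumentException("length " + length + " not in " + min + ".." + max);
            }
            return next.apply(value);
        };
    }

    /** Returns null for an empty cell and only calls the next step when there is a value. */
    static Function<String, Object> optional(Function<String, Object> next) {
        return value -> (value == null || value.isEmpty()) ? null : next.apply(value);
    }

    public static void main(String[] args) {
        // Mirrors the shape of new Unique(new StrMinMax(5, 20)) and new Optional(new ParseInt())
        Function<String, Object> username = unique(lengthBetween(5, 20, v -> v));
        Function<String, Object> zip = optional(Integer::valueOf);

        System.out.println(username.apply("klaus")); // prints klaus
        System.out.println(zip.apply("4555"));       // prints 4555
        System.out.println(zip.apply(""));           // prints null
    }
}
```

Because each step only knows about the step it wraps, any combination can be built by nesting constructors, which is the same shape as the processor definitions above.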

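If your CSV file does not have a header, you can define the array yourself instead, with one entry per column:

    final String[] header = new String[] {"id", "username", "password", "date", "zip", "town"};

If you want to ignore a column, use null at that position in the header array, just as you would when defining a processor.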
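The effect of a null entry can be pictured as a positional mapping in which unnamed columns are simply dropped. The following is a plain-Java sketch of that idea, not Super CSV code; NameMappingSketch and mapColumns are hypothetical names, and the sketch assumes the header array has exactly one entry per column.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java sketch of positional name mapping; every name here is hypothetical. */
final class NameMappingSketch {

    /** Pairs each column value with the name at the same index; a null name skips the column. */
    static Map<String, String> mapColumns(String[] nameMapping, List<String> row) {
        if (nameMapping.length != row.size()) {
            throw new IllegalArgumentException("expected " + nameMapping.length + " columns, got " + row.size());
        }
        Map<String, String> mapped = new LinkedHashMap<>();
        for (int i = 0; i < nameMapping.length; i++) {
            if (nameMapping[i] != null) {
                mapped.put(nameMapping[i], row.get(i));
            }
        }
        return mapped;
    }

    public static void main(String[] args) {
        // The third entry is null, so the password column is ignored
        String[] header = {"id", "username", null, "date", "zip", "town"};
        List<String> row = Arrays.asList("1", "klaus", "qwexykiks", "17/1/2007", "1111", "new York");
        System.out.println(mapColumns(header, row));
        // prints {id=1, username=klaus, date=17/1/2007, zip=1111, town=new York}
    }
}
```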

The full code is as follows:

// ReadingObjects.java
import java.io.FileReader;
import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.constraint.Unique;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

class ReadingObjects {
    // One processor per column: id, username, password, date, zip, town
    static final CellProcessor[] userProcessors = new CellProcessor[] {
        new Unique(new ParseInt()),
        new Unique(new StrMinMax(5, 20)),
        new StrMinMax(8, 35),
        new ParseDate("dd/MM/yyyy"),
        new Optional(new ParseInt()),
        null
    };

    public static void main(String[] args) throws Exception {
        ICsvBeanReader inFile = new CsvBeanReader(new FileReader("foo.csv"), CsvPreference.STANDARD_PREFERENCE);
        try {
            // Read the first line of the file as the header
            final String[] header = inFile.getHeader(true);
            UserBean user;
            while ((user = inFile.read(UserBean.class, header, userProcessors)) != null) {
                System.out.println(user.getZip());
            }
        } finally {
            inFile.close();
        }
    }
}

// UserBean.java
import java.util.Date;

public class UserBean {
    int id; // matches the first column, which the first processor parses as an int
    String username, password, town;
    Date date;
    int zip;

    public int getId() { return id; }
    public Date getDate() { return date; }
    public String getPassword() { return password; }
    public String getTown() { return town; }
    public String getUsername() { return username; }
    public int getZip() { return zip; }

    public void setId(final int id) { this.id = id; }
    public void setDate(final Date date) { this.date = date; }
    public void setPassword(final String password) { this.password = password; }
    public void setTown(final String town) { this.town = town; }
    public void setUsername(final String username) { this.username = username; }
    public void setZip(final int zip) { this.zip = zip; }
}

If you don't know the exact format of the file before you read it, you can use the CsvListReader.read() method instead, which returns the data read from each line as a List.

Now that we have seen the code for reading a file, let's look at writing, which is just as simple.

import java.io.FileWriter;
import java.util.HashMap;
import java.util.Map;
import org.supercsv.io.*;
import org.supercsv.prefs.CsvPreference;

class WritingMaps {
    public static void main(String[] args) throws Exception {
        ICsvMapWriter writer = new CsvMapWriter(new FileWriter(...), CsvPreference.STANDARD_PREFERENCE);
        try {
            final String[] header = new String[] {"name", "city", "zip"};

            // Set up some data to write
            final Map<String, Object> data1 = new HashMap<String, Object>();
            data1.put(header[0], "Karl");
            data1.put(header[1], "Tent City");
            data1.put(header[2], 5565);

            final Map<String, Object> data2 = new HashMap<String, Object>();
            data2.put(header[0], "Banjo");
            data2.put(header[1], "River Side");
            data2.put(header[2], 5551);

            // The actual writing
            writer.writeHeader(header);
            writer.write(data1, header);
            writer.write(data2, header);
        } finally {
            writer.close();
        }
    }
}

Second, concurrent batch processing of updates for large amounts of data
The code is as follows:

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.constraint.Unique;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadReadingObjects {
    static final CellProcessor[] userProcessors = new CellProcessor[] {
        new Unique(new ParseInt()),       // unique, int id
        new Unique(new StrMinMax(5, 20)), // unique, length 5 to 20
        new StrMinMax(8, 35),             // length 8 to 35
        new ParseDate("dd/MM/yyyy"),      // format day/month/year
        new Optional(new ParseInt()),     // integer; ParseInt only runs when the column has a value (the column may be empty)
        null                              // no processor
    };

    public static void main(String[] args) throws Exception {
        // InputStreamReader freader = new InputStreamReader(inputStream, "UTF-8");
        // ICsvBeanReader inFile = new CsvBeanReader(freader, CsvPreference.STANDARD_PREFERENCE);
        ICsvBeanReader inFile = new CsvBeanReader(new FileReader("D:\\foo.csv"), CsvPreference.STANDARD_PREFERENCE);
        ExecutorService executorService = null;
        try {
            // If your CSV file does not have a header, you can define the array yourself instead:
            // final String[] header = new String[] {"id", "username", "password", "date", "zip", "town"};
            final String[] header = inFile.getHeader(true);
            // Create a cached thread pool
            List<Future<String>> futureList = new ArrayList<Future<String>>();
            executorService = Executors.newCachedThreadPool();
            // Page through the data and hand each page to the thread pool
            while (getPageUserList(executorService, futureList, inFile, header)) {
            }
            // Collect the result of each task; get() blocks until the task has finished
            for (Future<String> future : futureList) {
                System.out.println("future result: " + future.get());
            }
        } finally {
            inFile.close();
            if (executorService != null) {
                executorService.shutdown();
            }
        }
    }

    private static boolean getPageUserList(ExecutorService executorService, List<Future<String>> futureList,
            ICsvBeanReader inFile, String[] header) throws IOException {
        int index = 0;
        boolean status = false;
        List<UserBean> userBeans = new ArrayList<UserBean>();
        UserBean user;
        // Reading starts at the first data row
        while ((user = inFile.read(UserBean.class, header, userProcessors)) != null) {
            userBeans.add(user);
            index++;
            // Number of rows read per page, i.e. the number of records each task processes;
            // 10 matches the output shown below, adjust as appropriate
            if (index == 10) {
                status = true;
                break;
            }
        }
        // Submit the page to the thread pool
        if (!userBeans.isEmpty()) {
            Future<String> future = executorService.submit(getUpdateDbJob(futureList.size(), userBeans));
            futureList.add(future);
        }
        return status;
    }

    private static Callable<String> getUpdateDbJob(final int threadNo, final List<UserBean> userBeans) {
        return new Callable<String>() {
            @Override
            public String call() throws Exception {
                // Write to the database in batches
                List<UserBean> userList = new ArrayList<UserBean>();
                for (int i = 0; i < userBeans.size(); i++) {
                    userList.add(userBeans.get(i));
                    // For large pages, commit in batches: with % 3 the first commit holds 4 records
                    // and each later one holds 3; adjust the modulus to your situation
                    if (i > 0 && i % 3 == 0) {
                        System.out.println("thread " + threadNo + " update user: " + userList.size() + " success");
                        // Use JdbcTemplate to batch-write to the database
                        // TODO write data
                        userList = new ArrayList<UserBean>();
                    } else if (i == userBeans.size() - 1) {
                        // Handle the last batch
                        System.out.println("thread " + threadNo + " update user: " + userList.size() + " success");
                        // TODO write data
                        userList = new ArrayList<UserBean>();
                    }
                }
                return String.valueOf(userBeans.size());
            }
        };
    }
}

Output after running:

thread 0 update user: 4 success
thread 0 update user: 3 success
thread 0 update user: 3 success
thread 1 update user: 4 success
thread 1 update user: 1 success
future result: 10
future result: 5
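A cached thread pool creates new threads on demand, so a very large file split into many pages can start many threads at once. As a variation on the approach above (not the author's code), the following sketch uses only the JDK: it bounds the pool size with Executors.newFixedThreadPool and cuts each page into equal-sized batches with subList instead of a modulus test. PagedBatchSketch, chunk, and processPages are hypothetical names, the rows are plain strings standing in for UserBean objects, and the database write is only a placeholder.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

/** JDK-only sketch of paged, batched, concurrent processing; every name here is hypothetical. */
final class PagedBatchSketch {

    /** Splits items into consecutive chunks of at most size elements. */
    static <T> List<List<T>> chunk(List<T> items, int size) {
        if (size <= 0) {
            throw new IllegalArgumentException("size must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int from = 0; from < items.size(); from += size) {
            int to = Math.min(from + size, items.size());
            chunks.add(new ArrayList<>(items.subList(from, to)));
        }
        return chunks;
    }

    /** Submits one task per page and returns, in page order, how many rows each task handled. */
    static List<Integer> processPages(List<String> rows, int pageSize, int batchSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Integer>> futures = new ArrayList<>();
            for (List<String> page : chunk(rows, pageSize)) {
                futures.add(pool.submit(() -> {
                    int written = 0;
                    for (List<String> batch : chunk(page, batchSize)) {
                        written += batch.size(); // placeholder for a real batch write
                    }
                    return written;
                }));
            }
            List<Integer> counts = new ArrayList<>();
            for (Future<Integer> future : futures) {
                counts.add(future.get()); // blocks until that page is done
            }
            return counts;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("page processing failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<String> rows = new ArrayList<>();
        for (int i = 1; i <= 15; i++) {
            rows.add("row" + i);
        }
        // 15 rows in pages of 10, committed in batches of 3, on at most 2 threads
        System.out.println(processPages(rows, 10, 3, 2)); // prints [10, 5]
    }
}
```

With 15 rows and a page size of 10 this yields the same per-task totals as the run above (10 and 5), while the batches inside a page of 10 come out as 3, 3, 3, 1 rather than 4, 3, 3.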
