Super CSV multi-threaded concurrent processing of large volumes of data


Super CSV is an open source Java project for working with CSV files. It is designed entirely around object-oriented principles, so you can use your own object model to make CSV handling easier. It supports input/output type conversion and data integrity checks, and it can read and write data from anywhere, in any encoding, as long as the corresponding Reader and Writer objects are provided. Delimiters, quote characters, line terminators, and so on are all configurable.
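As a minimal sketch of that configurability (the file name and variable names here are only illustrative), a custom CsvPreference built with the 2.x Builder might look like this for a tab-delimited file:

    // quote char '"', tab as the delimiter, Windows line endings
    CsvPreference tabPreference = new CsvPreference.Builder('"', '\t', "\r\n").build();
    ICsvBeanReader reader = new CsvBeanReader(new FileReader("foo.tsv"), tabPreference);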

First, take a look at simple data processing
Introduce the dependency package:

<dependency>
    <groupId>net.sf.supercsv</groupId>
    <artifactId>super-csv</artifactId>
    <version>2.4.0</version>
</dependency>

Here's a look at the code examples in the official documentation.

  1. Read CSV files based on headers
    Read each line of the file into a Java object, assuming you have a UserBean class with the following code:

      public class UserBean {
          int id;
          String username, password, street, town;
          int zip;

          public int getId() { return id; }
          public String getPassword() { return password; }
          public String getStreet() { return street; }
          public String getTown() { return town; }
          public String getUsername() { return username; }
          public int getZip() { return zip; }
          public void setId(int id) { this.id = id; }
          public void setPassword(String password) { this.password = password; }
          public void setStreet(String street) { this.street = street; }
          public void setTown(String town) { this.town = town; }
          public void setUsername(String username) { this.username = username; }
          public void setZip(int zip) { this.zip = zip; }
      }

    And there is a CSV file that contains a file header, assuming the file contents are as follows:
    Id,username,password,date,zip,town
    1,klaus,qwexykiks,17/1/2007,1111,new York
    2,oufud,bobilop213,10/10/2007,4555,new York
    3,oufud1,bobilop213,10/10/2007,4555,new York
    4,oufud2,bobilop213,10/10/2007,4555,new York
    5,oufud3,bobilop213,10/10/2007,4555,new York
    6,oufud4,bobilop213,10/10/2007,4555,new York
    7,oufud5,bobilop213,10/10/2007,4555,new York
    8,oufud6,bobilop213,10/10/2007,4555,new York
    9,oufud7,bobilop213,10/10/2007,4555,new York
    10,oufud8,bobilop213,10/10/2007,4555,new York
    11,oufud9,bobilop213,10/10/2007,4555,new York
    12,oufud10,bobilop213,10/10/2007,4555,new York
    13,oufud11,bobilop213,10/10/2007,4555,new York
    14,oufud12,bobilop213,10/10/2007,4555,new York
    15,oufud13,bobilop213,10/10/2007,4555,new York

    You can then use the following code to create UserBean instances and print out each object's property values:

    class ReadingObjects {
        public static void main(String[] args) throws Exception {
            ICsvBeanReader inFile = new CsvBeanReader(new FileReader("foo.csv"), CsvPreference.STANDARD_PREFERENCE);
            try {
                final String[] header = inFile.getHeader(true);
                UserBean user;
                while ((user = inFile.read(UserBean.class, header, processors)) != null) {
                    System.out.println(user.getZip());
                }
            } finally {
                inFile.close();
            }
        }
    }

    We have not yet defined processors. As the name suggests, these are cell processors that handle each column of data. You can also pass null for a column, meaning that column gets no special handling. Processors can be nested inside one another; for example, new Unique(new StrMinMax(5, 20)) means the column's value must be unique and its length must be between 5 and 20. Leaving the processing details aside for now, here is how the processors we need are defined:

    final CellProcessor[] processors = new CellProcessor[] {
        new Unique(new ParseInt()),
        new Unique(new StrMinMax(5, 20)),
        new StrMinMax(8, 35),
        new ParseDate("dd/MM/yyyy"),
        new Optional(new ParseInt()),
        null
    };

    The specific meaning of the above code is:
    The first column (id) is an integer, and its value must be unique
    The second column (username) is a string whose value must be unique, with a length of 5 to 20
    The third column (password) is a string with a length of 8 to 35
    The fourth column (date) is a date, formatted as day/month/year (dd/MM/yyyy)
    The fifth column (zip) is an integer, but ParseInt only processes the value when the column is non-empty (that is, the column may be empty)
    The sixth column (town) is a string (the default) and uses no processor

If your CSV file does not have a header, you can define the header array yourself instead.

If you want to ignore a column, then just as with the processors, use null in the corresponding position of the header array. A short sketch of both cases follows.
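A minimal sketch, assuming the same foo.csv columns as above (the array contents and variable names are illustrative):

    // no header row in the file: supply the bean property names yourself
    final String[] header = new String[] { "id", "username", "password", "date", "zip", "town" };

    // ignore the password column: map it to null so it is not bound to any bean property
    final String[] partialHeader = new String[] { "id", "username", null, "date", "zip", "town" };

    UserBean user = inFile.read(UserBean.class, partialHeader, processors);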

The full code is as follows:

import java.io.FileReader;
import java.io.IOException;
import java.util.Date;

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.constraint.Unique;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

class ReadingObjects {

    static final CellProcessor[] userProcessors = new CellProcessor[] {
        new Unique(new ParseInt()),
        new Unique(new StrMinMax(5, 20)),
        new StrMinMax(8, 35),
        new ParseDate("dd/MM/yyyy"),
        new Optional(new ParseInt()),
        null
    };

    public static void main(String[] args) throws Exception {
        ICsvBeanReader inFile = new CsvBeanReader(new FileReader("foo.csv"), CsvPreference.STANDARD_PREFERENCE);
        try {
            final String[] header = inFile.getHeader(true);
            UserBean user;
            while ((user = inFile.read(UserBean.class, header, userProcessors)) != null) {
                System.out.println(user.getZip());
            }
        } finally {
            inFile.close();
        }
    }
}

public class UserBean {
    int id;
    String username, password, town;
    Date date;
    int zip;

    public int getId() { return id; }
    public Date getDate() { return date; }
    public String getPassword() { return password; }
    public String getTown() { return town; }
    public String getUsername() { return username; }
    public int getZip() { return zip; }
    public void setId(final int id) { this.id = id; }
    public void setDate(final Date date) { this.date = date; }
    public void setPassword(final String password) { this.password = password; }
    public void setTown(final String town) { this.town = town; }
    public void setUsername(final String username) { this.username = username; }
    public void setZip(final int zip) { this.zip = zip; }
}

If you don't know the exact format of the file before reading it, you can use the CsvListReader.read() method, which puts the data from each row into a List.
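A minimal sketch of that approach, assuming the same foo.csv as above (class and variable names here are illustrative):

    import java.io.FileReader;
    import java.util.List;

    import org.supercsv.io.CsvListReader;
    import org.supercsv.io.ICsvListReader;
    import org.supercsv.prefs.CsvPreference;

    class ReadingLists {
        public static void main(String[] args) throws Exception {
            ICsvListReader listReader = new CsvListReader(new FileReader("foo.csv"), CsvPreference.STANDARD_PREFERENCE);
            try {
                listReader.getHeader(true); // skip the header row
                List<String> row;
                while ((row = listReader.read()) != null) {
                    // each row comes back as a list of raw column strings
                    System.out.println(row);
                }
            } finally {
                listReader.close();
            }
        }
    }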

That covers reading files; now let's look at writing, which is also very simple.

import java.io.FileWriter;
import java.util.HashMap;

import org.supercsv.io.*;
import org.supercsv.prefs.CsvPreference;

class WritingMaps {
    public static void main(String[] args) throws Exception {
        ICsvMapWriter writer = new CsvMapWriter(new FileWriter(...), CsvPreference.STANDARD_PREFERENCE);
        try {
            final String[] header = new String[] { "name", "city", "zip" };

            // set up some data to write
            final HashMap<String, ? super Object> data1 = new HashMap<String, Object>();
            data1.put(header[0], "Karl");
            data1.put(header[1], "Tent City");
            data1.put(header[2], 5565);
            final HashMap<String, ? super Object> data2 = new HashMap<String, Object>();
            data2.put(header[0], "Banjo");
            data2.put(header[1], "river Side");
            data2.put(header[2], 5551);

            // the actual writing
            writer.writeHeader(header);
            writer.write(data1, header);
            writer.write(data2, header);
        } finally {
            writer.close();
        }
    }
}

Second, concurrent batch processing of updates for large amounts of data
The code is as follows:

import org.supercsv.cellprocessor.Optional;
import org.supercsv.cellprocessor.ParseDate;
import org.supercsv.cellprocessor.ParseInt;
import org.supercsv.cellprocessor.constraint.StrMinMax;
import org.supercsv.cellprocessor.constraint.Unique;
import org.supercsv.cellprocessor.ift.CellProcessor;
import org.supercsv.io.CsvBeanReader;
import org.supercsv.io.ICsvBeanReader;
import org.supercsv.prefs.CsvPreference;

import java.io.FileReader;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ThreadReadingObjects {

    static final CellProcessor[] userProcessors = new CellProcessor[] {
        new Unique(new ParseInt()),        // unique int id
        new Unique(new StrMinMax(5, 20)),  // unique, length 5 to 20
        new StrMinMax(8, 35),              // length 8 to 35
        new ParseDate("dd/MM/yyyy"),       // format day/month/year
        new Optional(new ParseInt()),      // integer, but ParseInt only runs when the column has a value (the column may be empty)
        null                               // no processor
    };

    public static void main(String[] args) throws Exception {
        // InputStreamReader fReader = new InputStreamReader(inputStream, "UTF-8");
        // ICsvBeanReader inFile = new CsvBeanReader(fReader, CsvPreference.STANDARD_PREFERENCE);
        ICsvBeanReader inFile = new CsvBeanReader(new FileReader("D:\\foo.csv"), CsvPreference.STANDARD_PREFERENCE);
        ExecutorService executorService = null;
        try {
            // If your CSV file does not have a header, you can define the array yourself instead:
            // final String[] header = new String[] { "username", "password", "date", "zip", "town" };
            final String[] header = inFile.getHeader(true);

            // create a cached thread pool
            List<Future<String>> futureList = new ArrayList<Future<String>>();
            executorService = Executors.newCachedThreadPool();

            // page through the data, submitting each page to the thread pool
            while (getPageUserList(executorService, futureList, inFile, header)) {
            }

            // collect the result of each thread
            for (Future<String> future : futureList) {
                while (true) {
                    if (future.isDone() && !future.isCancelled()) {
                        System.out.println("future result: " + future.get());
                        break;
                    } else {
                        Thread.sleep(1000);
                    }
                }
            }
        } finally {
            inFile.close();
            if (executorService != null) {
                executorService.shutdown();
            }
        }
    }

    private static boolean getPageUserList(ExecutorService executorService, List<Future<String>> futureList,
                                           ICsvBeanReader inFile, String[] header) throws IOException {
        int index = 0;
        boolean status = false;
        List<UserBean> userBeans = new ArrayList<UserBean>();
        UserBean user;
        while ((user = inFile.read(UserBean.class, header, userProcessors)) != null) {
            // read() returns data rows only; the header row was already consumed by getHeader(true)
            userBeans.add(user);
            index++;
            // page size: rows handled by one thread (10 here, matching the sample output below); adjust as appropriate
            if (index == 10) {
                status = true;
                break;
            }
        }
        // submit the page to the thread pool
        if (!userBeans.isEmpty()) {
            Future<String> future = executorService.submit(getUpdateDbJob(futureList.size(), userBeans));
            futureList.add(future);
        }
        return status;
    }

    private static Callable<String> getUpdateDbJob(final int threadNo, final List<UserBean> userBeans) {
        return new Callable<String>() {
            @Override
            public String call() throws Exception {
                // write to the database in batches
                List<UserBean> userList = new ArrayList<UserBean>();
                for (int i = 0; i < userBeans.size(); i++) {
                    userList.add(userBeans.get(i));
                    // if the data volume is large, commit in batches; with % 3 the first batch holds 4 records
                    // and later ones 3; adjust the modulus to suit the actual situation
                    if (i > 0 && i % 3 == 0) {
                        System.out.println("thread " + threadNo + " update user: " + userList.size() + " success");
                        // use JdbcTemplate to batch write to the database
                        // TODO write data
                        userList = new ArrayList<UserBean>();
                    } else if (i == userBeans.size() - 1) {
                        // handle the last batch
                        System.out.println("thread " + threadNo + " update user: " + userList.size() + " success");
                        // TODO write data
                        userList = new ArrayList<UserBean>();
                    }
                }
                return String.valueOf(userBeans.size());
            }
        };
    }
}

Results returned after running (with a page size of 10, the 15 sample rows are split between two threads):

thread 0 update user: 4 success
thread 0 update user: 3 success
thread 0 update user: 3 success
thread 1 update user: 4 success
thread 1 update user: 1 success
future result: 10
future result: 5
