Univocity-parsers: A powerful CSV/TSV/fixed-width text file parsing library (Java)

Source: Internet
Author: User
Tags: comparison table, maven central, apache camel

Univocity-parsers is an open-source Java project. It provides rich, powerful features for parsing CSV, TSV, and fixed-width text files behind a simple API. The sections below introduce it in more detail.

Unlike other parsing libraries, Univocity-parsers is built on its own architecture, designed for high performance and extensibility. Developers can build new file parsers on top of this architecture.

1. Overview

As a Java developer, I am currently working on a web project that helps telecom operators evaluate their current networks and propose solutions. CSV files play a critical role in this project: they are the format in which the carrier's network data is delivered, including the real-time online status (online/offline) of broadband users and their real-time traffic. A single CSV file can typically exceed 1GB and contain millions of records. The CSV parsing library the project currently uses is javacsv.

As operator networks expand and the system's monitoring period grows, the CSV files are getting larger quickly. The project team had to address the performance of parsing large CSV files (aiming for parse times measured in seconds), as well as the limits that javacsv placed on extending functionality as business requirements change.

After many tests and analyses, we decided to adopt Univocity-parsers as our CSV parsing library, and it did solve our problems. In addition to better performance and extensibility, the library offers easy-to-understand APIs, development documentation, and tutorials. For advanced feature requests, the vendor offers paid services.

The project is hosted on GitHub and, at the time of writing, has 69 stars and 10 forks. You can find the development documentation and tutorials here and here, as well as more examples and news here.

It is worth noting that Apache Camel, a well-known open-source project, has also integrated Univocity-parsers as a recommended library for parsing CSV/TSV/fixed-width text files. For more information, please see here.

2. Installation

Our project team currently uses version 1.5.1. We recommend downloading the latest version from the official Univocity-parsers website.

The project is also published in the Maven Central repository, so you can add the following dependency directly to your pom.xml:

    <dependency>
        <groupId>com.univocity</groupId>
        <artifactId>univocity-parsers</artifactId>
        <version>1.5.1</version>
    </dependency>

3. Introduction to Features

Univocity-parsers offers a range of powerful features that cover most processing needs for list-style (tabular) data. The table below shows some of the key features:

[Image: table of key features, http://images.cnitblog.com/blog2015/304630/201505/052339591266695.png]


4. Reading list-style data

Read all rows of a CSV file:

    CsvParser parser = new CsvParser(new CsvParserSettings());
    List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));

To view all the functions associated with file reading, please visit: https://github.com/uniVocity/univocity-parsers#reading-csv


5. Writing list-style data

With just 2 lines of code, you can write data in CSV format:

    CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
    writer.writeRowsAndClose(rows);

To view all the functions associated with file writing, please visit: https://github.com/uniVocity/univocity-parsers/blob/master/README.md#writing
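
The writer takes care of CSV escaping for you. As a rough illustration of what that involves, the sketch below applies the common CSV quoting convention (quote a field that contains a comma, a quote, or a line break, and double any embedded quotes) using only the JDK. The names `CsvQuotingSketch`, `escapeField`, and `toCsvLine` are hypothetical, and this is not how Univocity-parsers is implemented.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper for illustration; not part of univocity-parsers. */
final class CsvQuotingSketch {

    /** Quotes a field only when it contains a comma, a quote, CR or LF. */
    static String escapeField(String field) {
        if (field == null) {
            return ""; // assumption: nulls are written as empty fields
        }
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        // Embedded quotes are doubled, then the whole field is wrapped in quotes.
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    /** Joins one row into a single CSV line (without the line terminator). */
    static String toCsvLine(List<String> row) {
        return row.stream()
                .map(CsvQuotingSketch::escapeField)
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // Sample values are made up for the example.
        System.out.println(toCsvLine(Arrays.asList("user-1", "online", "say \"hi\", bye")));
    }
}
```

In a real project you would let the library's writer handle this; the sketch only shows why hand-rolled string concatenation is easy to get wrong.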


6. Performance and Scalability

The table below compares the parsing times of Univocity-parsers and javacsv:

    File size                javacsv parsing time    Univocity-parsers parsing time
    10MB, 145453 rows        1138ms                  1138ms
    100MB, 809008 rows       031                     6s
    434MB, 4499959 rows      91s                     081
    1GB, 23803502 rows       245s                    70s
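
To make the comparison concrete, the sketch below works through the arithmetic for the 1GB row (245s versus 70s for 23803502 rows), which comes to a speedup of 3.5 times. It also includes a small timing helper of the kind you could wrap around your own parse call. `ParseTimingSketch` and its methods are hypothetical names, and the `Runnable` passed to `timeMillis` is a placeholder for a real parsing call, not a measurement from the article.

```java
import java.util.Locale;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper for illustration; not part of any library. */
final class ParseTimingSketch {

    /** How many times faster the candidate is than the baseline. */
    static double speedup(double baselineSeconds, double candidateSeconds) {
        return baselineSeconds / candidateSeconds;
    }

    /** Throughput in rows per second. */
    static double rowsPerSecond(long rows, double seconds) {
        return rows / seconds;
    }

    /** Runs the task once and returns the elapsed wall-clock time in milliseconds. */
    static long timeMillis(Runnable task) {
        long start = System.nanoTime();
        task.run();
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
    }

    public static void main(String[] args) {
        // Figures taken from the 1GB row of the comparison table.
        long rows = 23803502L;
        System.out.println(String.format(Locale.ROOT, "speedup: %.1fx", speedup(245, 70)));
        System.out.println(String.format(Locale.ROOT, "javacsv rows/s: %.0f", rowsPerSecond(rows, 245)));
        System.out.println(String.format(Locale.ROOT, "univocity rows/s: %.0f", rowsPerSecond(rows, 70)));

        // Placeholder workload; replace the lambda body with your own parse call.
        long elapsed = timeMillis(() -> { /* parser call would go here */ });
        System.out.println("elapsed ms: " + elapsed);
    }
}
```

A single run like this is only a rough indicator; for a fair comparison, repeat each measurement several times on the same file and machine.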


7. Design and implementation

Univocity-parsers has a set of core data-processing modules that are responsible for reading and writing data by row, reading and writing data by column, and converting row and column data. The diagram below shows these core modules:

[Image: diagram of the core modules, http://images.cnitblog.com/blog2015/304630/201505/052340335951999.png]


You can develop your own data-processing module by implementing the RowProcessor interface or by extending one of its implementation classes. In the following code, I implement my own processing module with a simple anonymous inner class.


    import java.io.FileReader;
    import java.util.List;

    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.common.processor.RowProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;

    CsvParserSettings settings = new CsvParserSettings();
    settings.setRowProcessor(new RowProcessor() {
        StringBuilder stringBuilder = new StringBuilder();

        /**
         * Before the first row is processed, do any initialization
         * your business logic requires.
         */
        @Override
        public void processStarted(ParsingContext context) {
            System.out.println("Started to process rows of data.");
        }

        /**
         * Handle each row according to your business logic.
         */
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            System.out.println("The row in line #" + context.currentLine() + ": ");
            for (String col : row) {
                stringBuilder.append(col).append("\t");
            }
        }

        /**
         * After all rows have been processed, do cleanup work.
         */
        @Override
        public void processEnded(ParsingContext context) {
            System.out.println("Finished processing rows of data.");
            System.out.println(stringBuilder);
        }
    });

    CsvParser parser = new CsvParser(settings);
    List<String[]> allRows = parser.parseAll(new FileReader("/myFile.csv"));
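
To show the order in which such callbacks fire, here is a minimal JDK-only model of the same pattern: one call before the first row, one call per row, and one call after the last row. The `RowCallback` interface, the `process` driver, and the `trace` helper are hypothetical stand-ins written for this example. They are not the library's `RowProcessor` or its internals, and the comma split is deliberately naive.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a row-callback lifecycle; not part of univocity-parsers. */
final class RowCallbackSketch {

    /** Stand-in for a row-processing callback (assumed shape, for illustration). */
    interface RowCallback {
        void started();
        void row(String[] row, long lineNumber);
        void ended();
    }

    /** Drives the callback: once before, once per row, once after. */
    static void process(List<String> lines, RowCallback callback) {
        callback.started();
        long lineNumber = 0;
        for (String line : lines) {
            lineNumber++;
            // Naive split for illustration only; real CSV needs quote handling.
            callback.row(line.split(",", -1), lineNumber);
        }
        callback.ended();
    }

    /** Runs the driver and records the order in which the callbacks fire. */
    static List<String> trace(List<String> lines) {
        final List<String> events = new ArrayList<>();
        process(lines, new RowCallback() {
            @Override
            public void started() {
                events.add("started");
            }

            @Override
            public void row(String[] row, long lineNumber) {
                events.add("row " + lineNumber + ": " + String.join("\t", row));
            }

            @Override
            public void ended() {
                events.add("ended");
            }
        });
        return events;
    }

    public static void main(String[] args) {
        // Sample rows are made up for the example.
        trace(Arrays.asList("u1,online,42", "u2,offline,0")).forEach(System.out::println);
    }
}
```

Because rows are handed to the callback one at a time, a processor written this way can aggregate or filter data without keeping every row in memory, which matters for files of the size described in the overview.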

The Univocity-parsers library offers much more than this. It plays a big role in our project, and I recommend that you learn more about it.
