Univocity-parsers: A powerful csv/tsv/fixed-width text file parsing library (Java)

Source: Internet
Author: User

Univocity-parsers is an open-source Java library for parsing CSV, TSV, and fixed-width text files. It offers a rich set of features behind a simple API. These are introduced in more detail below.

Unlike other parsing libraries, Univocity-parsers is built on its own architecture, designed for high performance and extensibility. Developers can use this architecture to build parsers for new file formats.

1. Overview

As a Java developer, I am currently working on a web project that helps telecom operators evaluate their current networks and propose solutions. CSV files play a critical role in this project. They are the format in which the carriers' network data is delivered, including each broadband user's real-time online status (online/offline) and real-time traffic. A single CSV file can exceed 1 GB and contain millions of records. The CSV parsing library the project currently uses is javacsv.

As the operators' networks expand and the system's monitoring period grows, the CSV files quickly become larger. The project team had to address the performance of parsing these large CSV files (ideally, parsing them within seconds), as well as the limits on extending functionality as business requirements changed.

After many tests and analyses, we decided to adopt Univocity-parsers as our CSV parsing library, and it did solve our problems. Besides better performance and scalability, the library provides easy-to-understand APIs, development documentation, and tutorials. For advanced feature extensions, the vendor offers paid services.

The project is hosted on GitHub and, at the time of writing, has 69 stars and 10 forks. You can find the relevant development documentation and tutorials here and here, and more examples and news here.

It is worth noting that Apache Camel, a well-known open-source project, also integrates Univocity-parsers as a recommended library for parsing CSV, TSV, and fixed-width text files. For more information, please see here.

2. Installation

Our project team is currently using version 1.5.1. We recommend downloading the latest version from the Univocity-parsers official website.

The project is also published to the Maven Central repository, so you can add the following dependency directly to your pom.xml:

    <dependency>
        <groupId>com.univocity</groupId>
        <artifactId>univocity-parsers</artifactId>
        <version>1.5.1</version>
    </dependency>

3. Introduction to Features

Univocity-parsers offers a range of powerful features designed to cover your needs for processing tabular data. The table below lists some of the key features:



4. Reading Tabular Data

Reading all rows of a CSV file:

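    CsvParser parser = new CsvParser(new CsvParserSettings());
    // getReader(...) is a helper used in the project's examples; any java.io.Reader works here
    List<String[]> allRows = parser.parseAll(getReader("/examples/example.csv"));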

For all the features related to reading files, please visit: https://github.com/uniVocity/univocity-parsers#reading-csv
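To make concrete what a call like parseAll does, here is a minimal, stdlib-only sketch. It is not univocity-parsers' implementation: it reads a Reader line by line and splits each line into fields, honoring quoted fields and doubled-quote escapes. The class and method names are hypothetical, and for brevity it assumes a comma delimiter and no line breaks inside quoted values, both of which the real library handles.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class MiniCsvReader {

    /** Splits one CSV line into fields; supports quoted fields and "" escapes. */
    public static String[] parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder field = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        field.append('"'); // doubled quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false; // closing quote
                    }
                } else {
                    field.append(c); // commas inside quotes are kept as data
                }
            } else if (c == '"') {
                inQuotes = true;
            } else if (c == ',') {
                fields.add(field.toString());
                field.setLength(0);
            } else {
                field.append(c);
            }
        }
        fields.add(field.toString());
        return fields.toArray(new String[0]);
    }

    /** Reads every line and collects all rows in memory, like a parseAll call. */
    public static List<String[]> parseAll(Reader input) throws IOException {
        List<String[]> rows = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(input)) {
            String line;
            while ((line = reader.readLine()) != null) {
                rows.add(parseLine(line));
            }
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        String csv = "year,make,model\n1997,Ford,\"E350, \"\"Super\"\" Duty\"\n";
        for (String[] row : parseAll(new StringReader(csv))) {
            System.out.println(String.join(" | ", row));
        }
    }
}
```

Collecting every row in a list is convenient for small files. For the multi-gigabyte files described above, a row-by-row callback (see the RowProcessor section) avoids holding everything in memory.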

5. Writing Tabular Data

With just two lines of code, you can write data in CSV format:

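    List<Object[]> rows = loadRows(); // loadRows() is a placeholder for your own data source
    // outputWriter is any java.io.Writer, e.g. a FileWriter
    CsvWriter writer = new CsvWriter(outputWriter, new CsvWriterSettings());
    writer.writeRowsAndClose(rows);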

For all the features related to writing files, please visit: https://github.com/uniVocity/univocity-parsers/blob/master/README.md#writing
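The main job on the writing side is escaping values correctly. The sketch below is stdlib-only and uses hypothetical names. It applies the quoting rule commonly used for CSV (RFC 4180 style): a value is wrapped in double quotes only if it contains a comma, a quote, or a line break, and embedded quotes are doubled. Like writeRowsAndClose, it writes every row and then closes the output Writer. It is an illustration, not the library's code.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.List;

public class MiniCsvWriter {

    /** Quotes a value only when it contains a comma, a quote, or a line break. */
    public static String escape(String value) {
        if (value == null) {
            return ""; // this sketch writes null as an empty field
        }
        boolean needsQuotes = value.contains(",") || value.contains("\"")
                || value.contains("\n") || value.contains("\r");
        if (!needsQuotes) {
            return value;
        }
        return "\"" + value.replace("\"", "\"\"") + "\"";
    }

    /** Writes all rows, one line each, then closes the writer. */
    public static void writeRowsAndClose(Writer out, List<String[]> rows) throws IOException {
        try (Writer writer = out) {
            for (String[] row : rows) {
                StringBuilder line = new StringBuilder();
                for (int i = 0; i < row.length; i++) {
                    if (i > 0) {
                        line.append(',');
                    }
                    line.append(escape(row[i]));
                }
                writer.write(line.append('\n').toString());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter sink = new StringWriter();
        List<String[]> rows = Arrays.asList(
                new String[] {"user", "status"},
                new String[] {"alice", "online, idle"});
        writeRowsAndClose(sink, rows);
        System.out.print(sink); // user,status / alice,"online, idle"
    }
}
```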

6. Performance and Scalability

Here is a table comparing the parsing time of Univocity-parsers and javacsv:

File size                  javacsv parse time   univocity-parsers parse time
10 MB, 145,453 rows        1138 ms
100 MB, 809,008 rows       23 s                 6 s
434 MB, 4,499,959 rows     91 s
1 GB, 23,803,502 rows                           70 s

Here you can find a performance comparison of almost all CSV parsing libraries. As that comparison shows, univocity-parsers outperforms the other libraries by a wide margin.

The performance and flexibility advantages of univocity-parsers come from the following design features and mechanisms:

    • Reading input on a separate thread (enabled by calling CsvParserSettings.setReadInputOnSeparateThread(true))
    • Parallel row processing (see ConcurrentRowProcessor, an implementation of RowProcessor)
    • Processing column data according to business requirements by extending the ColumnProcessor class
    • Processing row data according to business requirements by implementing the RowProcessor interface
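The first mechanism, reading input on a separate thread, can be illustrated with a producer/consumer sketch that uses only java.util.concurrent. A background thread reads lines into a bounded BlockingQueue while the calling thread processes them, so disk I/O and parsing overlap. This is an illustration of the idea under stated assumptions, not how univocity-parsers implements it. The class name, the queue capacity of 1024, and the naive comma split are all hypothetical choices.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Optional;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

public class SeparateThreadReader {

    /**
     * Reads lines on a background thread while the calling thread processes them.
     * Optional.empty() marks the end of input. Returns the total number of fields
     * (naive comma split, for illustration only).
     */
    public static int countFields(Reader input) throws InterruptedException {
        BlockingQueue<Optional<String>> queue = new ArrayBlockingQueue<>(1024);

        Thread readerThread = new Thread(() -> {
            try {
                try (BufferedReader reader = new BufferedReader(input)) {
                    String line;
                    while ((line = reader.readLine()) != null) {
                        queue.put(Optional.of(line)); // blocks while the queue is full
                    }
                } catch (IOException e) {
                    System.err.println("Input error: " + e.getMessage());
                }
                queue.put(Optional.empty()); // signal end of input, even after an I/O error
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // sketch: no further cleanup
            }
        }, "csv-input-reader");
        readerThread.start();

        // Consumer side: runs on the calling thread, concurrently with the reads.
        int fields = 0;
        while (true) {
            Optional<String> next = queue.take();
            if (!next.isPresent()) {
                break;
            }
            fields += next.get().split(",", -1).length;
        }
        readerThread.join();
        return fields;
    }

    public static void main(String[] args) throws InterruptedException {
        String csv = "user,status,traffic\nalice,online,120\nbob,offline,0\n";
        System.out.println(countFields(new StringReader(csv))); // 9
    }
}
```

The bounded queue keeps memory use flat even for 1 GB inputs: if processing falls behind, the reader thread simply blocks until there is room again.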

7. Design and Implementation

Univocity-parsers has several core data-processing modules responsible for reading and writing data row by row and column by column, and for converting between row and column data. Here is the diagram of these core modules:

You can develop your own data processing module by implementing the RowProcessor interface or extending one of its implementation classes. In the following code, I developed my own data processing module using a simple anonymous inner class.

    import com.univocity.parsers.common.ParsingContext;
    import com.univocity.parsers.common.processor.RowProcessor;
    import com.univocity.parsers.csv.CsvParser;
    import com.univocity.parsers.csv.CsvParserSettings;
    import java.io.FileReader;

    CsvParserSettings settings = new CsvParserSettings();
    settings.setRowProcessor(new RowProcessor() {
        StringBuilder stringBuilder = new StringBuilder();

        // Called before the first row: do any initialization your business logic needs.
        @Override
        public void processStarted(ParsingContext context) {
            System.out.println("Started to process rows of data.");
        }

        // Called for each parsed row: process the row according to your business logic.
        @Override
        public void rowProcessed(String[] row, ParsingContext context) {
            System.out.println("The row in line #" + context.currentLine() + ":");
            for (String col : row) {
                stringBuilder.append(col).append("\t");
            }
        }

        // Called after all rows are processed: do cleanup work.
        @Override
        public void processEnded(ParsingContext context) {
            System.out.println("Finished processing rows of data.");
            System.out.println(stringBuilder);
        }
    });

    CsvParser parser = new CsvParser(settings);
    // parse() pushes every row to the RowProcessor above; the enclosing method
    // must handle the FileNotFoundException that FileReader can throw.
    parser.parse(new FileReader("/myfile.csv"));
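The three callbacks above follow a fixed lifecycle: processStarted once, rowProcessed once per row, and processEnded once at the end. To show that order without the library on the classpath, the sketch below uses a hypothetical SimpleRowProcessor interface that mirrors the three method names, with a plain row counter in place of ParsingContext, and a small driver that invokes them the way a parser would. Calling processEnded from a finally block is this sketch's own choice, not a claim about the library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CallbackLifecycle {

    /** Hypothetical stand-in for RowProcessor; a row counter replaces ParsingContext. */
    public interface SimpleRowProcessor {
        void processStarted();
        void rowProcessed(String[] row, long rowNumber);
        void processEnded();
    }

    /** Drives the processor like a parser: start, one call per row, then end. */
    public static void run(List<String[]> rows, SimpleRowProcessor processor) {
        processor.processStarted();
        try {
            long rowNumber = 0;
            for (String[] row : rows) {
                processor.rowProcessed(row, ++rowNumber);
            }
        } finally {
            processor.processEnded(); // runs even if a row callback throws
        }
    }

    public static void main(String[] args) {
        List<String> events = new ArrayList<>();
        List<String[]> rows = Arrays.asList(
                new String[] {"alice", "online"},
                new String[] {"bob", "offline"});
        run(rows, new SimpleRowProcessor() {
            @Override public void processStarted() { events.add("started"); }
            @Override public void rowProcessed(String[] row, long rowNumber) {
                events.add(rowNumber + ":" + String.join("\t", row));
            }
            @Override public void processEnded() { events.add("ended"); }
        });
        System.out.println(events);
    }
}
```

Because rows arrive one at a time, this callback style lets business logic (for example, summing traffic per user) run without keeping the whole file in memory.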

The Univocity-parsers library offers much more than what is shown here. It has played a big role in our project, and I recommend learning more about it.
