Introduction to CSV files, with a C++ implementation example
Comma-separated values (CSV), sometimes called character-separated values because the delimiter need not be a comma, is a file format that stores tabular data (numbers and text) as plain text. Plain text means that the file is a sequence of characters and contains no data that must be interpreted as binary numbers. A CSV file consists of any number of records separated by line breaks. Each record is made up of fields, and the fields are separated by some other character or string, most commonly a comma or a tab. Usually, all records have exactly the same sequence of fields.
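The record-and-field structure described above can be sketched in a few lines of C++. This is a minimal illustration, not a full parser: the helper name `split_record` is hypothetical, it assumes a single-character delimiter, and it does not handle quoted fields.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one record into fields on a single-character
// delimiter. Empty fields (for example "a,,c" or a trailing comma) are kept.
// Quoted fields are NOT handled in this sketch.
std::vector<std::string> split_record(const std::string& record, char delimiter = ',')
{
    std::vector<std::string> fields;
    std::string::size_type start = 0;
    while (true) {
        const std::string::size_type pos = record.find(delimiter, start);
        if (pos == std::string::npos) {
            // No more delimiters: the rest of the record is the last field.
            fields.push_back(record.substr(start));
            break;
        }
        fields.push_back(record.substr(start, pos - start));
        start = pos + 1;
    }
    return fields;
}
```

Passing `'\t'` as the second argument handles tab-separated records in the same way.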
There is no universal standard for the CSV file format, but RFC 4180 gives a basic description. The character encoding is likewise unspecified, although 7-bit ASCII is the most basic common denominator.
CSV is a common, relatively simple file format that is widely used in consumer, business, and scientific applications. Its most common use is transferring tabular data between programs that natively operate on incompatible formats (usually proprietary and/or non-standard ones). This works because a large number of programs support some variant of CSV, at least as an optional input/output format.
"CSV" is not a single, clearly defined format (although RFC 4180 provides one commonly used definition). In practice, therefore, the term "CSV" generally refers to any file with the following features:
(1) It is plain text in some character set, such as ASCII, Unicode, EBCDIC, or GB2312 (Simplified Chinese);
(2) It consists of records (typically one record per line);
(3) Each record is divided into fields by separators (typically commas, semicolons, or tabs; sometimes the separators may include optional spaces);
(4) Every record has the same sequence of fields.
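Feature (4) is easy to check mechanically. The sketch below is only an illustration under stated assumptions: the names `count_fields` and `has_uniform_field_count` are hypothetical, fields are assumed to contain no quoted delimiters, and blank lines are skipped.

```cpp
#include <algorithm>
#include <cstddef>
#include <istream>
#include <sstream>  // only needed by callers that use std::istringstream
#include <string>

// Hypothetical helper: number of fields = number of delimiters + 1.
// Assumes delimiters never appear inside quoted fields.
std::size_t count_fields(const std::string& record, char delimiter)
{
    return static_cast<std::size_t>(
               std::count(record.begin(), record.end(), delimiter)) + 1;
}

// Hypothetical helper: true if every non-blank line has the same field count.
bool has_uniform_field_count(std::istream& input, char delimiter = ',')
{
    std::string line;
    std::size_t expected = 0;
    bool first = true;
    while (std::getline(input, line)) {
        // Tolerate Windows-style CRLF line endings.
        if (!line.empty() && line.back() == '\r') {
            line.pop_back();
        }
        if (line.empty()) {
            continue;  // skip blank lines
        }
        const std::size_t n = count_fields(line, delimiter);
        if (first) {
            expected = n;
            first = false;
        } else if (n != expected) {
            return false;
        }
    }
    return true;
}
```

A caller could run this check on a file stream before parsing and report an error early if the records are ragged.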
Many CSV variants exist within these general constraints, so CSV files are not fully portable between programs. The variations are usually small, however, and many applications let users preview a file (feasible because it is plain text) and then specify the separator, escape rules, and so on. If a particular CSV file differs too much from what a particular receiving program supports, it is often possible to inspect and edit the file by hand, or to fix the problem with a simple program. In practice, therefore, CSV files are very convenient.
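As one example of an escape rule, RFC 4180 allows a field to be enclosed in double quotes so that it can contain the separator, and represents a literal double quote inside such a field by doubling it. The following is a rough sketch of that rule for a single line; the function name `split_quoted_record` is hypothetical, and quoted fields that span several lines are not supported here.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: split one line into fields, honouring double-quoted
// fields in the style of RFC 4180. Inside quotes, the delimiter is ordinary
// text and "" stands for one literal quote character.
// Limitation: a quoted field may not contain a line break in this sketch.
std::vector<std::string> split_quoted_record(const std::string& record,
                                             char delimiter = ',')
{
    std::vector<std::string> fields;
    std::string field;
    bool in_quotes = false;
    for (std::string::size_type i = 0; i < record.size(); ++i) {
        const char c = record[i];
        if (in_quotes) {
            if (c == '"') {
                if (i + 1 < record.size() && record[i + 1] == '"') {
                    field += '"';  // doubled quote: one literal quote
                    ++i;
                } else {
                    in_quotes = false;  // closing quote
                }
            } else {
                field += c;
            }
        } else if (c == '"') {
            in_quotes = true;  // opening quote
        } else if (c == delimiter) {
            fields.push_back(field);
            field.clear();
        } else {
            field += c;
        }
    }
    fields.push_back(field);  // last field (possibly empty)
    return fields;
}
```

Other CSV variants use different escape rules (for example a backslash), which is why many programs let the user choose them.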
The CSV format is best suited to representing a set or sequence of records in which each record has the same sequence of fields. The format is not limited to a particular character set; it works equally well with Unicode and ASCII (although particular programs may impose their own limits on the CSV files they support). Even converting a CSV file from one character set to another is unproblematic (unlike almost all proprietary data formats). However, CSV provides no way to indicate which character set is in use.
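Because the file itself does not name its character set, readers usually fall back on heuristics. One common hint, which not every file carries, is a UTF-8 byte order mark (the bytes EF BB BF) at the very start. The helper below is a hypothetical sketch that detects and strips that mark so it does not end up inside the first field; it makes no attempt to identify any other encoding.

```cpp
#include <string>

// Hypothetical helper: if `text` begins with the UTF-8 byte order mark
// (EF BB BF), remove it and return true; otherwise leave `text` unchanged
// and return false. The absence of a mark says nothing about the encoding.
bool strip_utf8_bom(std::string& text)
{
    static const std::string bom = "\xEF\xBB\xBF";
    if (text.compare(0, bom.size(), bom) == 0) {
        text.erase(0, bom.size());
        return true;
    }
    return false;
}
```

It would typically be applied to the first line read from the file, before that line is split into fields.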
Because so many variations of the "CSV" format exist, there is no single "CSV standard". In common usage, almost any delimiter-separated text data may loosely be called a "CSV" file, and different CSV variants may not be compatible with one another.
The above content is mainly from Wikipedia.
The code below is provided for reference. It splits each line on commas only and does not handle quoted fields. If your CSV files are more complex, consider an open source library on GitHub: https://github.com/ben-strasser/fast-cpp-csv-parser
parse_csv.hpp:
#ifndef FBC_CPPBASE_TEST_PARSE_CSV_HPP_
#define FBC_CPPBASE_TEST_PARSE_CSV_HPP_
// reference: https://stackoverflow.com/questions/1120140/how-can-i-read-and-parse-csv-files-in-c?page=1&tab=votes#tab-top
#include <iterator>
#include <iostream>
#include <fstream>
#include <sstream>
#include <vector>
#include <string>
// One parsed CSV record: the fields of a single line, split on commas.
// Quoted fields and embedded commas are not handled.
class CSVRow {
public:
    std::string const& operator[](std::size_t index) const { return m_data[index]; }
    std::size_t size() const { return m_data.size(); }

    // Read the next line from the stream and split it into fields.
    void readNextRow(std::istream& str)
    {
        std::string line;
        std::getline(str, line);

        std::stringstream lineStream(line);
        std::string cell;

        m_data.clear();
        while (std::getline(lineStream, cell, ',')) {
            m_data.push_back(cell);
        }
        // This checks for a trailing comma with no data after it.
        if (!lineStream && cell.empty()) {
            // If there was a trailing comma then add an empty element.
            m_data.push_back("");
        }
    }

private:
    std::vector<std::string> m_data;
};
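// Stream extraction reads one row. Marked inline because it is defined in a
// header that may be included from more than one source file.
inline std::istream& operator>>(std::istream& str, CSVRow& data)
{
    data.readNextRow(str);
    return str;
}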
// Input-style iterator that yields one CSVRow per line of the stream.
// A default-constructed CSVIterator acts as the end marker.
class CSVIterator {
public:
    /* typedef std::input_iterator_tag iterator_category;
    typedef CSVRow value_type;
    typedef std::size_t difference_type;
    typedef CSVRow* pointer;
    typedef CSVRow& reference; */

    CSVIterator(std::istream& str) : m_str(str.good() ? &str : nullptr) { ++(*this); }
    CSVIterator() : m_str(nullptr) {}

    // Pre-increment: read the next row; become the end iterator on failure.
    CSVIterator& operator++() { if (m_str) { if (!((*m_str) >> m_row)) { m_str = nullptr; } } return *this; }
    // Post-increment
    CSVIterator operator++(int) { CSVIterator tmp(*this); ++(*this); return tmp; }
    CSVRow const& operator*() const { return m_row; }
    CSVRow const* operator->() const { return &m_row; }

    bool operator==(CSVIterator const& rhs) const { return ((this == &rhs) || ((this->m_str == nullptr) && (rhs.m_str == nullptr))); }
    bool operator!=(CSVIterator const& rhs) const { return !((*this) == rhs); }

private:
    std::istream* m_str;
    CSVRow m_row;
};
#endif // FBC_CPPBASE_TEST_PARSE_CSV_HPP_
test_parse_csv.cpp:
#include "test_parse_cvs.hpp"
#include <cstdio>
#include <fstream>
#include <string>
#include <vector>
#include "parse_csv.hpp"
namespace parse_cvs_ {

int test_parse_cvs_1()
{
    std::ifstream file("E:/GitCode/Messy_Test/testdata/test_csv.csv");

    // Collect every row as a vector of field strings.
    std::vector<std::vector<std::string>> data;

    CSVIterator loop(file);
    for (; loop != CSVIterator(); ++loop) {
        CSVRow row = *loop;
        std::vector<std::string> tmp(row.size());
        for (std::size_t i = 0; i < row.size(); ++i) {
            tmp[i] = row[i];
        }
        data.emplace_back(tmp);
    }

    // Print the fields of each row, separated by tabs.
    for (std::size_t i = 0; i < data.size(); ++i) {
        for (std::size_t j = 0; j < data[i].size(); ++j) {
            fprintf(stdout, "%s\t", data[i][j].c_str());
        }
        fprintf(stdout, "\n");
    }

    return 0;
}

} // namespace parse_cvs_
The test data of test_csv.csv is as follows:
The results are as follows: