The read_csv function of pandas

Source: Internet
Author: User
pd.read_csv(filepath_or_buffer, sep=',', delimiter=None, header='infer', names=None, index_col=None, usecols=None, squeeze=False, prefix=None, mangle_dupe_cols=True, dtype=None, engine=None, converters=None, true_values=None, false_values=None, skipinitialspace=False, skiprows=None, nrows=None, na_values=None, keep_default_na=True, na_filter=True, verbose=False, skip_blank_lines=True, parse_dates=False, infer_datetime_format=False, keep_date_col=False, date_parser=None, dayfirst=False, iterator=False, chunksize=None, compression='infer', thousands=None, decimal=b'.', lineterminator=None, quotechar='"', quoting=0, escapechar=None, comment=None, encoding=None, dialect=None, tupleize_cols=False, error_bad_lines=True, warn_bad_lines=True, skipfooter=0, skip_footer=0, doublequote=True, delim_whitespace=False, as_recarray=False, compact_ints=False, use_unsigned=False, low_memory=True, buffer_lines=None, memory_map=False, float_precision=None)

filepath_or_buffer:

Path to the file. It can also be a URL or a file-like object (any object with a read() method).
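As a minimal sketch of the two most common inputs, the snippet below reads the same CSV text once from a path on disk and once from an in-memory buffer. The file name demo.csv and the sample data are made up for illustration.

```python
import io
import os
import tempfile

import pandas as pd

csv_text = "a,b\n1,2\n3,4\n"  # hypothetical sample data

# 1) Read from a path on disk (a temporary file created only for this demo).
with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, "demo.csv")
    with open(csv_path, "w", encoding="utf-8") as fh:
        fh.write(csv_text)
    df_from_path = pd.read_csv(csv_path)

# 2) Read from an in-memory buffer (any object with a read() method works).
df_from_buffer = pd.read_csv(io.StringIO(csv_text))

# Both calls should produce the same table.
print(df_from_path.equals(df_from_buffer))
```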

sep:

str, default ','. The delimiter that separates the fields.

delimiter:

str, default None. An alternative name for sep. If this parameter is specified, the sep parameter is ignored.

delim_whitespace:

boolean, default False. Specifies whether whitespace (such as ' ' or '\t') is used as the separator. Setting it to True is equivalent to setting sep='\s+'.
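A short sketch of the separator options, using made-up sample data: one text is split on semicolons, the other on runs of whitespace via sep=r'\s+', which is the form described above as equivalent to delim_whitespace=True.

```python
import io

import pandas as pd

# Hypothetical sample data: the same table with two different separators.
semicolon_text = "a;b;c\n1;2;3\n4;5;6\n"
space_text = "a  b c\n1 2   3\n4  5 6\n"

# A single-character delimiter.
df_semicolon = pd.read_csv(io.StringIO(semicolon_text), sep=";")

# Any run of whitespace acts as one delimiter.
df_space = pd.read_csv(io.StringIO(space_text), sep=r"\s+")

print(df_semicolon.equals(df_space))
```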

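header:

int or list of ints, default 'infer'. The row number(s) to use as the column names. With the default, pandas behaves as if header=0 when names is not passed (the first line of the file supplies the column names) and as if header=None when names is passed. If the file has no header row, set header=None explicitly.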

names:

array-like, default None. The list of column names to use for the result. Use it to give a header to a file that has none. If the data already has a header but you want to replace it, set header=0, names=['A', 'B'].
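The interaction of header and names can be sketched as follows. The two sample texts are invented: one has a header row, the other does not.

```python
import io

import pandas as pd

text_with_header = "x,y\n1,2\n3,4\n"   # hypothetical file with a header row
text_without_header = "1,2\n3,4\n"     # hypothetical file without one

# Default header='infer': the first line becomes the column names.
df_infer = pd.read_csv(io.StringIO(text_with_header))

# Replace an existing header: header=0 discards the first line, names supplies new labels.
df_renamed = pd.read_csv(io.StringIO(text_with_header), header=0, names=["A", "B"])

# No header in the file: header=None keeps every line as data.
df_named = pd.read_csv(io.StringIO(text_without_header), header=None, names=["A", "B"])

print(list(df_infer.columns), list(df_renamed.columns), len(df_named))
```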

index_col:

int or sequence or False, default None. The column number(s) or column name(s) to use as the row index. If a sequence is given, a multi-level index is built. For example, index_col=[0, 1] uses the first and second columns of the file as index columns.
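A small sketch of index_col with a single column name and with a sequence of positions. The column names year, month, and sales are hypothetical.

```python
import io

import pandas as pd

text = "year,month,sales\n2018,1,10\n2018,2,20\n"  # hypothetical sample data

# One column (given by name) becomes the row index.
df_single = pd.read_csv(io.StringIO(text), index_col="year")

# A sequence of positions builds a multi-level index from columns 0 and 1.
df_multi = pd.read_csv(io.StringIO(text), index_col=[0, 1])

# Rows are then addressed by a (year, month) tuple.
print(df_multi.loc[(2018, 2), "sales"])
```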

usecols:

array-like, default None. Return a subset of the columns. Only the selected columns are parsed and kept, which speeds up reading and reduces memory use. For example, usecols=[1, 2] or usecols=['A', 'B'].
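To illustrate column selection, the following sketch picks the same two columns once by zero-based position and once by name. The data is made up.

```python
import io

import pandas as pd

text = "A,B,C\n1,2,3\n4,5,6\n"  # hypothetical sample data

# Select columns by zero-based position ...
by_position = pd.read_csv(io.StringIO(text), usecols=[0, 1])

# ... or by name; column C is never materialized in either case.
by_name = pd.read_csv(io.StringIO(text), usecols=["A", "B"])

print(list(by_position.columns))
```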

squeeze:

boolean, default False. If the parsed data contains only one column, a Series is returned instead of a DataFrame.
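The squeeze argument belongs to the older signature shown at the top and is no longer accepted from pandas 2.0 onward. A version-independent sketch of the same effect, assuming a one-column file, is to call DataFrame.squeeze on the result:

```python
import io

import pandas as pd

text = "value\n1\n2\n3\n"  # hypothetical one-column file

df = pd.read_csv(io.StringIO(text))

# Collapse the single column into a Series (same outcome as squeeze=True in old versions).
series = df.squeeze("columns")

print(type(series).__name__, series.name)
```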

prefix:

str, default None. The prefix to add to the column numbers when the file has no header. For example, 'X' produces X0, X1, ...
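Like squeeze, the prefix argument is not accepted from pandas 2.0 onward. One way to get the same column labels on any version, shown here as a sketch with invented data, is to read with header=None and then call add_prefix:

```python
import io

import pandas as pd

text = "1,2,3\n4,5,6\n"  # hypothetical file without a header row

# header=None numbers the columns 0, 1, 2; add_prefix turns them into X0, X1, X2.
df = pd.read_csv(io.StringIO(text), header=None).add_prefix("X")

print(list(df.columns))
```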

mangle_dupe_cols:

boolean, default True. Duplicate column names 'X', 'X', ... are renamed to 'X', 'X.1', ... 'X.N'. If this parameter is set to False, data in columns with duplicate names is overwritten.
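The default renaming of duplicate columns can be observed without passing the parameter at all (newer pandas releases no longer accept it). A minimal sketch with made-up data:

```python
import io

import pandas as pd

text = "X,X,Y\n1,2,3\n"  # hypothetical file with a repeated column name

df = pd.read_csv(io.StringIO(text))

# The second 'X' receives a numeric suffix so both columns survive.
print(list(df.columns))
```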

dtype:

Type name or dict of column -> type, default None. The data type of each column. For example, {'A': np.float64, 'B': np.int32}.
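A brief sketch of per-column types. The third column, C, is an invented example of why dtype matters: reading it as str keeps the leading zeros that numeric inference would drop.

```python
import io

import numpy as np
import pandas as pd

text = "A,B,C\n1,2,007\n3,4,008\n"  # hypothetical sample data

df = pd.read_csv(
    io.StringIO(text),
    dtype={"A": np.float64, "B": np.int32, "C": str},
)

# A is stored as float64, B as int32, and C keeps its text form.
print(df.dtypes)
print(df["C"].tolist())
```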

engine:

{'c', 'python'}, optional. The parser engine to use. The C engine is faster, while the Python engine supports more features.
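One option from the signature above that the C engine does not support is skipfooter, so it makes a convenient sketch of when the Python engine is needed. The trailing TOTAL line in the sample data is made up.

```python
import io

import pandas as pd

text = "a,b\n1,2\n3,4\nTOTAL,6\n"  # hypothetical file ending with a summary line

# skipfooter drops lines from the end of the file and requires the Python engine.
df = pd.read_csv(io.StringIO(text), skipfooter=1, engine="python")

print(len(df))
```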

converters:

dict, default None. A dictionary of functions for converting the values in certain columns. The keys can be column names or column positions.
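A sketch of converters keyed by column name and by column position. The column names and the helper parse_price are hypothetical; each converter receives the raw string of one cell.

```python
import io

import pandas as pd

text = "name,price\n apple ,$1.50\n pear ,$0.75\n"  # hypothetical sample data


def parse_price(raw):
    """Turn a string such as '$1.50' into the float 1.5."""
    return float(raw.lstrip("$"))


# Keys given as column names.
by_name = pd.read_csv(
    io.StringIO(text),
    converters={"name": str.strip, "price": parse_price},
)

# The same converters keyed by zero-based column position.
by_position = pd.read_csv(
    io.StringIO(text),
    converters={0: str.strip, 1: parse_price},
)

print(by_name["name"].tolist(), by_name["price"].tolist())
```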

Date type parameters:

parse_dates:

boolean or list of ints or names or list of lists or dict, default False.

boolean. True -> try parsing the index.

list of ints or names. e.g. [1, 2, 3] -> try parsing columns 1, 2, and 3 each as a separate date column.

list of lists. e.g. [[1, 3]] -> combine columns 1 and 3 and parse them as a single date column.

dict. e.g. {'foo': [1, 3]} -> combine columns 1 and 3, parse them as a date, and name the resulting column 'foo'.

Example:

df = pd.read_csv(file_path, parse_dates=['time1', 'time2']) parses the time1 and time2 columns as dates.

Unfortunately, dates written in Chinese characters are not supported. For example, 'August 1' written in Chinese cannot be parsed.
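The boolean and flat-list forms of parse_dates can be tried with an in-memory file, as in the sketch below. The column names time1 and time2 follow the example above and the values are invented. The list-of-lists and dict forms are left out because recent pandas releases deprecate combining columns this way.

```python
import io

import pandas as pd

# Hypothetical sample data with two date-like columns.
text = (
    "time1,time2,value\n"
    "2018-08-01,2018-08-02 12:30:00,1\n"
    "2018-08-03,2018-08-04 08:00:00,2\n"
)

# List of names: each listed column is parsed as its own date column.
df = pd.read_csv(io.StringIO(text), parse_dates=["time1", "time2"])

# Boolean True: the index (here column 0) is parsed as dates.
df_index = pd.read_csv(io.StringIO(text), index_col=0, parse_dates=True)

print(df.dtypes)
print(type(df_index.index).__name__)
```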

infer_datetime_format:

boolean, default False. If it is set to True and parse_dates is enabled, pandas tries to infer the format of the datetime strings in the columns. If the format can be inferred, pandas switches to a faster parsing method. In some cases this speeds up parsing by 5 to 10 times.

keep_date_col:

boolean, default False. If True and parse_dates combines multiple columns into one date, the original columns are kept as well.

date_parser:

function, default None. The function used to convert string columns to dates. dateutil.parser.parser is used by default.

Pandas tries to call date_parser in three different ways, moving on to the next one if an exception is raised:

1. Pass one or more arrays (as specified by parse_dates) as arguments;

2. Concatenate the string values of the columns specified by parse_dates row by row into a single array and pass that array as the argument;

3. Call date_parser once for each row, with one or more strings (from the columns specified by parse_dates) as arguments.
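The three calling conventions above can be pictured with a rough, pure-Python sketch. It is not pandas' actual implementation: call_date_parser and parse_row are hypothetical helpers, and plain lists stand in for the column arrays. Here parse_row only works on one row at a time, so the first two attempts fail and the third one succeeds.

```python
from datetime import datetime


def call_date_parser(date_parser, *columns):
    """Illustrative fallback chain only; pandas' real code differs."""
    # 1. Pass the column arrays themselves as arguments.
    try:
        return date_parser(*columns)
    except Exception:
        pass
    # 2. Join the string values row by row into one array and pass that.
    joined = [" ".join(row) for row in zip(*columns)]
    try:
        return date_parser(joined)
    except Exception:
        pass
    # 3. Call the parser once per row with that row's strings.
    return [date_parser(*row) for row in zip(*columns)]


def parse_row(date_str, time_str):
    """A parser that expects one date string and one time string."""
    return datetime.strptime(date_str + " " + time_str, "%Y-%m-%d %H:%M")


dates = ["2018-08-01", "2018-08-02"]  # hypothetical column values
times = ["09:30", "17:45"]

parsed = call_date_parser(parse_row, dates, times)
print(parsed)
```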

dayfirst:

boolean, default False. If True, dates are parsed in DD/MM format, with the day before the month.
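The effect of dayfirst shows up on ambiguous dates. In this sketch, with made-up data, the string 01/08/2018 is read as January 8 by default and as August 1 when dayfirst=True.

```python
import io

import pandas as pd

text = "date,value\n01/08/2018,1\n02/08/2018,2\n"  # hypothetical ambiguous dates

# Default: month first, so 01/08/2018 is January 8.
df_default = pd.read_csv(io.StringIO(text), parse_dates=["date"])

# dayfirst=True: day first, so 01/08/2018 is August 1.
df_dayfirst = pd.read_csv(io.StringIO(text), parse_dates=["date"], dayfirst=True)

print(df_default["date"].iloc[0], df_dayfirst["date"].iloc[0])
```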
