This article shows how to read only specified columns of a CSV file with pandas. I hope it is a useful reference for everyone.
Following a tutorial, I had learned how to read only the first few rows of a CSV file, which made me wonder whether the same could be done for the first few columns. After a number of attempts, I found a method that works.
I wanted to read only the first few columns because I have a CSV file whose trailing columns contain no data but are always present. The raw data looks like this:
Greydemac-mini:chapter06 greyzhang$ cat data.csv
1,name_01,coment_01,,,,
2,name_02,coment_02,,,,
3,name_03,coment_03,,,,
4,name_04,coment_04,,,,
5,name_05,coment_05,,,,
6,name_06,coment_06,,,,
7,name_07,coment_07,,,,
8,name_08,coment_08,,,,
9,name_09,coment_09,,,,
10,name_10,coment_10,,,,
11,name_11,coment_11,,,,
12,name_12,coment_12,,,,
13,name_13,coment_13,,,,
14,name_14,coment_14,,,,
15,name_15,coment_15,,,,
16,name_16,coment_16,,,,
17,name_17,coment_17,,,,
18,name_18,coment_18,,,,
19,name_19,coment_19,,,,
20,name_20,coment_20,,,,
21,name_21,coment_21,,,,
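For readers who want to follow along without the original file, here is a minimal sketch that writes a file in the same shape as the listing above. The helper name `write_sample_csv` and its parameters are hypothetical and not part of the original article. The row pattern is copied from the data shown.

```python
import csv


def write_sample_csv(path, rows=21, empty_cols=4):
    """Write rows like '1,name_01,coment_01,,,,' to path, with no header row."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        for i in range(1, rows + 1):
            # Three real fields followed by empty trailing fields.
            real = [i, f"name_{i:02d}", f"coment_{i:02d}"]
            writer.writerow(real + [""] * empty_cols)


# Example call (overwrites data.csv in the current directory):
# write_sample_csv("data.csv")
```

Calling `write_sample_csv("data.csv")` should give a file that can be used with the `read_csv` calls in the rest of the article.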
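If you read the whole file with pandas, displaying the result gives the following. Because the file has no header row, pandas uses the first data row as the column labels, and the empty trailing columns show up as Unnamed columns filled with NaN: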
In [41]: data = pd.read_csv('data.csv')
In [42]: data
Out[42]:
     1  name_01  coment_01  Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 6
0    2  name_02  coment_02         NaN         NaN         NaN         NaN
1    3  name_03  coment_03         NaN         NaN         NaN         NaN
2    4  name_04  coment_04         NaN         NaN         NaN         NaN
3    5  name_05  coment_05         NaN         NaN         NaN         NaN
4    6  name_06  coment_06         NaN         NaN         NaN         NaN
5    7  name_07  coment_07         NaN         NaN         NaN         NaN
6    8  name_08  coment_08         NaN         NaN         NaN         NaN
7    9  name_09  coment_09         NaN         NaN         NaN         NaN
8   10  name_10  coment_10         NaN         NaN         NaN         NaN
9   11  name_11  coment_11         NaN         NaN         NaN         NaN
10  12  name_12  coment_12         NaN         NaN         NaN         NaN
11  13  name_13  coment_13         NaN         NaN         NaN         NaN
12  14  name_14  coment_14         NaN         NaN         NaN         NaN
13  15  name_15  coment_15         NaN         NaN         NaN         NaN
14  16  name_16  coment_16         NaN         NaN         NaN         NaN
15  17  name_17  coment_17         NaN         NaN         NaN         NaN
16  18  name_18  coment_18         NaN         NaN         NaN         NaN
17  19  name_19  coment_19         NaN         NaN         NaN         NaN
18  20  name_20  coment_20         NaN         NaN         NaN         NaN
19  21  name_21  coment_21         NaN         NaN         NaN         NaN
The extra columns do not get in the way of learning, but I prefer cleaner output in the terminal. The usecols parameter of read_csv reduces this clutter to some extent.
In [45]: data = pd.read_csv('data.csv', usecols=[0,1,2,3])
In [46]: data
Out[46]:
     1  name_01  coment_01  Unnamed: 3
0    2  name_02  coment_02         NaN
1    3  name_03  coment_03         NaN
2    4  name_04  coment_04         NaN
3    5  name_05  coment_05         NaN
4    6  name_06  coment_06         NaN
5    7  name_07  coment_07         NaN
6    8  name_08  coment_08         NaN
7    9  name_09  coment_09         NaN
8   10  name_10  coment_10         NaN
9   11  name_11  coment_11         NaN
10  12  name_12  coment_12         NaN
11  13  name_13  coment_13         NaN
12  14  name_14  coment_14         NaN
13  15  name_15  coment_15         NaN
14  16  name_16  coment_16         NaN
15  17  name_17  coment_17         NaN
16  18  name_18  coment_18         NaN
17  19  name_19  coment_19         NaN
18  20  name_20  coment_20         NaN
19  21  name_21  coment_21         NaN
To make the "boundary" of the data visible, I kept the first empty column in this read. In normal use we would probably want to drop that last column as well, which only requires removing its column number from the usecols list.
In [47]: data = pd.read_csv('data.csv', usecols=[0,1,2])
In [48]: data
Out[48]:
     1  name_01  coment_01
0    2  name_02  coment_02
1    3  name_03  coment_03
2    4  name_04  coment_04
3    5  name_05  coment_05
4    6  name_06  coment_06
5    7  name_07  coment_07
6    8  name_08  coment_08
7    9  name_09  coment_09
8   10  name_10  coment_10
9   11  name_11  coment_11
10  12  name_12  coment_12
11  13  name_13  coment_13
12  14  name_14  coment_14
13  15  name_15  coment_15
14  16  name_16  coment_16
15  17  name_17  coment_17
16  18  name_18  coment_18
17  19  name_19  coment_19
18  20  name_20  coment_20
19  21  name_21  coment_21
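The same read can be reproduced without a file on disk. The sketch below builds the 21 rows in memory with `io.StringIO` and reads the first three columns in two ways: once exactly as in the session above, and once with `header=None`. The second call is an assumption on my part and not something the article uses. It seems appropriate here because the file has no header row, and it keeps the first row as data instead of turning it into column labels.

```python
import io

import pandas as pd

# Rebuild the 21 rows of data.csv in memory (same pattern as the listing).
raw = "\n".join(f"{i},name_{i:02d},coment_{i:02d},,,," for i in range(1, 22))

# As in the article: the first row is consumed as the column labels.
as_in_article = pd.read_csv(io.StringIO(raw), usecols=[0, 1, 2])

# Assumption: with no header row in the file, header=None keeps row 1 as data
# and labels the selected columns 0, 1, 2.
no_header = pd.read_csv(io.StringIO(raw), header=None, usecols=[0, 1, 2])

print(as_in_article.shape)  # (20, 3)
print(no_header.shape)      # (21, 3)
```

In both calls `usecols` takes zero-based column positions, so `[0, 1, 2]` selects the three columns that actually contain data.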