Reading a CSV file with pandas

This article shares a way to read only specified columns of a CSV file with pandas. It should serve as a useful reference, so let's take a look.

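Following a tutorial, I had managed to read just the first few rows of a CSV file, and I immediately wondered whether the same could be done for the first few columns. After a number of attempts I finally found a way.

I wanted to read only the leading columns because a CSV file I had on hand happened to have several trailing columns that hold no usable data but are still present in every row. The original data looks like this: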

GreydeMac-mini:chapter06 greyzhang$ cat data.csv

1,name_01,coment_01,,,,
2,name_02,coment_02,,,,
3,name_03,coment_03,,,,
4,name_04,coment_04,,,,
5,name_05,coment_05,,,,
6,name_06,coment_06,,,,
7,name_07,coment_07,,,,
8,name_08,coment_08,,,,
9,name_09,coment_09,,,,
10,name_10,coment_10,,,,
11,name_11,coment_11,,,,
12,name_12,coment_12,,,,
13,name_13,coment_13,,,,
14,name_14,coment_14,,,,
15,name_15,coment_15,,,,
16,name_16,coment_16,,,,
17,name_17,coment_17,,,,
18,name_18,coment_18,,,,
19,name_19,coment_19,,,,
20,name_20,coment_20,,,,
21,name_21,coment_21,,,,

If pandas reads all of the data, printing it gives the following result:

In [41]: data = pd.read_csv('data.csv')

In [42]: data
Out[42]:
     1  name_01  coment_01  Unnamed: 3  Unnamed: 4  Unnamed: 5  Unnamed: 6
0    2  name_02  coment_02         NaN         NaN         NaN         NaN
1    3  name_03  coment_03         NaN         NaN         NaN         NaN
2    4  name_04  coment_04         NaN         NaN         NaN         NaN
3    5  name_05  coment_05         NaN         NaN         NaN         NaN
4    6  name_06  coment_06         NaN         NaN         NaN         NaN
5    7  name_07  coment_07         NaN         NaN         NaN         NaN
6    8  name_08  coment_08         NaN         NaN         NaN         NaN
7    9  name_09  coment_09         NaN         NaN         NaN         NaN
8   10  name_10  coment_10         NaN         NaN         NaN         NaN
9   11  name_11  coment_11         NaN         NaN         NaN         NaN
10  12  name_12  coment_12         NaN         NaN         NaN         NaN
11  13  name_13  coment_13         NaN         NaN         NaN         NaN
12  14  name_14  coment_14         NaN         NaN         NaN         NaN
13  15  name_15  coment_15         NaN         NaN         NaN         NaN
14  16  name_16  coment_16         NaN         NaN         NaN         NaN
15  17  name_17  coment_17         NaN         NaN         NaN         NaN
16  18  name_18  coment_18         NaN         NaN         NaN         NaN
17  19  name_19  coment_19         NaN         NaN         NaN         NaN
18  20  name_20  coment_20         NaN         NaN         NaN         NaN
19  21  name_21  coment_21         NaN         NaN         NaN         NaN
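The "Unnamed: N" columns in the output above come from the four trailing commas on every line: each comma starts another (empty) field. A minimal sketch that reproduces this on a three-line sample, assuming an in-memory string stands in for data.csv (the `sample` variable is only for illustration):

```python
import io

import pandas as pd

# Three lines in the same shape as data.csv:
# three values followed by four trailing commas (four empty fields).
sample = (
    "1,name_01,coment_01,,,,\n"
    "2,name_02,coment_02,,,,\n"
    "3,name_03,coment_03,,,,\n"
)

data = pd.read_csv(io.StringIO(sample))

# The first line is consumed as the header, so its empty fields
# are given placeholder labels of the form "Unnamed: N".
print(list(data.columns))

# Only the remaining two lines become data rows, seven columns wide.
print(data.shape)

# Number of missing values per column: the empty fields are read as NaN.
print(data.isna().sum().tolist())
```

Because the first line is used as the header, the 21 lines of data.csv show up as only 20 data rows in the session above.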

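Although this does not get in my way while learning, after spending a long time in a command-line terminal I have come to prefer a cleaner look. The usecols parameter of read_csv can reduce this clutter to some extent.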

In [45]: data = pd.read_csv('data.csv',usecols=[0,1,2,3])

In [46]: data
Out[46]:
     1  name_01  coment_01  Unnamed: 3
0    2  name_02  coment_02         NaN
1    3  name_03  coment_03         NaN
2    4  name_04  coment_04         NaN
3    5  name_05  coment_05         NaN
4    6  name_06  coment_06         NaN
5    7  name_07  coment_07         NaN
6    8  name_08  coment_08         NaN
7    9  name_09  coment_09         NaN
8   10  name_10  coment_10         NaN
9   11  name_11  coment_11         NaN
10  12  name_12  coment_12         NaN
11  13  name_13  coment_13         NaN
12  14  name_14  coment_14         NaN
13  15  name_15  coment_15         NaN
14  16  name_16  coment_16         NaN
15  17  name_17  coment_17         NaN
16  18  name_18  coment_18         NaN
17  19  name_19  coment_19         NaN
18  20  name_20  coment_20         NaN
19  21  name_21  coment_21         NaN

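To make the "boundary" of the data visible, this read still includes the first of the invalid columns. In normal use we would probably want to drop the last column in the result above as well, which only requires removing its index from the usecols list.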

In [47]: data = pd.read_csv('data.csv',usecols=[0,1,2])

In [48]: data
Out[48]:
     1  name_01  coment_01
0    2  name_02  coment_02
1    3  name_03  coment_03
2    4  name_04  coment_04
3    5  name_05  coment_05
4    6  name_06  coment_06
5    7  name_07  coment_07
6    8  name_08  coment_08
7    9  name_09  coment_09
8   10  name_10  coment_10
9   11  name_11  coment_11
10  12  name_12  coment_12
11  13  name_13  coment_13
12  14  name_14  coment_14
13  15  name_15  coment_15
14  16  name_16  coment_16
15  17  name_17  coment_17
16  18  name_18  coment_18
17  19  name_19  coment_19
18  20  name_20  coment_20
19  21  name_21  coment_21
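One thing the sessions above do not address is that data.csv has no header line, so read_csv uses the first record (1, name_01, coment_01) as column labels. A possible variation, sketched here on a three-line in-memory sample, combines usecols with header=None so that every line is kept as data. The column names id, name and comment are hypothetical labels chosen for this example and do not come from the original file.

```python
import io

import pandas as pd

# Three lines in the same shape as data.csv (assumed sample, not the real file).
sample = (
    "1,name_01,coment_01,,,,\n"
    "2,name_02,coment_02,,,,\n"
    "3,name_03,coment_03,,,,\n"
)

# header=None tells pandas that the file has no header line,
# so the first line is kept as an ordinary data row.
# usecols=[0, 1, 2] keeps only the three columns that contain data.
data = pd.read_csv(io.StringIO(sample), header=None, usecols=[0, 1, 2])

# Without a header the columns are labelled 0, 1, 2;
# replace them with hypothetical, more readable names.
data.columns = ["id", "name", "comment"]

print(data)
```

With the real file, the same call with 'data.csv' in place of the StringIO object should return all 21 records instead of 20.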
