13.1. csv — CSV File Reading and Writing
The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. There is no “CSV standard”, so the format is operationally defined by the many applications which read and write it. The lack of a standard means that subtle differences often exist in the data produced and consumed by different applications, and these differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that a single module can efficiently manipulate such data, hiding the details of reading and writing it from the programmer.
The csv module implements classes to read and write tabular data in CSV format. It allows programmers to say “write this data in the format preferred by Excel” or “read data from this file which was generated by Excel” without knowing the precise details of the CSV format Excel uses. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats.
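As a rough illustration of writing in Excel's format versus a self-defined format, the sketch below registers a special-purpose dialect and writes the same rows with both. The dialect name "semicolon-unix" and the CollectingFile helper (a minimal object with a write() method, standing in for a real file) are hypothetical; the code avoids real files and sticks to calls intended to work in both Python 2.7 and Python 3.

```python
import csv


class CollectingFile(object):
    """Hypothetical stand-in for a file: csv.writer only needs write()."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        return "".join(self.parts)


# A hypothetical special-purpose dialect: semicolon-separated,
# every field quoted, Unix line endings.
csv.register_dialect(
    "semicolon-unix",
    delimiter=";",
    quoting=csv.QUOTE_ALL,
    lineterminator="\n",
)

rows = [["name", "qty"], ["spam", "3"]]

# Same data, written once in the built-in Excel format...
excel_out = CollectingFile()
csv.writer(excel_out, dialect="excel").writerows(rows)

# ...and once in the custom format.
custom_out = CollectingFile()
csv.writer(custom_out, dialect="semicolon-unix").writerows(rows)

print(repr(excel_out.getvalue()))   # 'name,qty\r\nspam,3\r\n'
print(repr(custom_out.getvalue()))  # '"name";"qty"\n"spam";"3"\n'
```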
The csv module’s reader and writer objects read and write sequences. Programmers can also read and write data in dictionary form using the DictReader and DictWriter classes.
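A minimal sketch of the dictionary interface: DictWriter maps each dict onto the columns named in fieldnames, and DictReader turns the rows back into dicts keyed by the header row. CollectingFile is a hypothetical stand-in for a file opened for writing, and the column names are made up.

```python
import csv


class CollectingFile(object):
    """Hypothetical minimal file-like object; csv only calls write() on it."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        return "".join(self.parts)


# Writing: each row is a dict keyed by the names given in fieldnames.
out = CollectingFile()
writer = csv.DictWriter(out, fieldnames=["name", "qty"])
writer.writeheader()                       # header row from fieldnames
writer.writerow({"name": "spam", "qty": 3})
writer.writerow({"name": "eggs"})          # missing key filled with restval ("")
text = out.getvalue()
print(repr(text))  # 'name,qty\r\nspam,3\r\neggs,\r\n'

# Reading: with no fieldnames argument, the first row supplies the keys.
records = list(csv.DictReader(text.splitlines(True)))
for record in records:
    print(record["name"] + ": " + repr(record["qty"]))
# spam: '3'
# eggs: ''
```

Note that the integer 3 comes back as the string '3': the dictionary classes do no type conversion either.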
Note: This version of the csv module (Python 2.7) doesn’t support Unicode input. There are also currently some issues regarding ASCII NUL characters. Accordingly, all input should be UTF-8 or printable ASCII to be safe; see the examples in section Examples.
13.1.1. Module Contents
The csv module defines the following functions:
csv.reader(csvfile, dialect='excel', **fmtparams)
Return a reader object which will iterate over lines in the given csvfile. csvfile can be any object which supports the iterator protocol and returns a string each time its next() method is called; file objects and list objects are both suitable. If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference. An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one of the strings returned by the list_dialects() function. The other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters.
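The sketch below reads from a plain list, which the description of csvfile allows, and shows both ways of choosing formatting parameters: a fmtparams keyword overriding the built-in 'excel' dialect, and an instance of a Dialect subclass. PipeDialect and the sample lines are hypothetical.

```python
import csv

# csv.reader accepts any iterable of strings, so a list works as input.
lines = ["spam|eggs|ham\n", "1|2|3\n"]

# Start from the built-in "excel" dialect and override one formatting
# parameter (the delimiter) through a keyword argument.
rows = list(csv.reader(lines, dialect="excel", delimiter="|"))
print(rows)  # [['spam', 'eggs', 'ham'], ['1', '2', '3']]


class PipeDialect(csv.excel):
    """Hypothetical dialect: Excel's settings, but '|' as the delimiter."""
    delimiter = "|"


# Passing an instance of a Dialect subclass gives the same result.
rows2 = list(csv.reader(lines, dialect=PipeDialect()))

# Registered dialect names usable as strings; the list order is unspecified.
print("excel-tab" in csv.list_dialects())  # True
```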
Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed.
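Because every field comes back as a string, numeric columns have to be converted by the caller. A small sketch, with made-up column names and values:

```python
import csv

lines = ["item,qty,price\n", "spam,3,1.50\n", "eggs,12,0.25\n"]
reader = csv.reader(lines)
header = next(reader)  # first row holds the column names
rows = list(reader)
print(rows[0])  # ['spam', '3', '1.50']  (every field is a string)

# Conversion is explicit: qty -> int, price -> float.
typed = [(name, int(qty), float(price)) for name, qty, price in rows]
total = sum(qty * price for _, qty, price in typed)
print(total)  # 7.5
```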
A short usage example:
>>> import csv
>>> with open('eggs.csv', 'rb') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print ', '.join(row)
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
Changed in version 2.5: The parser is now stricter with respect to multi-line quoted fields. Previously, if a line ended within a quoted field without a terminating newline character, a newline would be inserted into the returned field. This behavior caused problems when reading files which contained carriage return characters within fields, so it was changed to return the field without inserting newlines. As a consequence, if newlines embedded within fields are important, the input should be split into lines in a manner which preserves the newline characters.
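To see the effect of the 2.5 change, the sketch below parses one record whose quoted middle field contains a newline, once with line endings kept (str.splitlines(True)) and once with them stripped. The sample text is made up for illustration.

```python
import csv

# One record whose quoted middle field spans two physical lines.
text = 'a,"line1\nline2",b\n'

# splitlines(True) keeps the line endings, so the embedded newline survives.
kept = list(csv.reader(text.splitlines(True)))
print(kept)     # [['a', 'line1\nline2', 'b']]

# splitlines() drops them; the parser does not re-insert a newline,
# so the two halves of the field are joined directly.
dropped = list(csv.reader(text.splitlines()))
print(dropped)  # [['a', 'line1line2', 'b']]
```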
csv.writer(csvfile, dialect='excel', **fmtparams)
Return a writer object responsible for converting the user’s data into delimited strings on the given file-like object. csvfile can be any object with a write() method. If csvfile is a file object, it must be opened with the ‘b’ flag on platforms where that makes a difference. An optional dialect parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the Dialect class or one of the strings returned by the list_dialects() function. The other optional fmtparams keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters.
To make it as easy as possible to interface with modules which implement the DB API, the value None is written as the empty string. While this isn’t a reversible transformation, it makes it easier to dump SQL NULL data values to CSV files without preprocessing the data returned from a cursor.fetch* call. All other non-string data are stringified with str() before being written.
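A sketch of the DB API convenience described above, using the standard sqlite3 module's in-memory database as a stand-in for a real data source: the NULL in the second row comes back from fetchall() as None and is written as an empty field, while the integer ids are converted with str(). The table, its columns, and the CollectingFile helper (a minimal object with a write() method) are hypothetical.

```python
import csv
import sqlite3


class CollectingFile(object):
    """Hypothetical stand-in for a file; csv.writer only needs write()."""

    def __init__(self):
        self.parts = []

    def write(self, text):
        self.parts.append(text)

    def getvalue(self):
        return "".join(self.parts)


# An in-memory table with one SQL NULL in it.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stock (id INTEGER, name TEXT)")
conn.executemany("INSERT INTO stock VALUES (?, ?)", [(1, "spam"), (2, None)])
cursor = conn.execute("SELECT id, name FROM stock ORDER BY id")

out = CollectingFile()
writer = csv.writer(out)
writer.writerow([d[0] for d in cursor.description])  # header from column names
writer.writerows(cursor.fetchall())  # ints via str(), None written as ""
conn.close()

print(repr(out.getvalue()))  # 'id,name\r\n1,spam\r\n2,\r\n'
```

Reading this output back yields '' for the NULL, not None, which is why the transformation is described as not reversible.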
To be continued...