This article covers two topics:
(i) Functions for reading text-format data: read_csv and read_table
1. Reading text files with different separators: the sep parameter
2. Reading a text file that has no header row: the header and names parameters
3. Using a column as the index: index_col
4. Skipping rows while reading: skiprows
5. Reading data that is too large to load at once, block by block: chunksize
(ii) Function for writing data to a text file: to_csv
Examples are as follows:
(i) Reading a data set from a text file
1. The difference between read_csv and read_table
# read_csv reads comma-separated files by default, so there is no need to specify the delimiter with sep
import pandas as pd
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.csv')
# If read_csv reads a file that uses a delimiter other than a comma, you must specify it with sep; otherwise each line is read as-is and not split into fields
import pandas as pd
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt')
# Compare this with the previous example
import pandas as pd
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|')
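The read_csv examples above can be reproduced without the files. This is a minimal sketch that assumes small in-memory strings (wrapped in io.StringIO) stand in for data.csv and data.txt; the sample contents are hypothetical.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.csv (comma-separated)
csv_text = "a,b,c\n1,2,3\n4,5,6\n"
# Hypothetical stand-in for data.txt (pipe-separated)
pipe_text = "a|b|c\n1|2|3\n4|5|6\n"

# read_csv splits on commas by default, so no sep is needed
df_csv = pd.read_csv(io.StringIO(csv_text))
print(df_csv.shape)  # (2, 3)

# Without sep, the pipe-separated lines are not split: one column only
df_raw = pd.read_csv(io.StringIO(pipe_text))
print(df_raw.shape)  # (2, 1)

# Passing sep="|" splits each line into its fields
df_pipe = pd.read_csv(io.StringIO(pipe_text), sep="|")
print(df_pipe.shape)  # (2, 3)
```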
# read_table splits on tabs by default, so for any other separator you must specify it with sep; otherwise each line is read as-is and not split into fields
import pandas as pd
pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.csv')
# Here read_table is given the separator explicitly
import pandas as pd
pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|')
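To make the read_csv/read_table difference concrete: read_table's default separator is a tab ("\t"), while read_csv's is a comma. The sketch below uses hypothetical in-memory samples in place of the article's files and shows that read_table with sep="," reads a comma file the same way read_csv does.

```python
import io
import pandas as pd

# Hypothetical pipe-separated stand-in for data.txt
pipe_text = "a|b|c\n1|2|3\n4|5|6\n"

# read_table defaults to a tab separator, so a "|" file is not split
df_raw = pd.read_table(io.StringIO(pipe_text))
print(df_raw.columns.tolist())  # ['a|b|c']

# With sep="|" the fields are split as expected
df = pd.read_table(io.StringIO(pipe_text), sep="|")
print(df.columns.tolist())  # ['a', 'b', 'c']

# read_table(..., sep=",") gives the same result as read_csv on a comma file
csv_text = "a,b,c\n1,2,3\n"
same = pd.read_table(io.StringIO(csv_text), sep=",").equals(
    pd.read_csv(io.StringIO(csv_text))
)
print(same)  # True
```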
2. Reading a text file without a header row. If neither header nor names is specified, the first line of the file is used as the header by default.
# header=None indicates that the data set has no header row; the column labels and the index are then filled with integers starting from 0
pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|', header=None)
# names lets you define your own column labels
pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|',
              names=['x1', 'x2', 'x3', 'x4', 'x5'])
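A self-contained sketch of header=None versus names, assuming a hypothetical two-line, pipe-separated sample in place of data.txt. When names is passed and header is not, pandas treats the file as having no header, so the first line stays as data.

```python
import io
import pandas as pd

# Hypothetical headerless stand-in for data.txt
text = "1|2|3|4|5\n6|7|8|9|10\n"

# header=None: column labels and index both default to 0, 1, 2, ...
df = pd.read_table(io.StringIO(text), sep="|", header=None)
print(df.columns.tolist())  # [0, 1, 2, 3, 4]
print(df.index.tolist())    # [0, 1]

# names=...: custom column labels; the first line is still read as data
df_named = pd.read_table(io.StringIO(text), sep="|",
                         names=["x1", "x2", "x3", "x4", "x5"])
print(df_named.columns.tolist())  # ['x1', 'x2', 'x3', 'x4', 'x5']
print(len(df_named))              # 2
```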
3. By default the index consists of integers starting from 0; use index_col to make a column the index
names = ['x1', 'x2', 'x3', 'x4', 'x0']
pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|', names=names, index_col='x0')
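The index_col call above can be checked on a hypothetical in-memory sample: the column named "x0" becomes the row index and is removed from the regular columns.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.txt; the last field holds the row keys
text = "1|2|3|4|a\n5|6|7|8|b\n"
names = ["x1", "x2", "x3", "x4", "x0"]

df = pd.read_table(io.StringIO(text), sep="|", names=names, index_col="x0")
print(df.index.tolist())    # ['a', 'b']
print(df.columns.tolist())  # ['x1', 'x2', 'x3', 'x4']
```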
4. The following examples use skiprows to skip the rows containing "hello" and read the remaining rows. Rows are numbered from 0 starting at the first line of the file, whether or not that line is a header.
The first call reads the file unchanged; compare its output with the three skiprows examples that follow it.
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt')
names = ['x1', 'x2', 'x3', 'x4', 'x0']
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt', names=names, skiprows=[0, 3, 6])
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt', skiprows=[0, 3, 6])
pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt', header=None, skiprows=[0, 3, 6])
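The contents of data1.txt are not shown in the article, so this sketch assumes a hypothetical 8-line file whose lines 0, 3 and 6 are "hello" rows. It shows how the three skiprows variants treat the remaining lines: skipping happens first, and only then does header, names or header=None decide which line (if any) becomes the header.

```python
import io
import pandas as pd

# Hypothetical 8-line stand-in for data1.txt; lines 0, 3 and 6 say "hello"
text = (
    "hello\n"             # line 0
    "a,b,c,d,e\n"         # line 1
    "1,2,3,4,5\n"         # line 2
    "hello\n"             # line 3
    "6,7,8,9,10\n"        # line 4
    "11,12,13,14,15\n"    # line 5
    "hello\n"             # line 6
    "16,17,18,19,20\n"    # line 7
)
skip = [0, 3, 6]

# With names, every remaining line (including "a,b,c,d,e") is data
names = ["x1", "x2", "x3", "x4", "x0"]
df_named = pd.read_csv(io.StringIO(text), names=names, skiprows=skip)
print(len(df_named))  # 5

# Default header: the first line left after skipping becomes the header
df = pd.read_csv(io.StringIO(text), skiprows=skip)
print(df.columns.tolist())  # ['a', 'b', 'c', 'd', 'e']
print(len(df))              # 4

# header=None: remaining lines are all data, labelled 0..4
df_none = pd.read_csv(io.StringIO(text), header=None, skiprows=skip)
print(len(df_none))  # 5
```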
5. Reading in chunks. data1.txt has 8 lines in total. With chunksize=3 and the default header, the first line becomes the header, so the remaining 7 lines are read in three chunks: the first 3 rows, the next 3 rows, and finally 1 row.
Note that this differs from skiprows: when reading in chunks, the header line is not counted as the first data row. Comparing the two examples below makes this clear; the second one uses header=None, so the first line is read as data as well.
chunker = pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt', chunksize=3)
for m in chunker:
    print(len(m))
    print(m)
chunker = pd.read_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data1.txt', header=None, chunksize=3)
for m in chunker:
    print(len(m))
    print(m)
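A runnable sketch of the chunk sizes described above, assuming a hypothetical 8-line sample (1 header line plus 7 data lines) instead of data1.txt. The last part shows the usual reason for chunking: aggregating chunk by chunk so the whole file never has to be held in memory at once.

```python
import io
import pandas as pd

# Hypothetical 8-line sample: header "a,b" followed by 7 data lines
text = "a,b\n" + "".join(f"{i},{i + 1}\n" for i in range(0, 14, 2))

# Default header: the first line is the header, 7 data rows remain
sizes = [len(chunk) for chunk in pd.read_csv(io.StringIO(text), chunksize=3)]
print(sizes)  # [3, 3, 1]

# header=None: all 8 lines are data
sizes_none = [len(chunk) for chunk in
              pd.read_csv(io.StringIO(text), header=None, chunksize=3)]
print(sizes_none)  # [3, 3, 2]

# Aggregate one chunk at a time (sum of column "a" = 0 + 2 + ... + 12)
total = sum(chunk["a"].sum() for chunk in
            pd.read_csv(io.StringIO(text), chunksize=3))
print(total)  # 42
```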
(ii) Writing data to a text file with to_csv
Taking data.txt as an example, note that by default to_csv also writes the index to the file.
data = pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|')
print(data)
# index=False prevents the index from being written
data = pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|')
data.to_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\outdata.txt', sep='!', index=False)
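To see what the index option changes without writing a file: when to_csv is called with no path, it returns the CSV text as a string. This sketch uses a hypothetical two-column sample in place of data.txt and compares the output with and without the index.

```python
import io
import pandas as pd

# Hypothetical pipe-separated stand-in for data.txt
data = pd.read_table(io.StringIO("a|b\n1|2\n3|4\n"), sep="|")

# Default: the index is written as an unnamed first column
with_index = data.to_csv(sep="!")
print(with_index.splitlines())     # ['!a!b', '0!1!2', '1!3!4']

# index=False: only the data columns are written
without_index = data.to_csv(sep="!", index=False)
print(without_index.splitlines())  # ['a!b', '1!2', '3!4']
```

splitlines() is used because the line terminator that to_csv emits can depend on the operating system.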
# columns selects which columns are written
data = pd.read_table('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\data.txt', sep='|')
data.to_csv('C:\\users\\xiaoxiaodexiao\\pythonlianxi\\test0424\\outdata2.txt', sep=',', index=False, columns=['a', 'c', 'd'])
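A minimal sketch of the columns option, assuming a hypothetical four-column sample with columns a, b, c and d. Only the listed columns are written, in the order given.

```python
import io
import pandas as pd

# Hypothetical stand-in for data.txt with columns a, b, c, d
data = pd.read_table(io.StringIO("a|b|c|d\n1|2|3|4\n5|6|7|8\n"), sep="|")

# Write only columns a, c and d, without the index
out = data.to_csv(sep=",", index=False, columns=["a", "c", "d"])
print(out.splitlines())  # ['a,c,d', '1,3,4', '5,7,8']
```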