The earlier chapters described the computation functions of NumPy and pandas, but the data used there was constructed by hand. If data cannot be loaded into Python automatically, those computations are of little practical use. This chapter describes how data is loaded and stored.
Reading and writing data in text format
pandas provides several functions, such as read_csv and read_table, for reading tabular data into DataFrame objects, as listed in the following table:
A CSV file uses a comma (,) as its default separator. You can view the file contents from the command line with cat.
In [4]: cat /home/zhf/1.csv
1,2,3,4
5,6,7,8
9,10,11,12
We can also read the file with the pandas read_csv function. By default the first line of the file is used as the column names.
In [6]: result = pd.read_csv('/home/zhf/1.csv')
In [7]: result
Out[7]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12
If the file you are reading has no header row, set header=None; pandas then treats every line as data and assigns default column names.
In [11]: result = pd.read_csv('/home/zhf/1.csv', header=None)
In [12]: result
Out[12]:
   0   1   2   3
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
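The difference between the default header handling and header=None can be checked without touching the disk. The following is a minimal sketch that assumes the same three lines as 1.csv, held in an in-memory string (csv_text is a name chosen for this illustration).

```python
import io

import pandas as pd

# Same three lines as the 1.csv example, held in memory instead of on disk.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# Default: the first line is consumed as the header row.
with_header = pd.read_csv(io.StringIO(csv_text))

# header=None: every line is data and the columns are numbered 0..3.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)

print(with_header.shape)          # (2, 4)
print(no_header.shape)            # (3, 4)
print(list(with_header.columns))  # ['1', '2', '3', '4']
print(list(no_header.columns))    # [0, 1, 2, 3]
```

Note that the inferred header names are strings, while the default names produced by header=None are integers.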
You can also specify the column names yourself with the names parameter.
In [14]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'])
In [15]: result
Out[15]:
   one  two  three  four
0    1    2      3     4
1    5    6      7     8
2    9   10     11    12
In [16]: result['one']
Out[16]:
0    1
1    5
2    9
Name: one, dtype: int64
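As a small self-contained sketch of the names parameter, the example below reads the same data from an in-memory string (an assumption made so the snippet runs without /home/zhf/1.csv) and selects one column by its new name.

```python
import io

import pandas as pd

csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# Passing names (without header) makes pandas treat the first line as data.
result = pd.read_csv(
    io.StringIO(csv_text),
    names=["one", "two", "three", "four"],
)

# Selecting a single column returns a Series.
first_column = result["one"].tolist()
print(first_column)  # [1, 5, 9]
```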
You can also use one of the columns as the row index by setting the index_col parameter.
In [18]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'],
    ...:                      index_col='four')
In [19]: result
Out[19]:
      one  two  three
four
4       1    2      3
8       5    6      7
12      9   10     11
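Once a column has become the index, rows can be looked up by its values. A minimal sketch, again assuming the 1.csv contents in an in-memory string:

```python
import io

import pandas as pd

csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

result = pd.read_csv(
    io.StringIO(csv_text),
    names=["one", "two", "three", "four"],
    index_col="four",  # the 'four' column becomes the row index
)

# The index column no longer appears among the regular columns.
print(list(result.columns))  # ['one', 'two', 'three']

# Look up the row whose 'four' value is 8.
row = result.loc[8].tolist()
print(row)  # [5, 6, 7]
```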
Suppose the file contains a row of # characters that we do not need, as in the data below. How can that row be omitted?
In [22]: result = pd.read_csv('/home/zhf/1.csv')
In [23]: result
Out[23]:
   1   2   3   4
0  #   #   #   #
1  5   6   7   8
2  9  10  11  12
You can skip rows with the skiprows parameter. The values are zero-based line numbers in the file, so skiprows=[1] skips the second line.
In [24]: result = pd.read_csv('/home/zhf/1.csv', skiprows=[1])
In [25]: result
Out[25]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12
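The sketch below reproduces the skiprows call on an in-memory copy of the data (the exact contents of the unwanted row are an assumption). It also shows the comment parameter of read_csv as an alternative, which is not used in the text above: a line that begins with the comment character is ignored entirely.

```python
import io

import pandas as pd

# Assumed file contents: a header line, an unwanted '#' line, then two data lines.
csv_text = "1,2,3,4\n#,#,#,#\n5,6,7,8\n9,10,11,12\n"

# Skip the line at zero-based position 1 in the file (the '#' line).
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1])

# Alternative: ignore any line that starts with '#'.
commented = pd.read_csv(io.StringIO(csv_text), comment="#")

print(skipped.values.tolist())    # [[5, 6, 7, 8], [9, 10, 11, 12]]
print(commented.values.tolist())  # [[5, 6, 7, 8], [9, 10, 11, 12]]
```

skiprows is the better fit when the position of the unwanted line is known; comment is more convenient when such lines can appear anywhere in the file.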
Similarly, you can check whether any values are missing and need to be filled in. pd.isnull returns True for every missing (NaN) value.
In [10]: result
Out[10]:
   1   2   3     4
0  #   #   #   NaN
1  5   6   7   8.0
2  9  10  11  12.0
In [11]: pd.isnull(result)
Out[11]:
       1      2      3      4
0  False  False  False   True
1  False  False  False  False
2  False  False  False  False
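A minimal sketch of detecting and then filling a missing value. The file contents are an assumption (the last field of the # row is left empty), and the use of fillna with 0 as the replacement is an illustrative choice, not something the text prescribes.

```python
import io

import pandas as pd

# Assumed contents: the last field of the '#' row is empty, so it is read as NaN.
csv_text = "1,2,3,4\n#,#,#,\n5,6,7,8\n9,10,11,12\n"
result = pd.read_csv(io.StringIO(csv_text))

# Count the missing values in each column.
missing_per_column = pd.isnull(result).sum().tolist()
print(missing_per_column)  # [0, 0, 0, 1]

# Replace the missing value in column '4' with 0 (an arbitrary choice).
filled = result["4"].fillna(0).tolist()
print(filled)  # [0.0, 8.0, 12.0]
```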
The parameters of read_csv and read_table are as follows:
Reading a text file in chunks
When working with very large files, or when exploring a large file to work out the right set of parameters for later processing, you may want to read only a small part of the file or iterate over it in chunks.
To read only a few rows, set the nrows parameter. nrows counts data rows and does not include the header line, so nrows=2 reads the header plus the first two data rows, that is, the first three lines of the file.
In [19]: result = pd.read_csv('/home/zhf/1.csv', nrows=2)
In [20]: result
Out[20]:
   1  2  3    4
0  #  #  #  NaN
1  5  6  7  8.0
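The text mentions iterating over a file in chunks but only shows nrows. The sketch below covers both, using the chunksize parameter of read_csv, which returns an iterator of DataFrames. The sample data and the column names a and b are made up for this illustration.

```python
import io

import pandas as pd

# Hypothetical data: a header line followed by ten data rows.
csv_text = "a,b\n" + "".join(f"{i},{i * 2}\n" for i in range(10))

# nrows limits the number of data rows read (the header is not counted).
first_two = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(len(first_two))  # 2

# chunksize yields DataFrames of at most that many rows each.
chunk_sizes = []
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    chunk_sizes.append(len(chunk))
    total += int(chunk["b"].sum())

print(chunk_sizes)  # [4, 4, 2]
print(total)        # 90
```

Processing chunk by chunk keeps only one block of rows in memory at a time, which is the point of this approach for very large files.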
Writing data to a file
Data can also be written out as delimited text.
to_csv writes the data to a file; if the file does not exist, it is created automatically.
data.to_csv('/home/zhf/3.csv')
You can also specify the separator when writing.
data.to_csv('/home/zhf/3.csv', sep='|')
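A minimal round-trip sketch of to_csv with a custom separator. The DataFrame contents are made up, and a temporary directory stands in for /home/zhf so the example runs anywhere. index=False is an optional extra that keeps the row index out of the file.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data to write out.
data = pd.DataFrame({"one": [1, 5, 9], "two": [2, 6, 10]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "3.csv")

    # Write with '|' as the separator and without the row index.
    data.to_csv(path, sep="|", index=False)

    with open(path) as f:
        written_lines = f.read().splitlines()

    # Read the file back using the same separator.
    restored = pd.read_csv(path, sep="|")

print(written_lines)  # ['one|two', '1|2', '5|6', '9|10']
```

When a path is given, to_csv returns None, so there is nothing useful to assign from the call.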
JSON files
JSON is one of the most widely used formats for transferring data over HTTP. Here is how to load a JSON object into a DataFrame:
In [21]: import json
In [22]: with open('/home/zhf/test.json') as f:
    ...:     result = json.load(f)  # parse the file into Python objects
In [23]: ret = pd.DataFrame(result['one'], columns=['name', 'age'])
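The contents of test.json are not shown, so the sketch below assumes a plausible structure: a top-level key one holding a list of records with name and age fields. The sample values are invented. It parses a string with json.loads (json.load is the counterpart for file objects) and builds the DataFrame from the list of records.

```python
import json

import pandas as pd

# Hypothetical JSON document; the real test.json may be structured differently.
json_text = '{"one": [{"name": "alice", "age": 30}, {"name": "bob", "age": 25}]}'

# json.loads parses a JSON string into Python dicts and lists.
result = json.loads(json_text)

# A list of dicts maps directly onto rows; columns selects and orders the fields.
ret = pd.DataFrame(result["one"], columns=["name", "age"])

names = ret["name"].tolist()
ages = ret["age"].tolist()
print(names)  # ['alice', 'bob']
print(ages)   # [30, 25]
```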
There are many other file formats, such as HTML, XML, and databases. These are read and written in much the same way as in plain Python, so they are not described further here.
Python Data analysis: Data loading, storage and file formats