Python Data analysis: Data loading, storage and file formats

Source: Internet
Author: User

The earlier chapters described the computational functions of NumPy and pandas, but the data used there was constructed by hand. Those calculations mean little if data cannot be loaded into Python automatically. This chapter describes how data is loaded and stored.

Reading and writing data in text format

pandas provides several functions for reading tabular data into DataFrame objects. The most commonly used are read_csv and read_table.

A CSV file uses a comma as its default separator. You can view the file contents from the command line with cat.

In [4]: cat /home/zhf/1.csv
1,2,3,4
5,6,7,8
9,10,11,12

We can also read the same file with the pandas read_csv function. By default the first line of the file is used as the column names.

In [6]: result = pd.read_csv('/home/zhf/1.csv')

In [7]: result
Out[7]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12
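
The default behaviour above can be reproduced without a file on disk. This is a minimal sketch that assumes an in-memory io.StringIO buffer standing in for /home/zhf/1.csv, with the same three lines; the variable name csv_text is only for illustration.

```python
import io

import pandas as pd

# In-memory stand-in for /home/zhf/1.csv (assumed to hold the three lines shown above).
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# With no extra arguments, the first line is consumed as the header row.
result = pd.read_csv(io.StringIO(csv_text))

print(list(result.columns))  # the header values, stored as strings
print(result.shape)          # two data rows, four columns
```

Because the first line was taken as the header, only two of the three lines end up as data.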

If the file you are reading has no header row, set header=None. pandas then keeps the first line as data and assigns default integer column names.

In [11]: result = pd.read_csv('/home/zhf/1.csv', header=None)

In [12]: result
Out[12]:
   0   1   2   3
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
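
As a quick check of what header=None changes, the sketch below reads the same text twice. It assumes an in-memory buffer instead of the real file; the names with_header and no_header are illustrative only.

```python
import io

import pandas as pd

# Assumed stand-in for the contents of /home/zhf/1.csv.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

with_header = pd.read_csv(io.StringIO(csv_text))             # first line becomes the header
no_header = pd.read_csv(io.StringIO(csv_text), header=None)  # first line stays as data

print(len(with_header), len(no_header))  # row counts differ by one
print(list(no_header.columns))           # default integer labels
```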

You can also specify the column names yourself with the names parameter.

In [14]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'])

In [15]: result
Out[15]:
   one  two  three  four
0    1    2      3     4
1    5    6      7     8
2    9   10     11    12

In [16]: result['one']
Out[16]:
0    1
1    5
2    9
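
A self-contained sketch of naming columns and then selecting one by name follows. It assumes the same three lines of data held in memory rather than in /home/zhf/1.csv.

```python
import io

import pandas as pd

# Assumed stand-in for the contents of /home/zhf/1.csv.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# Passing names with no header argument keeps every line as data.
named = pd.read_csv(io.StringIO(csv_text), names=["one", "two", "three", "four"])

# Selecting a single column by its name returns a Series.
first_column = named["one"]
print(first_column.tolist())
```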

You can also use one of the columns as the row index by setting the index_col parameter.

In [18]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'],
    ...:                      index_col='four')

In [19]: result
Out[19]:
      one  two  three
four
4       1    2      3
8       5    6      7
12      9   10     11
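
Once a column is used as the index, rows can be looked up by its values. The sketch below assumes the same data held in memory; the lookup with .loc is an illustration and is not part of the original session.

```python
import io

import pandas as pd

# Assumed stand-in for the contents of /home/zhf/1.csv.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

indexed = pd.read_csv(
    io.StringIO(csv_text),
    names=["one", "two", "three", "four"],
    index_col="four",  # this column becomes the row index and leaves the data columns
)

print(indexed.index.tolist())  # values of the former 'four' column
print(indexed.loc[8, "two"])   # row labelled 8, column 'two'
```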

Suppose the file contains a row of # characters that we do not need, as in the data below. How can we skip it?

In [22]: result = pd.read_csv('/home/zhf/1.csv')

In [23]: result
Out[23]:
    1   2   3   4
0  ##  ##  ##  ##
1   5   6   7   8
2   9  10  11  12

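You can skip a row with the skiprows parameter. It takes a list of 0-based line numbers in the file, where line 0 is the header line, so skiprows=[1] skips the second line of the file.

In [24]: result = pd.read_csv('/home/zhf/1.csv', skiprows=[1])

In [25]: result
Out[25]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12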
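
The effect of skiprows can be compared side by side. This sketch assumes the unwanted line is "#,#,#,#" (the exact contents of the original file are not known) and uses an in-memory buffer.

```python
import io

import pandas as pd

# Assumed file contents: header, one junk line, then two data lines.
csv_text = "1,2,3,4\n#,#,#,#\n5,6,7,8\n9,10,11,12\n"

raw = pd.read_csv(io.StringIO(csv_text))                    # junk line kept as row 0
skipped = pd.read_csv(io.StringIO(csv_text), skiprows=[1])  # file line 1 dropped

print(raw.shape, skipped.shape)
print(skipped.iloc[0].tolist())
```

With the junk line removed, the remaining columns contain only numbers and are parsed as integers.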

Similarly, you can check which values are missing (NaN) and may need to be filled in, using pd.isnull.

In [10]: result
Out[10]:
   1   2   3     4
0  #   #   #   NaN
1  5   6   7   8.0
2  9  10  11  12.0

In [11]: pd.isnull(result)
Out[11]:
       1      2      3      4
0  False  False  False   True
1  False  False  False  False
2  False  False  False  False
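
The missing-value check can be sketched as follows. The file contents are an assumption (a line "#,#,#," with an empty last field), and the fillna call is one possible way to populate the gap; it does not appear in the original session.

```python
import io

import pandas as pd

# Assumed contents: the second line has an empty fourth field.
csv_text = "1,2,3,4\n#,#,#,\n5,6,7,8\n9,10,11,12\n"

result = pd.read_csv(io.StringIO(csv_text))

# Boolean DataFrame of the same shape: True where a value is missing.
mask = pd.isnull(result)
print(mask.iloc[0].tolist())

# Fill the missing entry in column '4' with 0 (illustrative choice of fill value).
filled = result["4"].fillna(0)
print(filled.tolist())
```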

read_csv and read_table accept a common set of parameters, including sep, header, names, index_col, skiprows, and nrows.

Reading a text file in chunks

When working with very large files, or when working out the right set of parameters for processing a large file, you often want to read only a small part of the file or iterate over it in chunks.

To read only the first few rows, set the nrows parameter. nrows is the number of data rows to read and does not count the header line, so nrows=2 returns two rows of data (the first three lines of the file, including the header).

In [19]: result = pd.read_csv('/home/zhf/1.csv', nrows=2)

In [20]: result
Out[20]:
   1  2  3    4
0  #  #  #  NaN
1  5  6  7  8.0
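
Both approaches mentioned above, reading a few rows and iterating in chunks, can be sketched together. The data is illustrative and held in memory; chunksize is the read_csv parameter that appears to correspond to the block-wise iteration described here, and the chunk size of 2 is arbitrary.

```python
import io

import pandas as pd

# Illustrative data: one header line followed by three data lines.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n13,14,15,16\n"

# nrows counts data rows only; the header line is read in addition.
head = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(head.shape)

# With chunksize, read_csv returns an iterator of DataFrames instead of one DataFrame.
sizes = []
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    sizes.append(len(chunk))
    total += chunk["4"].sum()  # aggregate one chunk at a time

print(sizes, total)
```

Processing one chunk at a time keeps only a small part of the file in memory, which is the point of iterating for very large files.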

Write data to a file

Data can also be written out as delimited text.

to_csv writes a DataFrame to a file. If the file does not exist, it is created automatically. When a path is given, to_csv returns None, so there is no need to assign the result.

data.to_csv('/home/zhf/3.csv')

You can also specify a different separator when writing.

data.to_csv('/home/zhf/3.csv', sep='|')
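
A round trip through to_csv and read_csv can be sketched as below. The DataFrame contents and the temporary directory are assumptions made so the example runs anywhere; index=False is an extra option, not used in the text above, that omits the row index from the output.

```python
import os
import tempfile

import pandas as pd

# Illustrative data; the original 'data' object is not shown in the article.
data = pd.DataFrame({"one": [1, 5, 9], "two": [2, 6, 10]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "3.csv")

    # Write with a pipe separator and without the row index.
    data.to_csv(path, sep="|", index=False)

    with open(path) as f:
        text = f.read()

    # The same separator must be given when reading the file back.
    restored = pd.read_csv(path, sep="|")

print(text)
```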

JSON file

JSON is one of the most widely used formats for transferring data over HTTP. Here is how to load a JSON object into a DataFrame. Note that json.loads parses a string, so a file must be read with json.load on an open file object.

In [ ]: import json

In [ ]: with open('/home/zhf/test.json') as f:
   ...:     result = json.load(f)

In [ ]: ret = pd.DataFrame(result['one'], columns=['name', 'age'])
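
The structure of test.json is not shown, so the sketch below assumes a hypothetical layout: a top-level key 'one' holding a list of records with 'name' and 'age' fields. It parses a string with json.loads so that it runs without a file.

```python
import json

import pandas as pd

# Hypothetical JSON content; the real test.json may be shaped differently.
json_text = '{"one": [{"name": "a", "age": 30}, {"name": "b", "age": 25}]}'

# json.loads parses a string; json.load would read from an open file object instead.
result = json.loads(json_text)

# A list of dicts maps directly to rows; columns selects and orders the fields.
ret = pd.DataFrame(result["one"], columns=["name", "age"])
print(ret)
```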

Data also comes in many other formats, such as HTML, XML, and databases. These are read and written in much the same way as in ordinary Python, so they are not described further here.
