Python Data analysis: Data loading, storage and file formats

Last Update:2018-02-16 Source: Internet

Author: User


Earlier chapters covered the computational functions of NumPy and pandas, but all of the data used there was constructed by hand. If data cannot be loaded into Python automatically, those calculations are of little practical use. This chapter describes how data is loaded and stored.
Reading and writing data in text format
pandas provides several functions, such as read_csv and read_table, for reading tabular data into a DataFrame object.

A CSV file uses a comma as its default separator. You can view the contents of the file from the command line with cat.

In [4]: !cat /home/zhf/1.csv

1,2,3,4

5,6,7,8

9,10,11,12

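We can also read the file with the pandas read_csv function (pandas is assumed to be imported as pd). By default, the first line of the file is used as the column header.

In [6]: result = pd.read_csv('/home/zhf/1.csv')

In [7]: result
Out[7]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12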

If the file you are reading has no header row, set header=None. pandas then treats every line as data and assigns default integer column names.

In [11]: result = pd.read_csv('/home/zhf/1.csv', header=None)

In [12]: result
Out[12]:
   0   1   2   3
0  1   2   3   4
1  5   6   7   8
2  9  10  11  12
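
The effect of the header parameter can be reproduced without a file on disk. The following is a minimal sketch that assumes the same three lines of data as 1.csv and uses io.StringIO as a stand-in for the file; the variable names are illustrative and not from the original article.

```python
import io

import pandas as pd

# Assumed in-memory copy of the three lines stored in 1.csv.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"

# Default behaviour: the first line is consumed as the column header.
with_header = pd.read_csv(io.StringIO(csv_text))
print(with_header.shape)            # two data rows remain
print(list(with_header.columns))    # labels are the strings '1' to '4'

# header=None: every line is data and the columns are numbered 0 to 3.
no_header = pd.read_csv(io.StringIO(csv_text), header=None)
print(no_header.shape)
print(list(no_header.columns))
```

Comparing the two shapes makes it easy to see that the default setting silently turns the first data line into column labels.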

You can also specify the column names yourself with the names parameter.

In [14]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'])

In [15]: result
Out[15]:
   one  two  three  four
0    1    2      3     4
1    5    6      7     8
2    9   10     11    12

In [16]: result['one']
Out[16]:
0    1
1    5
2    9

You can also use one of the columns as the index of the DataFrame by setting the index_col parameter.

In [18]: result = pd.read_csv('/home/zhf/1.csv', names=['one', 'two', 'three', 'four'], index_col='four')

In [19]: result
Out[19]:
      one  two  three
four
4       1    2      3
8       5    6      7
12      9   10     11
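
As a rough illustration of how names and index_col work together, the sketch below reads the same data from an in-memory string (an assumption made so that the example does not depend on /home/zhf/1.csv) and then looks values up by column name and by index label.

```python
import io

import pandas as pd

# Assumed in-memory copy of the three lines stored in 1.csv.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n"
names = ["one", "two", "three", "four"]

# names= supplies the labels; no line of the file is used as a header.
named = pd.read_csv(io.StringIO(csv_text), names=names)
print(named["one"].tolist())

# index_col= moves the chosen column out of the data and into the index.
indexed = pd.read_csv(io.StringIO(csv_text), names=names, index_col="four")
print(indexed.index.tolist())
print(indexed.loc[8, "two"])  # row whose 'four' value is 8, column 'two'
```

Once a column becomes the index, rows can be selected with .loc by the values of that column rather than by position.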

Suppose the file contains a line of # characters that we do not need, as in the data below. How can that line be omitted?

In [22]: result = pd.read_csv('/home/zhf/1.csv')

In [23]: result
Out[23]:
   1   2   3   4
0  #   #   #   #
1  5   6   7   8
2  9  10  11  12

You can skip rows with the skiprows parameter. The values are 0-based line numbers in the file, so skiprows=[1] skips the second line.

In [24]: result = pd.read_csv('/home/zhf/1.csv', skiprows=[1])

In [25]: result
Out[25]:
   1   2   3   4
0  5   6   7   8
1  9  10  11  12
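
The following sketch shows the same idea on an in-memory string, which is assumed here to have the same layout as the file with the unwanted # line. It also shows the comment parameter of read_csv as an alternative that the original text does not use: a line that begins with the comment character is ignored entirely.

```python
import io

import pandas as pd

# Assumed file contents: a header line, an unwanted '#' line, two data lines.
csv_text = "1,2,3,4\n#,#,#,#\n5,6,7,8\n9,10,11,12\n"

# Without skiprows the '#' line is kept, so every column holds strings.
raw = pd.read_csv(io.StringIO(csv_text))
print(raw)

# skiprows=[1] drops the second line of the file (0-based line number 1).
cleaned = pd.read_csv(io.StringIO(csv_text), skiprows=[1])
print(cleaned)

# Alternative (not in the original text): treat '#' as a comment marker.
via_comment = pd.read_csv(io.StringIO(csv_text), comment="#")
print(via_comment.equals(cleaned))
```

skiprows needs the line numbers to be known in advance, while comment works wherever such lines appear, provided the marker never occurs in real data.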

Similarly, you can check whether values are missing and need to be filled. In the data below, the last field of the # row is empty, so pandas reads it as NaN, and pd.isnull marks that position as True.

In [10]: result
Out[10]:
   1   2   3     4
0  #   #   #   NaN
1  5   6   7   8.0
2  9  10  11  12.0

In [11]: pd.isnull(result)
Out[11]:
       1      2      3      4
0  False  False  False   True
1  False  False  False  False
2  False  False  False  False
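
A short sketch of detecting and then filling the missing value follows. The in-memory string is assumed to match the file used above, and the fill value of 0 is only an illustrative choice; fillna is not used in the original text.

```python
import io

import pandas as pd

# Assumed file contents: the '#' line ends with an empty fourth field.
csv_text = "1,2,3,4\n#,#,#,\n5,6,7,8\n9,10,11,12\n"
result = pd.read_csv(io.StringIO(csv_text))

# Boolean mask with True wherever a value is missing.
mask = pd.isnull(result)
print(mask)
print(int(mask.sum().sum()))  # total number of missing values

# Fill the gap in column '4' only (0 is an arbitrary example value).
filled = result.fillna({"4": 0})
print(filled["4"].tolist())
```

Counting the True values in the mask is a quick way to decide whether any filling is needed before further calculation.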

read_csv and read_table accept many parameters; the ones used in this chapter are header, names, index_col, skiprows, and nrows.

Reading a text file in chunks

When working with very large files, or when working out the right set of parameters for processing a large file, you may want to read only a small part of the file or iterate over it in chunks.

To read only the first few rows, set the nrows parameter. nrows is a count of data rows and does not include the header line, so nrows=2 reads the header plus the first two data rows, that is, the first three lines of the file.

In [19]: result = pd.read_csv('/home/zhf/1.csv', nrows=2)

In [20]: result
Out[20]:
   1  2  3    4
0  #  #  #  NaN
1  5  6  7  8.0
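
The sketch below contrasts nrows with the chunksize parameter of read_csv, which returns an iterator of DataFrames instead of a single one. The chunksize usage and the sample data are assumptions added for illustration; the original text only demonstrates nrows.

```python
import io

import pandas as pd

# Assumed sample: one header line followed by three data lines.
csv_text = "1,2,3,4\n5,6,7,8\n9,10,11,12\n13,14,15,16\n"

# nrows=2 reads the header plus the first two data rows.
head = pd.read_csv(io.StringIO(csv_text), nrows=2)
print(len(head))

# chunksize=2 yields DataFrames of at most two rows each, so the whole
# file never has to be held in memory at once.
chunk_sizes = []
total = 0
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=2):
    chunk_sizes.append(len(chunk))
    total += int(chunk["4"].sum())  # running sum of column '4'

print(chunk_sizes)
print(total)
```

Accumulating a result chunk by chunk, as with the running sum here, is the usual pattern when a file is too large to load in one call.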

Write data to a file

Data can also be written out as delimited text.

The to_csv method writes the data to a file; if the file does not exist, it is created automatically. Here data is assumed to be an existing DataFrame. When a path is given, to_csv returns None, so there is no need to assign its result.

data.to_csv('/home/zhf/3.csv')

You can also specify a different separator when writing.

data.to_csv('/home/zhf/3.csv', sep='|')
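
To check what ends up in the file, the following sketch writes a small DataFrame with a | separator and reads it back. The DataFrame contents, the temporary directory, and index=False are all assumptions chosen for the example rather than details from the original text.

```python
import os
import tempfile

import pandas as pd

# Hypothetical data standing in for the `data` DataFrame.
data = pd.DataFrame({"one": [1, 5, 9], "two": [2, 6, 10]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "3.csv")

    # index=False leaves the row labels out of the file.
    returned = data.to_csv(path, sep="|", index=False)

    with open(path) as f:
        text = f.read()

    # The same separator must be given when reading the file back.
    restored = pd.read_csv(path, sep="|")

print(returned)  # to_csv returns None when a path is supplied
print(text)
print(restored.equals(data))
```

Without index=False the row labels are written as an extra first column, which then shows up as an unnamed column when the file is read back.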

JSON file

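JSON is one of the most widely used formats for transferring data over HTTP. Here is how to load a JSON object into a DataFrame.

import json

# json.loads expects a JSON string, not a file path, so open the file and use json.load.
with open('/home/zhf/test.json') as f:
    result = json.load(f)

# json.dumps converts the Python object back into a JSON string.
data = json.dumps(result)

# Build a DataFrame from the list of records stored under the 'one' key.
ret = pd.DataFrame(result['one'], columns=['name', 'age'])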
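
The contents of test.json are not shown, so the sketch below assumes a hypothetical document in which the 'one' key holds a list of records with name, age, and city fields. It parses the JSON from a string with json.loads and keeps only two of the fields.

```python
import json

import pandas as pd

# Hypothetical JSON text; the real test.json may be structured differently.
json_text = """
{"one": [{"name": "Ann", "age": 30, "city": "X"},
         {"name": "Bob", "age": 25, "city": "Y"}]}
"""

# json.loads parses a JSON string into Python dicts and lists.
result = json.loads(json_text)

# A list of dicts becomes one row per dict; columns= selects the fields to keep.
ret = pd.DataFrame(result["one"], columns=["name", "age"])
print(ret)

# json.dumps goes the other way, from Python objects back to JSON text.
round_trip = json.loads(json.dumps(result))
print(round_trip == result)
```

Passing columns= is a simple way to drop fields that are present in the JSON records but not needed in the analysis.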

There are many other formats, such as HTML, XML, and databases. They are read and written in much the same way in Python, so they are not described further here.


This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
