Python Data Visualization Programming Practice: Importing Data


1. Import data from a CSV file

Principle: The with statement opens the file and binds it to the object f. You don't have to worry about closing the file after working with it; the with context manager takes care of that. The csv.reader() method then returns a reader object, which iterates over all rows of the file.

 

#!/usr/bin/env python

import csv
import sys

filename = 'ch02-data.csv'

data = []
header = None
try:
    with open(filename) as f:
        reader = csv.reader(f)
        for i, row in enumerate(reader):
            if i == 0:
                header = row          # the first row holds the column names
            else:
                data.append(row)
except csv.Error as e:
    print("Error reading CSV file at line %s: %s" % (reader.line_num, e))
    sys.exit(-1)

if header:
    print(header)
    print('==================')

for datarow in data:
    print(datarow)
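A note on the listing above: when the file has a header row, the standard library's csv.DictReader is a convenient variation, since each row comes back as a dictionary keyed by the column names and no manual header handling is needed. A minimal sketch, using the same file as above:

import csv

filename = 'ch02-data.csv'

with open(filename) as f:
    reader = csv.DictReader(f)    # the first row becomes the field names
    for row in reader:
        print(row)                # each row is a dict keyed by column name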

 

 

 

Experiment results:


2. Import data from an Excel file

An Excel file can be converted to a CSV file and then imported with the preceding method. However, if you need to automate the processing of a large number of files as part of a continuous data pipeline, manually converting each Excel file into a CSV file is not feasible; a sketch of how such a conversion might be automated is shown below.
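As a rough illustration of that automation idea, the following sketch batch-converts every .xlsx file in a directory to CSV, using the xlrd module introduced next together with csv. The directory pattern is hypothetical, and it assumes an xlrd version that can still read .xlsx files, as the original example does:

import csv
import glob

import xlrd

# hypothetical input pattern; adjust to your own data directory
for xlsx_path in glob.glob('data/*.xlsx'):
    wb = xlrd.open_workbook(xlsx_path)
    ws = wb.sheet_by_index(0)                      # first worksheet
    csv_path = xlsx_path.replace('.xlsx', '.csv')
    with open(csv_path, 'w', newline='') as out:
        writer = csv.writer(out)
        for r in range(ws.nrows):
            writer.writerow([ws.cell(r, c).value for c in range(ws.ncols)])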

Principle: Use the xlrd module. Calling the open_workbook() method returns an xlrd.Book instance; from a worksheet you can then read the contents of every cell by iterating over the number of rows (nrows) and the number of columns (ncols).

 

import xlrd
from xlrd.xldate import XLDateAmbiguous
from datetime import datetime
from pprint import pprint

file = 'ch02-xlsxdata.xlsx'

wb = xlrd.open_workbook(filename=file)
ws = wb.sheet_by_name('Sheet1')

dataset = []

for r in range(ws.nrows):
    row = []
    for c in range(ws.ncols):
        row.append(ws.cell(r, c).value)
        if ws.cell_type(r, c) == xlrd.XL_CELL_DATE:
            try:
                # Excel stores dates as serial numbers; datemode says whether
                # the workbook uses the 1900- or 1904-based date system
                date_value = xlrd.xldate_as_tuple(ws.cell(r, c).value, wb.datemode)
                print(datetime(*date_value))
            except XLDateAmbiguous as e:
                print(e)
    dataset.append(row)

pprint(dataset)
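As a side note, newer xlrd releases (0.9.3 and later) also provide xlrd.xldate.xldate_as_datetime(), which skips the tuple step and returns a datetime directly. A minimal sketch of that variant, under the same workbook assumptions as above:

import xlrd
from xlrd.xldate import xldate_as_datetime

wb = xlrd.open_workbook('ch02-xlsxdata.xlsx')
ws = wb.sheet_by_name('Sheet1')

for r in range(ws.nrows):
    for c in range(ws.ncols):
        if ws.cell_type(r, c) == xlrd.XL_CELL_DATE:
            # convert the serial date straight to a datetime object
            print(xldate_as_datetime(ws.cell(r, c).value, wb.datemode))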

 

 

 

 

Experiment results:


3. Import data from a fixed-width data file

Time-stamped log files and time-series files are among the most common data sources in data visualization. Sometimes they can be read as a CSV dialect that separates fields with tabs, but sometimes the fields are not separated by any special character at all: each field simply has a fixed width. In that case we can match and extract the data with a format string.

For example (the data in this example is generated using code):
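The generator itself is not reproduced here. As a stand-in, the sketch below writes a small sample file whose field widths match the 9/14/5-character mask used by the parser that follows (random digits only; the real data and record count may differ):

import random

datafile = 'ch02-fixed-width-1M.data'

with open(datafile, 'w') as f:
    for _ in range(100):                      # a small sample, not 1M records
        # three right-justified numeric fields of widths 9, 14 and 5
        fields = [str(random.randint(0, 10 ** w - 1)).rjust(w)
                  for w in (9, 14, 5)]
        f.write(''.join(fields) + '\n')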


Solution:

1. Specify the data file to be read.
2. Define how the data is to be read (the format mask).
3. Read the file row by row and parse each row into separate fields according to the format.
4. Print each row as a list of the separated fields.


import struct

mask = '9s14s5s'
parse = struct.Struct(mask).unpack_from
print('format string {!r}, record size: {}'.format(mask, struct.calcsize(mask)))

datafile = 'ch02-fixed-width-1M.data'

# struct unpacks from a bytes buffer, so the file is opened in binary mode
with open(datafile, 'rb') as f:
    for line in f:
        fields = parse(line)
        print('fields: ', [field.strip().decode() for field in fields])

 

Experiment results:


4. Import data from a JSON data source

The procedure is as follows:

1. Specify the GitHub URL from which to read JSON data.
2. Use the requests module to access the specified URL and fetch the content.
3. Parse the content into a JSON object.
4. Iterate over the JSON object; for each item, read the URL value of each repository.

Principle: First, use the requests module to obtain the remote resource. Requests provides a simple API for the HTTP verbs; here we only need to call the get() method. We are mainly interested in the Response.json() method, which reads Response.content, parses it as JSON, and loads it into a JSON object.

The code is as follows:


import requests
from pprint import pprint

# fetch the user profile as JSON and pretty-print it
url = 'https://api.github.com/users/justglowing'
r = requests.get(url)
json_obj = r.json()
pprint(json_obj)
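Step 4 of the procedure, reading the URL of each repository, could look roughly like the sketch below. It relies on the repos_url field that GitHub's user payload currently exposes; treat the exact field names as an assumption rather than a guarantee:

import requests

url = 'https://api.github.com/users/justglowing'
json_obj = requests.get(url).json()

# the user object carries a link to the user's repository list
repos = requests.get(json_obj['repos_url']).json()
for repo in repos:
    print(repo['url'])            # API URL of each repository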

 

Result:


 

 

Conclusion: Last month I helped someone with a graduation project built with Flask, and this month I have been writing a shopping-mall website in Java EE, so I have been too busy to update my blog. On Sunday I came across Python Data Visualization in the library and thought it would be a good topic to blog about. Hopefully it will turn into a good series.......

