Python Data Visualization Programming Practice: Importing Data
1. Import data from a CSV file
Principle: the with statement opens the file and binds it to the object f. You don't have to worry about closing the file after you are done with it; the with context manager does that for you. Then csv.reader() returns a reader object that iterates over all rows of the file.
#!/usr/bin/env python

import csv
import sys

filename = 'ch02-data.csv'

data = []
header = None
try:
    with open(filename) as f:
        reader = csv.reader(f)
        c = 0
        for row in reader:
            # the first row is the header; every following row is data
            if c == 0:
                header = row
            else:
                data.append(row)
            c += 1
except csv.Error as e:
    print "Error reading CSV file at line %s: %s" % (reader.line_num, e)
    sys.exit(-1)

if header:
    print header
    print '=================='

for datarow in data:
    print datarow
Experiment results:
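As an aside, if the header row is only needed to name the fields, the standard library's csv.DictReader does that bookkeeping for you. A minimal sketch, reusing the same file name as above:

import csv

filename = 'ch02-data.csv'

with open(filename) as f:
    # DictReader takes the first row as field names and yields one dict per data row
    for row in csv.DictReader(f):
        print(row)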
2. Import data from an Excel file
An Excel file can be converted to a CSV file and then imported with the preceding method. However, if you want to automate the processing of a large number of files as part of a continuous data pipeline, manually converting every Excel file to CSV is not practical.
Principle: use the xlrd module to open the workbook. Calling open_workbook() returns an xlrd book instance, from which you read cell contents based on a sheet's number of rows (nrows) and number of columns (ncols).
import xlrd
from xlrd.xldate import XLDateAmbiguous
from datetime import datetime
from pprint import pprint

file = 'ch02-xlsxdata.xlsx'

wb = xlrd.open_workbook(filename=file)
ws = wb.sheet_by_name('Sheet1')

dataset = []

# walk every cell of the sheet, row by row
for r in range(ws.nrows):
    col = []
    for c in range(ws.ncols):
        col.append(ws.cell(r, c).value)
        # date cells are stored as floats; convert them to datetime for display
        if ws.cell_type(r, c) == xlrd.XL_CELL_DATE:
            try:
                print ws.cell_type(r, c)
                date_value = xlrd.xldate_as_tuple(ws.cell(r, c).value, wb.datemode)
                print datetime(*date_value)
            except XLDateAmbiguous as e:
                print e
    dataset.append(col)

pprint(dataset)
Experiment results:
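Note that recent xlrd releases (2.0 and later) dropped support for .xlsx files, so if this example fails on a modern install, openpyxl is a common substitute. A minimal sketch, assuming the same file and sheet names as above:

from openpyxl import load_workbook

wb = load_workbook('ch02-xlsxdata.xlsx')
ws = wb['Sheet1']

dataset = []
# iter_rows(values_only=True) yields each row as a tuple of cell values;
# date cells come back as datetime objects already, so no manual conversion is needed
for row in ws.iter_rows(values_only=True):
    dataset.append(list(row))

print(dataset)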
3. Import data from a fixed-width data file
Log files and time-series-based files are the most common data sources in data visualization. Sometimes they come as a CSV dialect that separates fields with tabs, but sometimes the fields are not separated by any special character at all; instead, every field has a fixed width. In that case we can extract the data by matching it against a format string.
For example, the data file in this example is generated by a script; a minimal generator sketch is shown below.
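This is only an illustrative sketch: the field widths (9, 14, and 5 characters) and the file name mirror the reading code further down, and the random values are made up.

import random
import string

filename = 'ch02-fixed-width-1M.data'

def random_field(width):
    # a random lowercase string padded with spaces to exactly `width` characters
    value = ''.join(random.choice(string.ascii_lowercase) for _ in range(random.randint(1, width)))
    return value.ljust(width)

with open(filename, 'w') as f:
    for _ in range(1000):
        # one record = three fixed-width fields (9 + 14 + 5 characters) per line
        f.write(random_field(9) + random_field(14) + random_field(5) + '\n')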
Solution:
1. Specify the data file to be read.
2. Define the reading method (the format mask).
3. Read the file line by line and parse each line into separate fields according to the format.
4. Print each line as its separated fields.
import struct

# 9s14s5s = three fixed-width string fields of 9, 14, and 5 bytes
mask = '9s14s5s'
parse = struct.Struct(mask).unpack_from
print 'formatstring {!r}, record size: {}'.format(mask, struct.calcsize(mask))

datafile = 'ch02-fixed-width-1M.data'

with open(datafile, 'r') as f:
    for line in f:
        fields = parse(line)
        # strip the padding from each fixed-width field
        print 'fields: ', [field.strip() for field in fields]
Experiment results:
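One caveat: under Python 3, struct.unpack_from expects a bytes-like buffer, so the file has to be opened in binary mode and the fields decoded afterwards. A minimal adjustment of the loop above:

import struct

parse = struct.Struct('9s14s5s').unpack_from

with open('ch02-fixed-width-1M.data', 'rb') as f:  # binary mode: lines are bytes
    for line in f:
        fields = parse(line)
        # each field is bytes under Python 3, so decode before stripping the padding
        print([field.decode('ascii').strip() for field in fields])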
4. Import data from a JSON data source
The procedure is as follows:
1. Specify the GitHub URL from which to read data in JSON format.
2. Use the requests module to access the specified URL and fetch the content.
3. Read the content and convert it into a JSON object.
4. Iterate over the JSON object; for each item of interest, read its repository URL value.
Principle: first, the requests module is used to fetch the remote resource. Requests provides a simple API for the HTTP verbs; here we only need to call the get() method. We are mainly interested in the Response.json() method, which reads Response.content, parses it as JSON, and loads it into a JSON object.
The code is as follows:
import requests
from pprint import pprint

url = 'https://api.github.com/users/justglowing'

# fetch the resource and parse the JSON body into a Python object
r = requests.get(url)
json_obj = r.json()
pprint(json_obj)
Result:
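Step 4 of the procedure talks about reading the repository URL for each item, while the code above stops at printing the user object. Here is a hedged follow-up sketch; it assumes the GitHub user object exposes a repos_url field and that each repository entry carries name and html_url fields, which is how the public GitHub API behaves at the time of writing:

import requests

user = requests.get('https://api.github.com/users/justglowing').json()

# follow the repos_url listed in the user object and print each repository's URL
repos = requests.get(user['repos_url']).json()
for repo in repos:
    print('{}: {}'.format(repo['name'], repo['html_url']))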
Conclusion: last month I was helping people with their graduation projects using Flask, and this month I am writing a shopping-mall website with Java EE, so I have been too busy to update my blog. Today, Sunday, I came across Python Data Visualization in the library and thought it was time to write a post. It is a good series and worth finishing...