Infi-chu:
http://www.cnblogs.com/Infi-chu/
TXT text storage
Saving data as plain TXT text is convenient and simple, and it works on almost any platform. The drawback is that the stored data is hard to search.
1. For example:
Use requests to fetch the web page source, then parse it with the pyquery parsing library.
    import requests
    from pyquery import PyQuery as pq

    url = 'https://www.zhihu.com/explore'
    header = {
        'user-agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
    }
    html = requests.get(url, headers=header).text
    doc = pq(html)
    items = doc('.explore-tab .feed-item').items()
    for item in items:
        question = item.find('h2').text()
        author = item.find('.author-link-line').text()
        answer = pq(item.find('.content').html()).text()
        # Append each record to test.txt, followed by a separator line
        with open('test.txt', 'a', encoding='utf-8') as f:
            f.write('\n'.join([author, question, answer]))
            f.write('\n' + '=' * 50 + '\n')
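2. Open modes:
r: read-only, text (the default); the file must exist
rb: read-only, binary
r+: read and write; the file must exist
rb+: read and write, binary; the file must exist
w: write-only; creates the file or truncates an existing one
wb: write-only, binary; creates or truncates
w+: read and write; creates or truncates
wb+: read and write, binary; creates or truncates
a: append; creates the file if it is missing, and writes go to the end
ab: append, binary
a+: read and append
ab+: read and append, binary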
Note
The difference between w and a is the same as the difference between > and >> in a Linux shell: w overwrites the existing content, while a adds to the end of it.
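The note above can be checked with a small sketch. The scratch file name mode_demo.txt and the temporary directory are assumptions made only for this demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, created in a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'mode_demo.txt')

# 'w' truncates the file on every open, like `>` in a shell.
with open(path, 'w', encoding='utf-8') as f:
    f.write('first\n')
with open(path, 'w', encoding='utf-8') as f:
    f.write('second\n')

with open(path, 'r', encoding='utf-8') as f:
    after_w = f.read()

# 'a' keeps the existing content and writes at the end, like `>>`.
with open(path, 'a', encoding='utf-8') as f:
    f.write('third\n')

with open(path, 'r', encoding='utf-8') as f:
    after_a = f.read()

print(repr(after_w))  # 'second\n'
print(repr(after_a))  # 'second\nthird\n'
```

This is why the crawler example opens test.txt with 'a': each scraped item is added to the file instead of replacing the previous one.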
JSON file storage
JSON stands for JavaScript Object Notation. It represents data through combinations of objects and arrays. The format is concise yet highly structured, and it closely resembles dictionaries and lists in Python.
1. Objects & Arrays:
In JavaScript, everything is an object.
Object:
Content wrapped in {} in JavaScript. It can be understood as a Python dictionary made of key-value pairs.
Array:
Content wrapped in [] in JavaScript. It can be understood as a Python list.
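As a rough illustration of that mapping, the sketch below parses a JSON object that contains an array, using Python's standard json module. The sample text and its "tags" key are hypothetical.

```python
import json

# Hypothetical sample: a JSON object whose "tags" value is a JSON array.
text = '{"name": "Infi-chu", "tags": ["txt", "json", "csv"]}'

parsed = json.loads(text)

print(type(parsed).__name__)          # dict
print(type(parsed['tags']).__name__)  # list
print(parsed['tags'][1])              # json
```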
2. Read JSON
We can call the loads() method of Python's json library to convert JSON text into a Python object (a list or a dictionary), and the dumps() method to convert that object back into a text string.
Use the dictionary's get() method to look up the value for a key. If the key does not exist, get() returns None, so it is common to pass a default value as the second argument instead of receiving None.
Strings inside JSON text must use double quotation marks. With single quotation marks, loads() raises a json.JSONDecodeError.
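The three points above can be sketched as follows. The sample records, the missing "age" key, and the default value 25 are hypothetical.

```python
import json

# Valid JSON: strings use double quotation marks.
text = '[{"name": "Infi-chu", "sex": "male"}]'
records = json.loads(text)

first = records[0]
print(first.get('name'))     # Infi-chu
print(first.get('age'))      # None, because the key is missing
print(first.get('age', 25))  # 25, the default passed as second argument

# Invalid JSON: single quotation marks around keys and values.
bad_text = "[{'name': 'Infi-chu'}]"
try:
    json.loads(bad_text)
    error = None
except json.JSONDecodeError as exc:
    error = exc

print(type(error).__name__)  # JSONDecodeError
```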
3. Output JSON
Call the dumps() method to convert the Python object into a JSON text string, then write that string to a file.
    import json

    data = [{'name': 'Infi-chu', 'sex': 'male', 'birthday': '2000.01.01'}]
    with open('data.json', 'w+') as f:
        f.write(json.dumps(data))

    # To keep the JSON nicely formatted in the file, add the indent parameter
    with open('data.json', 'w') as f:
        f.write(json.dumps(data, indent=2))
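Scraped data often contains non-ASCII text. The sketch below is one possible way to keep such text readable in the file, using the ensure_ascii parameter of json.dumps(). The sample record and the temporary file path are assumptions for illustration.

```python
import json
import os
import tempfile

# Hypothetical record containing non-ASCII characters.
data = [{'name': '王伟', 'sex': 'male'}]

# Hypothetical output path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.json')

# By default json.dumps() escapes non-ASCII characters as \uXXXX.
escaped = json.dumps(data)

# ensure_ascii=False writes the characters as-is; the file is opened
# with an explicit encoding so they are stored as UTF-8.
readable = json.dumps(data, indent=2, ensure_ascii=False)
with open(path, 'w', encoding='utf-8') as f:
    f.write(readable)

# Reading the file back gives the original structure.
with open(path, 'r', encoding='utf-8') as f:
    loaded = json.loads(f.read())

print(loaded == data)  # True
```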
CSV file storage
CSV stands for comma-separated values (more generally, character-separated values). It stores tabular data as plain text, so a CSV file is effectively structured plain text.
It is simpler than an Excel file: an XLS file is a spreadsheet that contains text, values, formulas, and formatting, whereas a CSV file holds only the plain-text values, without formulas or formatting.
1. For example
    import csv

    # newline='' lets the csv module handle line endings itself
    with open('data.csv', 'w+', newline='') as cf:
        writer = csv.writer(cf)
        # writer = csv.writer(cf, delimiter=' ')  # add the delimiter parameter to change the separator
        # writerows writes multiple rows at once; writerow writes a single row
        writer.writerow(['id', 'name', 'age'])
        writer.writerow(['1', 'infi', 23])  # the age is unreadable in the source; 23 is an assumed value
        writer.writerow(['2', 'chu', 23])
    # Writing rows as dictionaries
    import csv

    with open('data.csv', 'w', newline='') as f:
        fieldnames = ['id', 'name', 'age']
        writer = csv.DictWriter(f, fieldnames=fieldnames)  # DictWriter() initializes a dictionary writer object
        writer.writeheader()  # writeheader() writes the header row
        # writerows() takes a single list of dictionaries
        writer.writerows([
            {'id': '1', 'name': 'n1', 'age': 1},
            {'id': '2', 'name': 'n2', 'age': 2},
            {'id': '3', 'name': 'n3', 'age': 3},
        ])
2. Read
    # Read the CSV file with the csv library
    import csv

    with open('data.csv', 'r', encoding='utf-8') as f:
        reader = csv.reader(f)  # reader() lets us iterate over the content of each row
        for row in reader:
            print(row)

    # Read the CSV file with pandas
    import pandas as pd

    df = pd.read_csv('data.csv')
    print(df)
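Files written with a header row can also be read back as dictionaries. The sketch below uses csv.DictReader from the standard library, which is not part of the original examples; the sample rows and the temporary file path are assumptions.

```python
import csv
import os
import tempfile

# Hypothetical file path inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), 'data.csv')

# Write a small file with a header row first.
with open(path, 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=['id', 'name', 'age'])
    writer.writeheader()
    writer.writerows([
        {'id': '1', 'name': 'n1', 'age': 1},
        {'id': '2', 'name': 'n2', 'age': 2},
    ])

# DictReader uses the header row as keys for every following row.
with open(path, 'r', newline='', encoding='utf-8') as f:
    rows = list(csv.DictReader(f))

print(rows[0]['name'])  # n1
print(rows[1]['age'])   # 2, returned as a string
```

Note that every value comes back as a string, because CSV stores no type information. Convert with int() or similar where numbers are needed.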
Python 3 crawler (8): data storage with TXT, JSON, and CSV