Python3 Crawler (8): Data Storage with TXT, JSON, and CSV

Source: Internet
Author: User

Infi-chu:

http://www.cnblogs.com/Infi-chu/

TXT text storage

TXT text storage is convenient and simple, and it works on almost any platform. Its drawback is that plain text is hard to search and retrieve from.

1. For example:

Use requests to fetch the page source, then parse it with the pyquery parsing library and append the extracted text to a file.

import requests
from pyquery import PyQuery as pq

url = 'https://www.zhihu.com/explore'
header = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)'
}
html = requests.get(url, headers=header).text
doc = pq(html)
items = doc('.explore-tab .feed-item').items()
for item in items:
    question = item.find('h2').text()
    author = item.find('.author-link-line').text()
    answer = pq(item.find('.content').html()).text()
    # 'a' appends, so every item is added after the previous one
    with open('test.txt', 'a', encoding='utf-8') as f:
        f.write('\n'.join([author, question, answer]))
        # Separator line between items
        f.write('\n' + '=' * 50 + '\n')

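2. Open modes:

r: read-only text mode (the default); the file must already exist

rb: read-only binary mode

r+: read and write; the file must already exist

rb+: read and write in binary mode

w: write-only; an existing file is truncated, a missing file is created

wb: write-only binary mode

w+: read and write; an existing file is truncated, a missing file is created

wb+: read and write in binary mode, truncating as w+ does

a: append; writes go to the end of the file, and a missing file is created

ab: append in binary mode

a+: read and append

ab+: read and append in binary mode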

Note

The difference between w and a is equivalent to the difference between > and >> in a Linux shell: w overwrites the existing content, while a adds to the end of it.
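
As a minimal sketch of that difference, the snippet below writes to a throwaway file twice in w mode and once in a mode, then reads the result back. The file name modes_demo.txt and the sample strings are assumptions made up for this illustration.

```python
import os
import tempfile

# Throwaway file in a temporary directory (hypothetical name)
path = os.path.join(tempfile.mkdtemp(), "modes_demo.txt")

# 'w' truncates on every open: only the last write survives
for text in ("first\n", "second\n"):
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)
with open(path, "r", encoding="utf-8") as f:
    after_w = f.read()

# 'a' appends: the new text is added after the existing content
with open(path, "a", encoding="utf-8") as f:
    f.write("third\n")
with open(path, "r", encoding="utf-8") as f:
    after_a = f.read()

print(repr(after_w))  # 'second\n'
print(repr(after_a))  # 'second\nthird\n'
```

This is why the crawler example opens test.txt with a: opening it with w inside the loop would leave only the last item in the file.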

JSON file storage

JSON stands for JavaScript Object Notation. It represents data through combinations of objects and arrays. The format is concise and highly structured, and it closely resembles dictionaries and lists in Python.

1. Objects & Arrays:

In JavaScript, almost everything is an object.

Object:

An object is written with {} in JavaScript and holds key-value pairs. It can be understood as a dictionary in Python.

Array:

An array is written with [] in JavaScript. It can be understood as a list in Python.
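
The mapping can be checked with Python's standard json module. This is only an illustrative sketch: the JSON text below, including the "tags" key and the "guest" record, is sample data invented for the example.

```python
import json

# A JSON array ([]) holding two JSON objects ({}); sample data only
text = '[{"name": "Infi-chu", "tags": ["python", "crawler"]}, {"name": "guest", "tags": []}]'

parsed = json.loads(text)

print(type(parsed).__name__)     # list  (JSON array)
print(type(parsed[0]).__name__)  # dict  (JSON object)
print(parsed[0]["tags"][1])      # crawler
```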

2. Read JSON

We can call the loads() method of Python's json library to convert JSON text into a Python object (a list or a dictionary), and the dumps() method to convert that object back into a text string.

To read a value, use the dictionary's get() method with the key name. If the key does not exist, get() returns None instead of raising an error; you can also pass a second argument as the default value to return in that case.

Strings inside JSON text must be enclosed in double quotation marks; otherwise loads() raises a json.JSONDecodeError exception.
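
A short sketch of these three points follows. The sample record and the default value 25 are assumptions chosen for the illustration.

```python
import json

text = '[{"name": "Infi-chu", "sex": "male"}]'
records = json.loads(text)  # JSON text -> Python list of dicts
first = records[0]

name = first.get("name")               # key exists
age = first.get("age")                 # key missing -> None
age_or_default = first.get("age", 25)  # key missing -> hypothetical default

# Single-quoted strings are not valid JSON, so loads() fails
try:
    json.loads("[{'name': 'Infi-chu'}]")
    error = None
except json.JSONDecodeError as exc:
    error = exc

back_to_text = json.dumps(records)  # Python object -> JSON text

print(name, age, age_or_default)
print(type(error).__name__)
print(back_to_text)
```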

3. Output JSON

Call the dumps() method to convert the Python object to a JSON text string, then write that string to a file.

import json

data = [{'name': 'Infi-chu', 'sex': 'male', 'birthday': '2000.01.01'}]
with open('data.json', 'w+') as f:
    f.write(json.dumps(data))

# To save the JSON in an indented, readable layout, add the indent parameter
with open('data.json', 'w') as f:
    f.write(json.dumps(data, indent=2))

CSV file storage

CSV stands for comma-separated values (sometimes called character-separated values, since the delimiter need not be a comma). It stores tabular data as plain text, which makes it a form of structured plain text.

A CSV file is simpler than an Excel file: an XLS file is a spreadsheet that holds text, values, formulas, and formatting, whereas a CSV file holds only the plain-text values.
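
To make "structured plain text" concrete, the sketch below writes two rows into an in-memory buffer with the standard csv module and then parses the text again. The rows are made-up sample data, and io.StringIO stands in for a real file.

```python
import csv
import io

# Write two rows into an in-memory text buffer instead of a file
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["id", "name", "age"])
writer.writerow(["1", "n1", 1])

csv_text = buffer.getvalue()
print(repr(csv_text))  # 'id,name,age\r\n1,n1,1\r\n'

# Parsing the text back yields strings only: no types, formulas, or formatting
rows = list(csv.reader(io.StringIO(csv_text)))
print(rows)  # [['id', 'name', 'age'], ['1', 'n1', '1']]
```

Note that the integer 1 comes back as the string '1': CSV keeps the values but not their types.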

1. Write, for example:

import csv

# newline='' stops the csv module's line endings from being translated again
with open('data.csv', 'w+', newline='') as cf:
    writer = csv.writer(cf)
    # writer = csv.writer(cf, delimiter=' ')  # the delimiter parameter changes the separator
    writer.writerow(['id', 'name', 'age'])
    # writerows() writes several rows at once; writerow() writes a single row
    writer.writerow(['1', 'infi', 22])  # the age is garbled in the source; 22 is a placeholder
    writer.writerow(['2', 'Chu', 23])

# Writing with dictionaries
import csv

with open('data.csv', 'w', newline='') as f:
    fieldnames = ['id', 'name', 'age']
    # DictWriter() initializes a dictionary-based writer object
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    # writeheader() writes the header row
    writer.writeheader()
    # writerows() expects one iterable of dictionaries
    writer.writerows([
        {'id': '1', 'name': 'n1', 'age': 1},
        {'id': '2', 'name': 'n2', 'age': 2},
        {'id': '3', 'name': 'n3', 'age': 3},
    ])

2. Read

# Read the CSV file with the csv library
import csv

with open('data.csv', 'r', encoding='utf-8') as f:
    reader = csv.reader(f)
    # reader() lets us iterate over the file one row at a time
    for row in reader:
        print(row)

# Read the CSV file with pandas
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
