Use requests to download novels


Table of Contents
  • 1. Use requests
    • 1.1. Official quick start guide (latest version; pay attention to the version number when using a specific version)
    • 1.2. Open a website
    • 1.3. Send other requests
      • 1.3.1. Use requests.post(url) to send a POST request
      • 1.3.2. Other request methods are sent the same way
      • 1.3.3. Pass cookies: get the cookies from the browser and pass them to get(url, cookies=cookies)
    • 1.4. Check whether the page was fetched correctly, and the encoding
      • 1.4.1. requests.codes.ok means success
    • 1.5. Get the returned content
      • 1.5.1. Get text
      • 1.5.2. Get binary content
      • 1.5.3. Response JSON
      • 1.5.4. Raw response content
    • 1.6. Write to a file
    • 1.7. Parse the content with BeautifulSoup
    • 1.8. Use select to find the specified object
      • 1.8.1. Use select to get the specified tags; the search syntax is similar to CSS selectors
      • 1.8.2. Get the tag name and content
  • 2. Examples
    • 2.1. Pass some key-value pairs to use Baidu search
    • 2.2. Send cookies to Weibo
    • 2.3. Download the novel "Tao Medical World"
1. Use requests

1.1 Official quick start guide (latest version; pay attention to the version number when using a specific version)

http://docs.python-requests.org/zh_CN/latest/user/quickstart.html#id2

1.2 Open a website

    import requests

    res = requests.get(url)

1.3 Send other requests

1.3.1 Use requests.post(url) to send a POST request

1.3.2 Other request methods are sent the same way

For example: requests.put(), requests.delete(), requests.head(), requests.options().
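As a quick, hedged sketch, the calls below exercise these methods against httpbin.org, the echo service the official quickstart also uses (the URLs are illustrative, not from the original):

    import requests

    # httpbin.org simply echoes the request back, so it is safe for testing
    r = requests.post('https://httpbin.org/post', data={'key': 'value'})
    r = requests.put('https://httpbin.org/put', data={'key': 'value'})
    r = requests.delete('https://httpbin.org/delete')
    r = requests.head('https://httpbin.org/get')
    r = requests.options('https://httpbin.org/get')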

1.3.3 Pass cookies: get the cookies from the browser and pass them with get(url, cookies=cookies)

To get cookies in Chrome: on the page you want to fetch, open the menu (the three dots in the upper right corner) -> More tools -> Developer tools, switch to the Network tab, click a request under Name, and the Headers pane shows the cookies.

For example, the cookies for a Weibo login session can be read from the favicon.ico entry under Name.
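A minimal sketch of passing such cookies along (the cookie names and values here are placeholders to copy from the browser; SINAGLOBAL also appears in the example in section 2.2):

    import requests

    # placeholder name=value pairs; copy the real ones from the Network tab
    cookies = {'SINAGLOBAL': '...', 'SUB': '...'}
    res = requests.get('https://weibo.com/', cookies=cookies)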

1.4 Check whether the page was fetched correctly, and the encoding

    res.encoding     # the encoding of the page; assign a new value and res.text is re-decoded with it
    res.status_code  # the returned HTTP status code

1.4.1 requests.codes.ok means success

    res.status_code == requests.codes.ok  # => True when the request succeeded (200)
    res.raise_for_status()                # raises an exception for 4xx/5xx responses

Wrap it in try/except to get a clearer error message:

    try:
        res.raise_for_status()
    except Exception as exp:
        print(exp)
1.5 Get the returned content

1.5.1 Get text

    res.text
    type(res.text)  # => <class 'str'>; since it is a str, slicing and `in` can be used directly

1.5.2 Get binary content

res.text would also work, but the official docs use res.content for binary data:

    >>> from PIL import Image
    >>> from io import BytesIO
    >>> i = Image.open(BytesIO(r.content))
1.5.3 Response JSON

r.json() parses the body as JSON and raises an exception when parsing fails.
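For instance, a small sketch against httpbin.org (an assumed test URL, chosen because it returns a JSON body):

    import requests

    r = requests.get('https://httpbin.org/get')
    data = r.json()     # parsed into a Python dict
    print(data['url'])  # httpbin echoes the request URL back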

1.5.4 Raw response content

    r = requests.get(url, stream=True)  # stream=True gives access to the raw socket content
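For example, a sketch along the lines of the official docs: with stream=True, r.raw exposes the underlying urllib3 response (the URL is illustrative):

    import requests

    r = requests.get('https://httpbin.org/get', stream=True)
    print(r.raw)           # the underlying urllib3 response object
    print(r.raw.read(10))  # read the first 10 bytes straight off the socket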

1.6 Write to a file

You can write the response to a file directly with `with open(...)`, but when the file is large this holds the whole body in memory, so write it in parts:

    for part_file in res.iter_content(size):
        file.write(part_file)

size is the chunk size in bytes; each iteration yields at most that many bytes to write.
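Putting the pieces together, a minimal sketch of a chunked download (the URL, file name, and 4096-byte chunk size are illustrative choices):

    import requests

    res = requests.get('https://httpbin.org/bytes/102400', stream=True)
    with open('download.bin', 'wb') as file:
        # 4096 bytes per chunk, so a large file never sits fully in memory
        for part_file in res.iter_content(chunk_size=4096):
            file.write(part_file)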

1.7 Parse the content with BeautifulSoup

Official documentation: https://www.crummy.com/software/BeautifulSoup/bs4/doc/index.zh.html. BeautifulSoup can extract specific content from a page; install it with pip3 install bs4 and import it with import bs4.

    be = bs4.BeautifulSoup(htmlfile, "html.parser")

1.8 Use select to find the specified object

1.8.1 Use select to get the specified tags; the search syntax is similar to CSS selectors

    select('div')       # all <div> tags
    select('#id')       # the element whose id is "id"
    select('.class')    # all elements whose class is "class"
    select('div .class')                       # all elements of that class inside a <div>
    select('a[href]')                          # all <a href="????">; the <a> may have other attributes
    select('a[href="https://www.baidu.com"]')  # all <a href="https://www.baidu.com">; the <a> may have other attributes
    select('div > a')   # <a> directly inside a <div>, with no other tags in between

These are demonstrated in the sketch below.
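A quick sketch of the selectors on a made-up HTML snippet (the markup here is invented for illustration):

    import bs4

    html = '<div id="nav"><a class="menu" href="https://www.baidu.com">Baidu</a></div>'
    be = bs4.BeautifulSoup(html, "html.parser")
    print(be.select('div'))      # all <div> tags
    print(be.select('#nav'))     # the element with id "nav"
    print(be.select('.menu'))    # all elements with class "menu"
    print(be.select('a[href]'))  # all <a> tags that have an href attribute
    print(be.select('div > a'))  # <a> directly inside a <div>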

1.8.2 Get the tag name and content
select returns a list of Tag objects:

    eles = be.select('.menu')
    type(eles)                # => <class 'list'>
    print(eles)               # print all matches
    print(eles[0])            # print one match as a full tag, e.g. <a>Baidu</a>
    print(eles[0].getText())  # print only the content, e.g. Baidu
    print(eles[0].attrs)      # print only the current tag's attributes, without the sub-tag content
2. Examples

2.1 Pass some key-value pairs to use Baidu search

Take Baidu as an example: a Baidu search builds a URL like https://www.baidu.com/s?wd=%E5%85%B3%E9%94%AE%E5%AD%97, so to get this result we pass the parameters with get(url, params=param_dict). For example:

    search = input("Enter the content you want to baidu: ")
    baidu = 'https://www.baidu.com/s?'
    search_params = {'wd': search}
    try:
        baidu_re = requests.get(baidu, params=search_params)
    except Exception as err:
        print(err)
    finally:
        pass

    # specify the encoding when writing, so a page that is not UTF-8
    # is still written correctly (also compatible with Windows)
    with open('baidusearch.html', 'w', encoding=baidu_re.encoding) as html:
        html.write(baidu_re.text)
2.2 Send cookies to Weibo

Many Weibo features can be used only after logging in, so to fetch information you need the cookies of a logged-in session: log in to Weibo in the browser and copy the cookies.

    import requests

    # note: copy the whole Cookie header, starting with SINAGLOBAL=...
    cookies = {'cookies': 'value'}
    requests.get('other Weibo pages', cookies=cookies)
    # then parse out the content you want
2.3 Download the novel "Tao Medical World"
"Get all Chapter" "import requests, bs4, chardetfrom io import StringIOi = 1 strio = StringIO () next_href = 'HTTP: // www.ppxs.net/63/63820/19862177.html'text = "" # obtain the title and body and store it in the StringIO stream def getUrlText (url): global I, next_href, print page ("from this page ({}) start reading chapter ". format (url) page = requests. get (url) # Set the character encoding to the webpage encoding; otherwise, page_text = page will be garbled. text. encode (page. encoding) be = bs4.BeautifulSoup (page_text, "html. parser ") I + = 1 next_page = be. select (". bottem a ") next_href = next_page [3]. attrs ['href '] print ("next chapter % s" % next_href) title = be. select (". bookname h1 ") title_text = title [0]. text txt = be. select ("# booktext") per_txt = txt [0]. text # Start of each chapter. Delete in_text = per_txt.lstrip (''' <div class = "content" id = "booktext"> <! -- Go --> <p> <font color = "# FF0000" face = "" size = "3"> welcome to Renren novels. Remember the address: http://www.ppxs.net, mobile phone to read m.ppxs.net, so that you can read the novel "Tao medical world" the latest chapter at any time... </font> </p> '''). replace ("<br>", "") inner_text = title_text + "\ n" + in_text strio. write (inner_text) # getUrlText ('HTTP: // www.ppxs.net/63/63820/19862177.html') try: while next_href! = True: getUrlText (next_href) # next_href = getUrl (next_href) print (I) failed t Exception as e: print (e) finally: with open ("Dao medical world .txt ", 'A + ') as a:. write (strio. getvalue ())

Author: vz li Branch

Created: Tue

Emacs 25.1.1 (Org mode 8.2.10)
