Scrapy basics: writing variable-length URLs to the item at a fixed length

Source: Internet
Author: User


As mentioned earlier, the URL of each article is written to the item, but URLs vary in length.
To store a value of uniform length, compute the MD5 hash of each URL. The hex digest is always 32 characters long,
so it can be saved in a dedicated item field.
In the project root, create a new folder named Util to hold the custom helper functions you may need,
and create a new file common.py inside Util.

Write the following:

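    import hashlib

    def get_md5(url):
        # hashlib works on bytes, so encode str input as UTF-8 first
        if isinstance(url, str):
            url = url.encode("utf-8")
        m = hashlib.md5()
        m.update(url)
        # hexdigest() returns the hash as a 32-character hex string
        return m.hexdigest()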
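To illustrate why the hash gives every URL the same length, here is a minimal sketch. The two URLs are made-up examples, and get_md5 is repeated in condensed form so the snippet runs on its own.

```python
import hashlib


def get_md5(url):
    """Return the hex MD5 digest of a URL given as str or bytes."""
    if isinstance(url, str):
        url = url.encode("utf-8")  # hashlib needs bytes, not str
    return hashlib.md5(url).hexdigest()


# Hypothetical URLs of very different lengths.
urls = [
    "http://example.com/1",
    "http://example.com/a/much/longer/path/to/an/article?page=2&sort=asc",
]
digests = [get_md5(u) for u in urls]

# Print the URL length, the digest length, and the digest itself.
for u, d in zip(urls, digests):
    print(len(u), len(d), d)
```

Whatever the input length, the digest length printed in the second column stays the same, which is what makes it suitable for a fixed-width field.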
Explanation of the encoding step: in Python 3 every str is a Unicode string, while MD5 operates on bytes, so the URL has to be encoded (here as UTF-8) before it is hashed.
UTF-8 is a compact byte encoding of Unicode text. In Python 2 the default str type was already a byte string, so this step was not needed there.
Because Python 3 treats all text as Unicode, it largely avoids the garbled-character problems seen in Python 2.
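The need for the encoding step can be checked directly. The sketch below assumes Python 3 and uses a made-up URL containing non-ASCII characters. It passes a str to hashlib.md5 first, then the UTF-8 encoded bytes.

```python
import hashlib

url = "http://example.com/文章"  # hypothetical URL with non-ASCII characters

# Passing a str directly is rejected in Python 3.
try:
    hashlib.md5(url)
    error = None
except TypeError as exc:
    error = exc

# Encoding to bytes first makes the call succeed.
encoded = url.encode("utf-8")
digest = hashlib.md5(encoded).hexdigest()

print(type(error).__name__, type(encoded).__name__, len(digest))
```

The isinstance check in get_md5 exists for this reason: callers can pass either a str or already-encoded bytes.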



Finally, import the function in jobbole.py and write the result to the item field:

    from Util.common import get_md5  # prefix with your project package if needed
    artical_item["url_object_id"] = get_md5(response.url)
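The following sketch shows how that assignment might sit inside a parse callback. It is an illustration only: FakeResponse and parse_detail are hypothetical names, and a plain dict stands in for the Scrapy item so the snippet runs without Scrapy installed.

```python
import hashlib


def get_md5(url):
    """Return the hex MD5 digest of a URL given as str or bytes."""
    if isinstance(url, str):
        url = url.encode("utf-8")
    return hashlib.md5(url).hexdigest()


class FakeResponse:
    """Hypothetical stand-in for a Scrapy response; only .url is used."""

    def __init__(self, url):
        self.url = url


def parse_detail(response):
    # A plain dict stands in for the real item class (an assumption).
    artical_item = {}
    artical_item["url"] = response.url
    artical_item["url_object_id"] = get_md5(response.url)
    return artical_item


item = parse_detail(FakeResponse("http://example.com/article/1"))
print(item)
```

In a real spider the item class would need a url_object_id field declared alongside its other fields.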


Now that all the item fields have been added, all that remains is to write to the database.



