Beginner crawler and advanced series learning notes

Source: Internet
Author: User
Tags: install mongodb, mongodb

Beginner crawler, part 1: crawling the "sister pictures" site

Problems encountered: Python 2.x and Python 3.x behave differently.
1. Encoding problem: on an ASCII error, add the following at the top of the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Also, when a path contains Chinese characters, convert the str to unicode with .decode('utf-8').
2. print problem: in Python 2.x, print is a statement without parentheses; to use the parenthesized print(), add from __future__ import print_function before all other imports (see the combined sketch after this list).
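
The two fixes combine into a short Python 2.7 preamble; a minimal sketch, where the folder path is a made-up example:

# -*- coding: utf-8 -*-
# Minimal Python 2.7 sketch combining the fixes above; the folder
# path is a hypothetical example.
from __future__ import print_function  # must come before the other imports

import os
import sys
reload(sys)
sys.setdefaultencoding('utf-8')  # cures the ASCII encode errors

path = 'D:\\妹子图'.decode('utf-8')  # str -> unicode for a Chinese path
if not os.path.exists(path):
    os.makedirs(path)
print('saving to', path)  # parenthesized print now works under 2.x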

Note: I did not use Anaconda as the tutorial suggested, because the required packages can also be installed with pip.

Beginner crawler, part 2: a robust little crawler

Beginner crawler, part 3: deduplication

Install MongoDB on the C drive.
When running it I found the proxies did not work, and after saving to MongoDB I did not know how to view the data; a viewing sketch follows.
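
Saved documents can be inspected with pymongo; a minimal sketch, where the database and collection names are assumptions:

# Peek at what the crawler saved, using pymongo; the database and
# collection names here are hypothetical.
import pymongo

client = pymongo.MongoClient('localhost', 27017)
collection = client['crawler_db']['pages']   # hypothetical names
for doc in collection.find().limit(5):       # show a few documents
    print doc

The mongo shell's db.pages.find() gives the same view interactively.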

Beginner crawler, part 4: making the crawler run fast (multiprocessing + multithreading)

Note: you need to run the second code block first (I saved it as start_queue.py) to write the URLs into MongoDB, and only then run the multithreading + multiprocessing code (multithreading.py); otherwise it keeps reporting that the queue has no data.
In addition, in the pop_title() function of mogoqueue.py, return record['theme'] should be changed to return record['theme'.decode('utf-8')] to avoid a KeyError; here 'theme' stands in for the original Chinese key. Because of the Python 2.7 encoding problem, that Chinese key string has to be converted to unicode, as the short illustration below shows.
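
A minimal illustration of why the decode is needed, assuming the key is the Chinese word for "theme" (主题): pymongo returns dict keys as unicode in Python 2, so a raw byte-string subscript misses.

# -*- coding: utf-8 -*-
# Why record['主题'] raises KeyError under Python 2.7: pymongo hands
# back unicode keys, and a UTF-8 byte-string subscript never matches.
record = {u'主题': u'a saved title'}      # key as pymongo returns it
try:
    print record['主题']                  # bytes key: KeyError
except KeyError:
    print record['主题'.decode('utf-8')]  # unicode key: matches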

Beginner's advanced Scrapy, part 1

Problem encountered: ImportError: No module named items
Solution: http://stackoverflow.com/questions/10570635/scrapy-importerror-no-module-named-items
I used the from __future__ import absolute_import approach; a short sketch follows.
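
A minimal sketch of that fix, with a made-up project name; the future import must come first so a bare "import items" is no longer resolved relative to the current package:

# Sketch of the absolute_import fix; project and item names are
# hypothetical. With absolute_import in force, imports resolve from
# the top-level package, so use the full dotted path.
from __future__ import absolute_import

from myproject.items import MyItem  # hypothetical project and item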
Install the Python MySQL driver package mysql-connector-python-2.1.4.
Change the SQL statement in the Sql class of sql.py to:

sql = 'INSERT INTO dd_name (xs_name, xs_author, category, name_id) VALUES (%(xs_name)s, %(xs_author)s, %(category)s, %(name_id)s)'
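
A minimal sketch of executing this parameterized insert with mysql-connector-python; the credentials match the notes below, and the row values are placeholders:

# Execute the parameterized INSERT with mysql-connector-python.
# Credentials mirror the cmd steps below; row values are made up.
import mysql.connector

cnx = mysql.connector.connect(user='root', password='123456',
                              host='localhost', database='test')
cursor = cnx.cursor()
sql = ('INSERT INTO dd_name (xs_name, xs_author, category, name_id) '
       'VALUES (%(xs_name)s, %(xs_author)s, %(category)s, %(name_id)s)')
cursor.execute(sql, {'xs_name': u'example name', 'xs_author': u'example author',
                     'category': u'example category', 'name_id': '1'})
cnx.commit()  # the insert is not visible until committed
cursor.close()
cnx.close()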

Error: ProgrammingError: 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
Database operations:
Open cmd as administrator
net start mysql
E:
cd E:\mysql5.7\bin
mysql -hlocalhost -uroot -p123456
show databases;
use test;

DROP TABLE IF EXISTS `dd_name`;
CREATE TABLE `dd_name` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `xs_name` varchar(255) DEFAULT NULL,
  `xs_author` varchar(255) DEFAULT NULL,
  `category` varchar(255) DEFAULT NULL,
  `name_id` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8mb4;

show tables;
Note the semicolons...

The SQL statements for the second table:

DROP TABLE IF EXISTS `dd_chaptername`;
CREATE TABLE `dd_chaptername` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `xs_chaptername` varchar(255) DEFAULT NULL,
  `xs_content` text,
  `id_name` int(11) DEFAULT NULL,
  `num_id` int(11) DEFAULT NULL,
  `url` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2726 DEFAULT CHARSET=gb18030;
SET FOREIGN_KEY_CHECKS=1;

Error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
Add the following at the top of the file:

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

Error: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte. I tried a variety of methods without success, and in the end replaced every str(...) string with the form unicode(str(response.meta['chaptername']), errors='ignore').replace(u'\xa0', u''). A sketch of this workaround follows.
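
A minimal sketch of that cleanup, assuming the earlier setdefaultencoding fix is in effect so str() accepts unicode input; the helper name is made up, the meta key is from the notes above:

# -*- coding: utf-8 -*-
# Hypothetical helper wrapping the workaround above (Python 2.7).
import sys
reload(sys)
sys.setdefaultencoding('utf-8')   # lets str() accept unicode input

def clean_text(value):
    # decode ignoring undecodable bytes, then drop no-break spaces
    return unicode(str(value), errors='ignore').replace(u'\xa0', u'')

# e.g. inside the Scrapy callback:
# chaptername = clean_text(response.meta['chaptername'])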
Note: the original domain 23wx.com has changed to 23us.com.

Beginner's advanced Scrapy, part 2 (logging in)

I have no VIP card number, so I could not get in...

Beginner's advanced Scrapy, part 3 (distributed crawling based on scrapy-redis and a cookie pool)

Note: I just discovered that the previous article also had a section on distributed Scrapy that I had not read; since it is not closely related to my project, I will look at it later. Thank you, Cui Qingcai, your blog has helped me a great deal.
