Beginner crawler, part 1: crawling the "sister figure" image site
Problems encountered: Python 2.x and Python 3.x behave differently.
1. Encoding problem: for the ASCII error, add the following at the top of the script:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Also, when there are Chinese characters in a path, the str type must be converted to the unicode type with .decode('utf-8').
2. print problem: in Python 2.x, print takes no parentheses; to use the parenthesized form, add from __future__ import print_function before all other imports.
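A minimal sketch of the str-to-unicode conversion from point 1. The path below is hypothetical; under Python 2 the UTF-8 byte string is a plain str, and the same .decode('utf-8') call works on bytes objects in Python 3:

```python
# -*- coding: utf-8 -*-
# Hypothetical example: a file path containing Chinese characters arrives as
# UTF-8 bytes (a Python 2 str); decoding turns it into text (unicode).
raw_path = b'E:\\pics\\\xe5\xa6\xb9\xe5\xad\x90\xe5\x9b\xbe'  # UTF-8 bytes
path = raw_path.decode('utf-8')  # unicode text, safe to mix with other text
print(path)
```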
Note: I did not install Anaconda as the tutorial requires, because the packages it needs can also be downloaded with pip.
Beginner crawler, part 2: a sturdy little crawler
Beginner crawler, part 3: de-duplication
Installed MongoDB on the C: drive.
When running, I found the proxies did not work, and I did not know how to view the data saved to MongoDB.
Beginner crawler, part 4: running the crawler (multi-process + multithreading)
Note: you must first run the second code block, i.e. my own start_queue.py, to write the addresses into MongoDB, and only then run the multithreaded + multiprocess code, i.e. multithreading.py; otherwise it keeps reporting that the queue has no data.
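The required order (seed the queue first, then start the workers) can be sketched with the standard library alone. This is not the actual start_queue.py / multithreading.py code; the function names and URLs below are made up, and Python 3's queue module is spelled Queue in Python 2:

```python
import queue
import threading

def seed_queue(q, urls):
    """Analogue of start_queue.py: write the work items in first."""
    for u in urls:
        q.put(u)

def worker(q, results):
    """Analogue of one thread in multithreading.py: consume until empty."""
    while True:
        try:
            url = q.get_nowait()
        except queue.Empty:
            return  # the "queue no data" case: nothing left to fetch
        results.append('fetched ' + url)

q = queue.Queue()
results = []
seed_queue(q, ['http://example.com/1', 'http://example.com/2'])  # must run first
threads = [threading.Thread(target=worker, args=(q, results)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(len(results))  # 2
```

Starting the workers before seeding would make every thread hit queue.Empty immediately, which is exactly the "queue has no data" symptom described above.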
Also, in the pop_title() function of mogoqueue.py, return record['theme'] should be changed to return record['theme'.decode('utf-8')] to avoid a KeyError. Because of Python 2.7's encoding handling, the key 'theme' (originally the Chinese word for "theme") must be converted from str to unicode before the lookup.
Scrapy for beginners, part 1
Problem encountered: ImportError: No module named items
Solution: http://stackoverflow.com/questions/10570635/scrapy-importerror-no-module-named-items
I personally used the from __future__ import absolute_import approach.
Installed the MySQL package for Python, mysql-connector-python-2.1.4.
Changed the SQL statement in the Sql class of sql.py to:
sql = 'INSERT INTO dd_name (xs_name, xs_author, category, name_id) VALUES (%(xs_name)s, %(xs_author)s, %(category)s, %(name_id)s)'
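mysql-connector-python uses pyformat placeholders, which follow the same %(name)s rules as Python's dict-based %-formatting, so a space inside a placeholder name (as in %(name _ID)s) breaks the lookup. A sketch without a real database; the parameter values are made up:

```python
# Demonstrates only the placeholder syntax with plain %-formatting; with a real
# connection you would pass `params` to cursor.execute(sql, params) instead.
sql = ('INSERT INTO dd_name (xs_name, xs_author, category, name_id) '
       'VALUES (%(xs_name)s, %(xs_author)s, %(category)s, %(name_id)s)')
params = {'xs_name': 'Some Novel', 'xs_author': 'Someone',
          'category': 'fantasy', 'name_id': '42'}
preview = sql % params  # each %(key)s is resolved from the dict by exact name
print(preview)
```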
Error: ProgrammingError: 1045 (28000): Access denied for user 'root'@'localhost' (using password: YES)
Database operations:
Open cmd as administrator
net start MySQL
E:
cd E:\mysql5.7\bin
mysql -hlocalhost -uroot -p123456
show databases;
use test;
DROP TABLE IF EXISTS `dd_name`;
CREATE TABLE `dd_name` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`xs_name` varchar(255) DEFAULT NULL,
`xs_author` varchar(255) DEFAULT NULL,
`category` varchar(255) DEFAULT NULL,
`name_id` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=38 DEFAULT CHARSET=utf8mb4;
show tables;
Note the semicolons...
The SQL statements for the second table:
DROP TABLE IF EXISTS `dd_chaptername`;
CREATE TABLE `dd_chaptername` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`xs_chaptername` varchar(255) DEFAULT NULL,
`xs_content` text,
`id_name` int(11) DEFAULT NULL,
`num_id` int(11) DEFAULT NULL,
`url` varchar(255) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2726 DEFAULT CHARSET=gb18030;
SET FOREIGN_KEY_CHECKS=1;
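Since dd_chaptername is declared with DEFAULT CHARSET=gb18030, the chapter text written from Python must be representable in that encoding. A minimal sanity-check sketch; the sample string is hypothetical, and mysql-connector normally performs this conversion itself:

```python
# -*- coding: utf-8 -*-
# Hypothetical check that Chinese chapter text round-trips through gb18030,
# the charset declared on the dd_chaptername table.
title = u'\u7b2c\u4e00\u7ae0'        # "Chapter One" in Chinese
encoded = title.encode('gb18030')    # bytes as stored under the table charset
assert encoded.decode('gb18030') == title
print(len(encoded))  # 6: two bytes per character in gb18030's GBK range
```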
Error: UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-2: ordinal not in range(128)
Add at the top of the file:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
Error: UnicodeDecodeError: 'utf8' codec can't decode bytes in position 0-1: invalid continuation byte. I tried a variety of methods without success, and in the end replaced all the str(...) strings with the form unicode(str(response.meta['chaptername']).replace('\xa0', ''), errors='ignore').
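A minimal sketch of that errors='ignore' workaround on made-up bytes: \xa0 is a continuation byte that cannot stand alone in UTF-8, so a strict decode fails while errors='ignore' simply drops it (Python 2's unicode(..., errors='ignore') applies the same idea):

```python
raw = b'chapter\xa0name'  # hypothetical data with a stray \xa0 byte
try:
    raw.decode('utf-8')   # strict decode raises UnicodeDecodeError
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False
cleaned = raw.decode('utf-8', errors='ignore')  # the bad byte is dropped
print(strict_ok, cleaned)  # False chaptername
```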
Note: the original domain name 23wx.com was changed to 23us.com.
Scrapy for beginners, part 2 (login)
I have no VIP card number, so I cannot log in...
Scrapy for beginners, part 3 (Scrapy-Redis distributed crawling and a cookie pool)
Note: I just found that an earlier article also covers distributed Scrapy, which I had not read; since it is not closely related to my project, I will read it later. Thank you, Cui Qingcai: your blog has helped me a great deal.