Adding Chinese website crawling to Nutch.
1. Crawling Chinese web pages
A. Adjust the MySQL connection settings so that Chinese text is not stored as garbled characters in MySQL. Edit ${apache_nutch_home}/runtime/local/conf/gora.properties:
###############################
# MySQL Properties #
###############################
gora.sqlstore.jdbc.driver=com.mysql.jdbc.Driver
gora.sqlstore.jdbc.url=jdbc:mysql://10.10.11.252:3306/nutch?useUnicode=true&characterEncoding=utf8&autoReconnect=true&zeroDateTimeBehavior=convertToNull
gora.sqlstore.jdbc.user=devuser
gora.sqlstore.jdbc.password=devuser
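To illustrate why the useUnicode=true and characterEncoding=utf8 parameters matter, here is a minimal Java sketch of what happens when UTF-8 bytes are read back with the wrong character set. It does not talk to MySQL at all; the class name EncodingDemo and its two helper methods are hypothetical and exist only for this illustration. The real behaviour of the connection also depends on the server and table character sets.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows how a charset mismatch garbles Chinese text.
public class EncodingDemo {

    // Encode as UTF-8 and decode as UTF-8: the text survives unchanged.
    public static String utf8RoundTrip(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Encode as UTF-8 but decode as ISO-8859-1: every byte becomes
    // a separate character, which is the typical "garbled" result.
    public static String latin1Misread(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Two Chinese characters (U+4E2D U+6587), written as escapes
        // so the source file itself stays plain ASCII.
        String chinese = "\u4e2d\u6587";

        // Matching charsets: the original string comes back.
        System.out.println(utf8RoundTrip(chinese).equals(chinese));

        // Mismatched charsets: 2 characters turn into 6 unrelated ones,
        // because each character occupies 3 bytes in UTF-8.
        System.out.println(latin1Misread(chinese).length());
    }
}
```

Declaring the encoding explicitly in the JDBC URL is intended to keep the client side of the connection on UTF-8, so the first case applies rather than the second.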
B. Edit the ${apache_nutch_home}/runtime/local/conf/nutch-site.xml file:
<property>
<name>http.accept.language</name>
<value>ja-jp,en-us,zh-cn,en-gb,en;q=0.7,*;q=0.3</value>
<description>Value of the "Accept-Language" request header field.
This allows selecting a non-English language as the default one to retrieve.
It is a useful setting for search engines built for a certain national group.
</description>
</