Information scraped from a website is saved to a local database, but the text comes out garbled once it has been stored. How can this be fixed? Here is how. First, make sure the following four items all use UTF-8:
1. The source code
2. The database connection
3. The table's character set
4. The inserted data
The steps, one by one:
1. Make sure the source file is UTF-8 by adding this line at the top of the code; it declares the file's encoding:
    # -*- coding: utf8 -*-
2. Make sure the database connection uses UTF-8 by opening it like this:
    conn = MySQLdb.connect(host='localhost', user='root', passwd='****', db='kfxx', port=3306, charset='utf8')
    cur = conn.cursor()
3. Make sure the table's character set is UTF-8; this can be set when the table is created.
4. Make sure the inserted data is UTF-8. This has two parts: the page that is read must be UTF-8, and the strings that are inserted must be UTF-8 as well.
    # Fix the garbled-text problem
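    import urllib2
    import chardet

    # cityurl is the URL of the page to fetch (defined elsewhere)
    html_1 = urllib2.urlopen(cityurl, timeout=120).read()
    # Detect the page encoding
    mychar = chardet.detect(html_1)
    bianma = mychar['encoding']
    if bianma == 'utf-8' or bianma == 'UTF-8':
        html = html_1
    else:
        # Not UTF-8: treat the page as GB2312 and re-encode it as UTF-8
        html = html_1.decode('gb2312', 'ignore').encode('utf-8')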
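The normalization step above can also be wrapped in a small helper. The following is only a sketch under a few assumptions: the name `to_utf8` and its `fallback` parameter are hypothetical, the encoding name is expected to come from a detector such as `chardet`, and, unlike the original snippet, it decodes with the detected encoding when one is known and only falls back to GB2312 otherwise. It uses only the standard library.

```python
import codecs


def to_utf8(raw, detected, fallback="gb2312"):
    """Return the raw page bytes re-encoded as UTF-8.

    raw      -- bytes as returned by read()
    detected -- encoding name reported by a detector (may be None)
    fallback -- encoding assumed when `detected` is missing or unknown
    """
    try:
        # codecs.lookup normalizes spellings such as 'UTF-8' and 'utf8'
        name = codecs.lookup(detected).name if detected else None
    except LookupError:
        name = None

    if name == "utf-8":
        return raw  # already UTF-8, nothing to do
    # Undecodable bytes are dropped, as with 'ignore' in the original call
    return raw.decode(name or fallback, "ignore").encode("utf-8")
```

With the names from the snippet above, the call would look like `html = to_utf8(html_1, mychar['encoding'])`. Comparing the normalized codec name avoids having to list every spelling of "utf-8" by hand.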
    from bs4 import BeautifulSoup

    chapter_soup = BeautifulSoup(html)
    city = chapter_soup.find('div', class_='row-fluid').find('h1').get_text()
    province = chapter_soup.find('a', class_='province').get_text()
    pmnum = chapter_soup.find('div', class_='row-fluid').find('span').get_text()
    suggest = chapter_soup.find('div', class_='row-fluid').find('h2').get_text()
    rand = chapter_soup.find('div', class_='row-fluid').find('h2').find_next_sibling('h2').get_text()
    face = chapter_soup.find('div', class_='span4 pmemoji').find('h1').get_text()
    conclusion = chapter_soup.find('h1', class_='review').get_text()
    print city.encode('utf-8')
    # Encode every value as UTF-8 before building the INSERT statement
    cur.execute('insert into t_pm values(\'' + city.encode('utf-8') + '\',\'' + province.encode('utf-8') + '\',\'' + pmnum.encode('utf-8') + '\',\'' + suggest.encode('utf-8') + '\',\'' + rand.encode('utf-8') + '\',\'' + conclusion.encode('utf-8') + '\')')
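Building the INSERT statement by string concatenation breaks as soon as a scraped value contains a single quote. A possible alternative, shown here only as a sketch, is to pass the values as parameters and let the driver do the quoting; MySQLdb uses `%s` placeholders for this. The function name `insert_pm_row` and the `RecordingCursor` class are hypothetical: the class is just a stand-in that records calls so the function can be tried without a database, and the six-column layout is taken from the INSERT statement above.

```python
INSERT_SQL = "INSERT INTO t_pm VALUES (%s, %s, %s, %s, %s, %s)"


def insert_pm_row(cur, city, province, pmnum, suggest, rand, conclusion):
    """Insert one row, letting the driver quote and escape each value."""
    params = (city, province, pmnum, suggest, rand, conclusion)
    # The SQL text stays constant; the values travel separately
    cur.execute(INSERT_SQL, params)
    return params


class RecordingCursor(object):
    """Hypothetical stand-in for a DB-API cursor; it only records calls."""

    def __init__(self):
        self.calls = []

    def execute(self, sql, params=None):
        self.calls.append((sql, params))
```

With a real connection, `cur` would be the cursor returned by `conn.cursor()`, and the call would typically be followed by `conn.commit()`.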
Done. The inserted data now shows up correctly as Chinese text.
Fixing garbled text after database insertion (repost)