Connecting Python to MySQL and filtering special characters from fetchall() results

Source: Internet
Author: User
This is a simple example of operating a database from Python. Compared with Java JDBC it really is simple: Python eliminates a lot of tedious, repetitive work, so you only need to care about retrieving and manipulating the data.
Preparatory work
The following environment and modules are required:

    • Ubuntu 14.04 64bit
    • Python 2.7.6
    • MySQLdb

Note: Ubuntu ships with Python preinstalled, but to connect to a database from Python you also need to install the MySQLdb module. Installation is simple:

sudo apt-get install python-mysqldb

Then start the Python interpreter and import the module. If no error is reported, the installation was successful:

$ python
Python 2.7.6 (default, June, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>

Python's standard database interface is the Python DB-API (specified in PEP 249), and accessing MySQL from Python goes through it as well. Most Python database interfaces follow this standard. Each database needs its own driver module; because MySQL is what I have installed on this machine, I use MySQLdb. To work with a different database you only need to swap the driver module that implements the interface underneath, while the calling code stays largely the same. That is the point of having a standard interface.
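To illustrate the common interface, here is a minimal Python 3 sketch that uses the built-in sqlite3 module as a stand-in driver (an assumption, chosen only because it needs no server). The connect/cursor/execute/fetchall sequence is the same one the MySQLdb code later in this article uses; with MySQLdb, mainly the connect() arguments and the parameter placeholder (%s instead of ?) would differ. The table and rows here are a trimmed-down, hypothetical version of the test data.

```python
import sqlite3


def fetch_users(conn):
    """Run a query through the generic DB-API calls and return all rows."""
    cursor = conn.cursor()
    try:
        # sqlite3 uses the "qmark" placeholder style; MySQLdb uses "%s"
        cursor.execute(
            "SELECT user_no, remark FROM python_demo WHERE user_no > ?", (1001,)
        )
        return cursor.fetchall()  # a list of tuples
    finally:
        cursor.close()


if __name__ == "__main__":
    # In-memory database standing in for the MySQL "study" database
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE python_demo (user_no INTEGER, remark TEXT)")
    conn.executemany(
        "INSERT INTO python_demo (user_no, remark) VALUES (?, ?)",
        [(1001, "I am Zhang San"), (1002, "I am Zhang San")],
    )
    conn.commit()
    print(fetch_users(conn))  # [(1002, 'I am Zhang San')]
    conn.close()
```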
Python database operations
First, we need a test table.
The table-creation statements:

CREATE DATABASE study;
USE study;
DROP TABLE IF EXISTS python_demo;
CREATE TABLE python_demo (
    id INT NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto increment',
    user_no INT NOT NULL COMMENT 'user number',
    user_name VARBINARY() NOT NULL COMMENT 'user name',
    password VARBINARY() NOT NULL COMMENT 'user password',
    remark VARBINARY(255) NOT NULL COMMENT 'user notes',
    PRIMARY KEY (id, user_no)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT 'user test table';
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES
    (1001, '301', 'admin', 'I am Zhang San'),
    (1002, '302', 'admin', 'I am Zhang San'),
    (1003, 'Zhang 303', 'admin', 'I am Zhang San'),
    (1004, '304', 'admin', 'I am Zhang San'),
    (1005, '305', 'admin', 'I am Zhang San'),
    (1006, '306', 'admin', 'I am Zhang San'),
    (1007, '307', 'admin', 'I am Zhang San'),
    (1008, '308', 'admin', 'I am Zhang San');

Python code

# -*- coding: utf-8 -*-
import ConfigParser
import sys

import MySQLdb


def init_db():
    # conf is the module-level ConfigParser created in __main__ below
    try:
        conn = MySQLdb.connect(host=conf.get('database', 'host'),
                               user=conf.get('database', 'user'),
                               passwd=conf.get('database', 'passwd'),
                               db=conf.get('database', 'db'),
                               charset='utf8')
        return conn
    except MySQLdb.Error:
        print "Error: Database connection error"
        return None


def select_demo(conn, sql):
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    except MySQLdb.Error:
        print "Error: failed to execute query"
        return None


def update_demo():
    pass


def delete_demo():
    pass


def insert_demo():
    pass


if __name__ == '__main__':
    conf = ConfigParser.ConfigParser()
    conf.read('mysql.conf')
    conn = init_db()
    if conn is None:
        sys.exit(1)
    sql = "SELECT * FROM %s" % conf.get('database', 'table')
    data = select_demo(conn, sql)
    print data
    conn.close()
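The script above reads its connection settings from a mysql.conf file whose contents are not shown. Judging from the conf.get() calls, it needs a [database] section with host, user, passwd, db and table keys. The sketch below (Python 3, where the module is named configparser) parses a hypothetical file of that shape; every value in it is a placeholder, not the author's configuration.

```python
import configparser

# Hypothetical mysql.conf contents; the keys mirror the conf.get() calls,
# the values are placeholders to replace with your own settings.
SAMPLE_CONF = """
[database]
host = 127.0.0.1
user = root
passwd = yourpassword
db = study
table = python_demo
"""


def load_db_settings(text):
    """Parse the [database] section into a plain dict."""
    conf = configparser.ConfigParser()
    conf.read_string(text)
    return dict(conf["database"])


if __name__ == "__main__":
    settings = load_db_settings(SAMPLE_CONF)
    print(settings["table"])  # python_demo
```

In practice you would call conf.read('mysql.conf') instead of read_string(); the string form just keeps the example self-contained.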

Filtering special characters from fetchall() fields
Recently I have been migrating a data warehouse. The data used to be extracted into the warehouse by shell scripts, which are now being replaced with Python scripts.
However, a problem came up when the extracted data was stored in Hadoop:
The tables have many fields, and there is no way to know in advance what will be stored in them. The Hive table uses \t as the field delimiter and \n as the line delimiter, so if the content of a MySQL field itself contains \t or \n, the fields become misaligned. The real headache is that the MySQL table has more than 100 fields, and I don't know which of them will cause the problem.
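A small Python 3 sketch of the misalignment, assuming a Hive-style text file with \t between fields and \n between rows (the sample values are made up): once one value contains a tab or a newline, splitting the file back into rows and fields no longer yields the expected column count.

```python
def to_hive_text(rows):
    """Serialize rows as tab-separated fields, one row per line."""
    return "".join("\t".join(row) + "\n" for row in rows)


def parse_hive_text(text):
    """Split the text back into rows and fields, the way a delimited reader would."""
    return [line.split("\t") for line in text.split("\n") if line]


if __name__ == "__main__":
    rows = [
        ["1001", "clean value", "ok"],
        ["1002", "has a\ttab", "ok"],
        ["1003", "has a\nnewline", "ok"],
    ]
    for fields in parse_hive_text(to_hive_text(rows)):
        print(len(fields), fields)
```

Three 3-column rows come back as four rows with 3, 4, 2 and 2 columns: the tab adds a column, and the newline splits one row into two.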
The shell scripts handled this by applying MySQL's REPLACE() function to each extracted field. For example, if a MySQL column column1 is varchar(2000) and is likely to contain special characters, the query's SQL statement wraps it like this:

SELECT REPLACE(REPLACE(REPLACE(column1, '\r', ''), '\n', ''), '\t', '')

That is how it was done before, but it makes the SQL extremely long: there are more than 100 fields and no way to know which ones contain special characters, so every field has to be wrapped.
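To make the cost of the SQL approach concrete, here is a hypothetical helper (not part of the original scripts) that generates the nested REPLACE() expression for a list of column names. It is only an illustration in Python 3; the column names are made up, and the article's actual solution, filtering in Python, follows.

```python
def strip_expr(column):
    """Wrap one column in MySQL REPLACE() calls removing CR, LF and tab."""
    expr = column
    # Inside a MySQL string literal, '\r', '\n' and '\t' are escape sequences
    # for carriage return, line feed and tab (unless NO_BACKSLASH_ESCAPES is set).
    for escape in (r"\r", r"\n", r"\t"):
        expr = "REPLACE(%s, '%s', '')" % (expr, escape)
    return expr


def build_select(table, columns):
    """Build a SELECT that applies strip_expr() to every column."""
    exprs = ",\n       ".join(strip_expr(c) + " AS " + c for c in columns)
    return "SELECT %s\nFROM %s" % (exprs, table)


if __name__ == "__main__":
    print(strip_expr("column1"))
    # REPLACE(REPLACE(REPLACE(column1, '\r', ''), '\n', ''), '\t', '')
```

With 100+ columns, build_select() produces one such three-level expression per column, which is exactly the length problem described above.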
The Python version did not process the fields at all, which left the Hive table fields misaligned. So in Python, every field queried from MySQL has to be filtered before it is written to the file.
To demonstrate with MySQL, first create a test table:

CREATE TABLE `filter_fields` (
  `field1` varchar() DEFAULT NULL,
  `field2` varchar() DEFAULT NULL,
  `field3` varchar(50) DEFAULT NULL,
  `field4` varchar() DEFAULT NULL,
  `field5` varchar() DEFAULT NULL,
  `field6` varchar() DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

The table has six fields, all of type varchar, and the rows inserted into it can contain special characters. Insert a few rows to test:

INSERT INTO filter_fields (field1, field2, field3, field4, field5, field6) VALUES ('test01', 'test02', 'test03', 'test04', 'test05', 'test06');
INSERT INTO filter_fields (field1, field2, field3, field4, field5, field6) VALUES ('test11\ntest11', 'test12\n\n', 'test13', 'test14', 'test15', 'test16');
INSERT INTO filter_fields (field1, field2, field3, field4, field5, field6) VALUES ('test21\ttest21', 'test22\ttest22\ttest22', 'test23\t\t\t', 'test4', 'test5', 'test6');
INSERT INTO filter_fields (field1, field2, field3, field4, field5, field6) VALUES ('test21\rest21', 'test22\r\rest22\r\rest22', 'test23\r\r\r', 'test4', 'test5', 'test6');

Some of the inserted special characters appear back to back and some appear on their own.
Python test code:

# -*- coding: utf-8 -*-
import sys

import MySQLdb

db_host = '127.0.0.1'       # database address
db_port = 3306              # database port
db_user = 'root'            # MySQL user name
db_pwd = 'yourpassword'     # MySQL user password; replace with your own
db_name = 'test'            # database name
db_table = 'filter_fields'  # database table


# Strip \t, \n and \r from every field of the query result
def extract_data(table_name):
    try:
        conn = MySQLdb.connect(host=db_host, port=db_port, user=db_user,
                               passwd=db_pwd, db=db_name, charset='utf8')
        cursor = conn.cursor()
    except MySQLdb.Error, e:
        print 'Database connection failed:', e
        sys.exit(1)

    try:
        sql = 'SELECT * FROM %s;' % table_name
        cursor.execute(sql)
        rows = cursor.fetchall()

        print '=== Unfiltered query result ==='
        for row in rows:
            print row

        print '=== Filtered result ==='
        rows_list = []
        for row in rows:
            row_list = []
            for column in row:
                # Only strings have replace(); NULLs (None) and numbers pass through
                if isinstance(column, basestring):
                    column = column.replace('\t', '').replace('\n', '').replace('\r', '')
                row_list.append(column)
            rows_list.append(row_list)
            print rows_list[-1]  # [-1] is the last element of the list
        return rows_list
    except MySQLdb.Error, e:
        print 'Failed to execute SQL statement:', e
        sys.exit(1)
    finally:
        # Runs on success and on sys.exit(), so the connection is always closed
        cursor.close()
        conn.close()


if __name__ == '__main__':
    print 'begin:'
    rows = extract_data(db_table)

The output looks like this:

Unfiltered query result:

(u'test01', u'test02', u'test03', u'test04', u'test05', u'test06')
(u'test11\ntest11', u'test12\n\n', u'test13', u'test14', u'test15', u'test16')
(u'test21\ttest21', u'test22\ttest22\ttest22', u'test23\t\t\t', u'test4', u'test5', u'test6')
(u'test21\rest21', u'test22\r\rest22\r\rest22', u'test23\r\r\r', u'test4', u'test5', u'test6')

Result after filtering:

[u'test01', u'test02', u'test03', u'test04', u'test05', u'test06']
[u'test11test11', u'test12', u'test13', u'test14', u'test15', u'test16']
[u'test21test21', u'test22test22test22', u'test23', u'test4', u'test5', u'test6']
[u'test21est21', u'test22est22est22', u'test23', u'test4', u'test5', u'test6']

As you can see, tabs, line feeds, and carriage returns are filtered.
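The filtering loop above can be packaged as reusable functions. The Python 3 sketch below is a variant, not the author's code: it uses one regular expression instead of three chained replace() calls, passes NULLs (None) and non-string values through unchanged, and then formats the cleaned rows as tab-separated lines ready to be written to a file. Rendering None as \N assumes Hive's default NULL marker for text tables; check that against your own table definition.

```python
import re

# Characters that would break a tab/newline delimited Hive text file
_SPECIAL = re.compile(r"[\t\n\r]")


def clean_field(value):
    """Remove tabs, line feeds and carriage returns from string values."""
    if isinstance(value, str):
        return _SPECIAL.sub("", value)
    return value  # None, ints, dates, ... are returned unchanged


def clean_rows(rows):
    """Apply clean_field() to every column of every fetchall() row."""
    return [[clean_field(col) for col in row] for row in rows]


def to_tsv_lines(rows, null_marker="\\N"):
    """Format cleaned rows as tab-separated lines (None becomes null_marker)."""
    return [
        "\t".join(null_marker if col is None else str(col) for col in row)
        for row in rows
    ]


if __name__ == "__main__":
    fetched = [("test11\ntest11", "test12\n\n", None),
               ("test21\rest21", "test23\t\t\t", 42)]
    for line in to_tsv_lines(clean_rows(fetched)):
        print(repr(line))
```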
A final piece of advice: don't underestimate \r, the carriage return. Many people assume a carriage return is the same as a newline, but it is not: \r is a carriage return and \n is a line feed (newline). My earlier code filtered out only \t and \n, yet the extracted data was still wrong. After reading through the code again I found that \r was not being filtered, and that one omission caused a lot of extraction errors.
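To see why a missing \r matters, here is a small Python 3 sketch (the sample value is made up): a filter that removes only \t and \n leaves the carriage return in place, and line-oriented readers such as Python's str.splitlines() treat a lone \r as a line break, so the value is still split in two. As far as I know, Hadoop's default text line reader also accepts a bare CR as a line terminator, which would explain the misaligned rows.

```python
def filter_tn(value):
    """The incomplete filter: removes tabs and line feeds only."""
    return value.replace("\t", "").replace("\n", "")


def filter_tnr(value):
    """The complete filter: also removes carriage returns."""
    return filter_tn(value).replace("\r", "")


if __name__ == "__main__":
    field = "test21\rtest21"
    print(len(filter_tn(field).splitlines()))   # 2: the \r still splits the value
    print(len(filter_tnr(field).splitlines()))  # 1
```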
