Connecting Python to MySQL and filtering special characters from fetchall() results
Let's walk through a simple example of how Python works with a database. Compared with Java's JDBC it is much simpler: it saves a lot of repetitive boilerplate and lets you concentrate on fetching and processing the data.
Preparations
The following environments and modules are required:
- Ubuntu 14.04 64bit
- Python 2.7.6
- MySQLdb
Note: Ubuntu ships with Python, but connecting to a database from Python also requires the MySQLdb module. On Ubuntu the module is packaged as python-mysqldb, so installation is simple:
sudo apt-get install python-mysqldb
Then start the Python interpreter and import the module. If no error is reported, the installation succeeded:
python
Python 2.7.6 (default, Jun 22 2015, 17:58:13)
[GCC 4.8.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import MySQLdb
>>>
Python's standard database interface is the Python DB-API, and most Python database modules (including those for MySQL) adhere to it. Each database needs its own module; because MySQL is installed on my local machine, I use MySQLdb. To target a different database, you only swap the module that implements the interface underneath, and the calling code stays largely the same. That is the point of having a standard interface.
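To make the DB-API pattern concrete, here is a minimal sketch that uses sqlite3 from the standard library as a stand-in for MySQLdb, so it runs without a MySQL server. It is written for Python 3, and the demo table and its row are made up for the illustration. The connect / cursor / execute / fetchall sequence is the part the two modules share.

```python
import sqlite3  # standard-library DB-API module, standing in for MySQLdb


def fetch_all(conn, sql):
    """Run a query through the DB-API calls and return every row."""
    cursor = conn.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


# Hypothetical in-memory table, used only for this illustration.
conn = sqlite3.connect(':memory:')
setup = conn.cursor()
setup.execute('CREATE TABLE demo (user_no INTEGER, user_name TEXT)')
setup.execute("INSERT INTO demo VALUES (1001, 'zhang San 01')")
conn.commit()
setup.close()

rows = fetch_all(conn, 'SELECT user_no, user_name FROM demo')
print(rows)
conn.close()
```

The fetch_all helper never mentions sqlite3, so it should work unchanged with a connection obtained from MySQLdb.connect(...). What does differ between modules is the connect() arguments and the placeholder style used for query parameters.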
Python database operations
First, we need a test table.
Table creation statement:
CREATE DATABASE study;
USE study;

DROP TABLE IF EXISTS python_demo;
CREATE TABLE python_demo (
  id INT NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto-incrementing',
  user_no INT NOT NULL COMMENT 'user number',
  user_name VARBINARY(50) NOT NULL COMMENT 'username',
  password VARBINARY(50) NOT NULL COMMENT 'user password',
  remark VARBINARY(255) NOT NULL COMMENT 'user note',
  PRIMARY KEY (id, user_no)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COMMENT 'user test table';

INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1001, 'zhang San 01', 'admin', 'I am Zhang San');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1002, 'zhang San 02', 'admin', 'zhang san');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1003, 'zhang San 03', 'admin', 'I am Zhang san');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1004, 'zhang San 04', 'admin', 'I am Zhang san');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1005, 'zhang San 05', 'admin', 'I am Zhang San');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1006, 'zhang San 06', 'admin', 'I am Zhang san');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1007, 'zhang San 07', 'admin', 'I am Zhang San');
INSERT INTO python_demo (user_no, user_name, password, remark) VALUES (1008, 'zhang San 08', 'admin', 'I am James');
Python code
# -*- coding: utf-8 -*-
import ConfigParser
import sys

import MySQLdb


def init_db():
    # Connection settings come from the [database] section of mysql.conf
    try:
        conn = MySQLdb.connect(host=conf.get('database', 'host'),
                               user=conf.get('database', 'user'),
                               passwd=conf.get('database', 'passwd'),
                               db=conf.get('database', 'db'),
                               charset='utf8')
        return conn
    except MySQLdb.Error:
        print "Error: Database connection Error"
        return None


def select_demo(conn, sql):
    # Run the query and return all rows, or None on failure
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        return cursor.fetchall()
    except MySQLdb.Error:
        print "Error: query execution Error"
        return None


def update_demo():
    pass


def delete_demo():
    pass


def insert_demo():
    pass


if __name__ == '__main__':
    conf = ConfigParser.ConfigParser()
    conf.read('mysql.conf')
    conn = init_db()
    sql = "select * from %s" % conf.get('database', 'table')
    data = select_demo(conn, sql)
    pass
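The script reads its settings from mysql.conf, which is not shown here. The sketch below is an assumption about what that file could look like: the section name and keys are the ones the script asks for, while every value is a placeholder. It is written for Python 3, where the module is named configparser instead of ConfigParser.

```python
import configparser  # named ConfigParser in Python 2

# Hypothetical contents of mysql.conf; every value is a placeholder.
SAMPLE_CONF = """
[database]
host = 127.0.0.1
user = root
passwd = yourpassword
db = study
table = python_demo
"""

conf = configparser.ConfigParser()
# The script itself calls conf.read('mysql.conf'); read_string keeps
# this example self-contained.
conf.read_string(SAMPLE_CONF)

settings = {key: conf.get('database', key)
            for key in ('host', 'user', 'passwd', 'db', 'table')}
print(settings['db'], settings['table'])
```

Keeping the credentials in a separate file means the script can be shared or moved between environments without editing the code.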
Filtering special characters from fetchall() fields
Recently, I was working on a data warehouse migration. The warehouse data used to be extracted with shell scripts, which I later replaced with Python scripts.
However, a problem appeared when extracting the data and storing it in Hadoop:
The tables have many fields, and there is no way to know in advance what they will contain. The Hive tables are created with \t as the field delimiter and \n as the row delimiter, so if a MySQL field contains \t or \n, the columns end up misaligned. The real headache is that the MySQL table has more than 100 fields, and I do not know which of them will trigger the problem.
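The misalignment is easy to reproduce without Hive. The following sketch (Python 3, with made-up three-column rows) writes rows using \t between fields and \n between rows, then parses the text back the way a delimiter-based reader would.

```python
FIELD_SEP = '\t'
ROW_SEP = '\n'

# Three rows of three fields; two values contain a delimiter character.
rows = [
    ('test01', 'test02', 'test03'),
    ('test11\ntest11', 'test12', 'test13'),
    ('test21\ttest21', 'test22', 'test23'),
]

text = ROW_SEP.join(FIELD_SEP.join(row) for row in rows)
parsed = [line.split(FIELD_SEP) for line in text.split(ROW_SEP)]

print(len(rows), len(parsed))               # rows written vs. rows read
print([len(fields) for fields in parsed])   # field count of each parsed row
```

Three rows go in, but four come out: the embedded \n splits one row in two, and the embedded \t gives another row an extra field.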
In the shell scripts, I wrapped each field to be extracted in MySQL's replace function. For example, if the MySQL field is column1 varchar(2000), which may well contain special characters, the query would include:
select replace(replace(replace(column1,'\r',''),'\n',''),'\t','')
I did this for a long time, but the SQL statements become very long, especially with more than 100 fields, and I still do not know which fields contain special characters.
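If you do stay with the SQL approach, the long statement can at least be generated instead of typed. This is a rough sketch in Python 3, not what the shell scripts actually did; strip_expr and build_select are hypothetical helpers, and the column and table names passed to them are assumed to be trusted identifiers.

```python
def strip_expr(column):
    """Wrap a column name in nested replace() calls for CR, LF and tab."""
    expr = column
    # Raw strings keep the backslash, so MySQL receives '\r', '\n', '\t'.
    for escape in (r'\r', r'\n', r'\t'):
        expr = "replace(%s,'%s','')" % (expr, escape)
    return expr


def build_select(table, columns):
    """Build a SELECT that strips the three characters from every column."""
    select_list = ', '.join('%s AS %s' % (strip_expr(c), c) for c in columns)
    return 'select %s from %s' % (select_list, table)


print(strip_expr('column1'))
print(build_select('some_table', ['column1', 'column2']))
```

The first print reproduces the expression shown above for column1. The generated statement is still long, and the database does the replacing on every column whether or not it needs it.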
At first the Python script did no such processing, which misaligned the Hive table fields. So every field queried from MySQL needs to be filtered in Python before it is written to a file.
Let's look at an example using a MySQL test database. First, create a test table:
CREATE TABLE `filter_fields` (
  `field1` varchar(50) DEFAULT NULL,
  `field2` varchar(50) DEFAULT NULL,
  `field3` varchar(50) DEFAULT NULL,
  `field4` varchar(50) DEFAULT NULL,
  `field5` varchar(50) DEFAULT NULL,
  `field6` varchar(50) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
There are six fields, all of type varchar, so special characters can be inserted into any of them. Insert a few rows for testing:
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test01','test02','test03','test04','test05','test06');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test11\ntest11','test12\n\n','test13','test14','test15','test16');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test21\ttest21','test22\ttest22\ttest22','test23\t\t\t','test4','test5','test6');
insert into filter_fields(field1,field2,field3,field4,field5,field6) VALUES('test21\rest21','test22\r\rest22\r\rest22','test23\r\r\r','test4','test5','test6');
The special characters in the test data appear both singly and in consecutive runs.
Python test code:
# coding=utf-8
import MySQLdb
import sys

db_host = '127.0.0.1'       # database address
db_port = 3306              # database port
db_user = 'root'            # MySQL user name
db_pwd = 'yourpassword'     # MySQL user password, change to your password
db_name = 'test'            # database name
db_table = 'filter_fields'  # database table


# Filter \t, \n and \r out of the fields in the SQL result
def extract_data(table_name):
    try:
        conn = MySQLdb.connect(host=db_host, port=db_port, user=db_user,
                               passwd=db_pwd, db=db_name, charset="utf8")
        cursor = conn.cursor()
    except MySQLdb.Error, e:
        print 'database connection exception'
        sys.exit(1)

    try:
        sql = 'select * from %s;' % (table_name)
        cursor.execute(sql)
        rows = cursor.fetchall()

        print '==== query result before field filtering ===='
        for row in rows:
            print row

        print '==== result after field filtering ===='
        rows_list = []
        for row in rows:
            row_list = []
            for column in row:
                row_list.append(column.replace('\t', '').replace('\n', '').replace('\r', ''))
            rows_list.append(row_list)
            print rows_list[-1]  # [-1] is the row that was just appended
        return rows_list
    except MySQLdb.Error, e:
        print 'SQL statement execution failed'
        cursor.close()
        conn.close()
        sys.exit(1)


if __name__ == '__main__':
    print 'begin:'
    rows = extract_data(db_table)
    pass
Check the output result:
Query result before field filtering:
(u'test01', u'test02', u'test03', u'test04', u'test05', u'test06')
(u'test11\ntest11', u'test12\n\n', u'test13', u'test14', u'test15', u'test16')
(u'test21\ttest21', u'test22\ttest22\ttest22', u'test23\t\t\t', u'test4', u'test5', u'test6')
(u'test21\rest21', u'test22\r\rest22\r\rest22', u'test23\r\r\r', u'test4', u'test5', u'test6')
Result after field filtering:
[u'test01', u'test02', u'test03', u'test04', u'test05', u'test06']
[u'test11test11', u'test12', u'test13', u'test14', u'test15', u'test16']
[u'test21test21', u'test22test22test22', u'test23', u'test4', u'test5', u'test6']
[u'test21est21', u'test22est22est22', u'test23', u'test4', u'test5', u'test6']
As you can see, tabs, line feeds, and carriage returns have all been filtered out.
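One thing to watch in a real table: the test columns are declared DEFAULT NULL, and a NULL comes back from fetchall() as None, which has no replace method. Numeric and date columns are not strings either. The sketch below (Python 3; clean_field and clean_rows are hypothetical names, and the sample rows are made up) filters only string values and passes everything else through.

```python
# Maps the code points of \t, \n and \r to None so translate() deletes them.
STRIP_TABLE = {ord(ch): None for ch in '\t\n\r'}


def clean_field(value):
    """Return value with tabs, line feeds and carriage returns removed.

    Non-string values (None for SQL NULL, numbers, dates) pass through.
    """
    if isinstance(value, str):
        return value.translate(STRIP_TABLE)
    return value


def clean_rows(rows):
    """Apply clean_field to every column of every row."""
    return [[clean_field(col) for col in row] for row in rows]


sample = [('test11\ntest11', 'test12\n\n', None),
          ('test21\rest21', 'test23\t\t\t', 42)]
print(clean_rows(sample))
```

A single translate call also avoids chaining three replace calls per field, which matters little for correctness but keeps the list of filtered characters in one place.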
A side note: do not underestimate \r. Many people think a carriage return and a line break are the same thing, but \r is a carriage return and \n is a line feed. An earlier version of the code filtered out \t and \n, yet the extracted data was still wrong. Reading the source code showed that \r was not being filtered, and that one difference caused a lot of incorrect data extraction.
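The effect is easy to see on one of the test values. This small Python 3 sketch compares filtering only \t and \n with filtering all three characters. The last line uses Python's own splitlines() purely to show that \r can count as a line boundary; it is not a claim about how any particular downstream tool splits lines.

```python
value = 'test22\r\rest22\r\rest22'

# Filtering only \t and \n, as the earlier version of the code did.
only_tab_newline = value.replace('\t', '').replace('\n', '')
# Filtering \r as well.
all_three = only_tab_newline.replace('\r', '')

print(repr(only_tab_newline))  # the \r characters are still there
print(repr(all_three))
print(value.splitlines())      # Python treats each \r as a line boundary
```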