We are doing data analysis, the process of cleaning, many times will face a variety of data sources, to the different data sources for cleaning, warehousing work. Of course, Python this language, I prefer, the development of high efficiency, basically how to write can run, and the installation of simple configuration, basically a network of the environment Pip install all done, no net, the WHL package copy came over a line of command also resolved ( Windows python3.5 uses PIP to install the WHL package offline.
This blog is for the use of Python3 under the Windows platform (Python2 community will stop support, using 3 is the trend), read the data in the XLS,XLSX format to clean the library to do a small example.
Preliminary Business Process
The entire business process is simple: two big steps
1. 读取xlsx数据进行清洗2. cx_Oracle批量入库
To build a table statement:
create table temp_table(importtime varchar2(128),carrier varchar2(32),);select * from temp_table
An example script:
#-*-Coding:utf-8-*-ImportXlrdImportDatetimeImportCx_oracleImportTime fromItertoolsImportIsliceImportosos.environ[' Nls_lang ']=' simplified Chinese_china. ZHS16GBK 'Linename = [' line Line 1 ',' line Line 2 ']stationname = []####################### #链接数据库相关 ###################################### def getconnoracle(username,password,ip,service_name): Try: conn = Cx_oracle.connect (username+'/'+password+' @ '+ip+'/'+service_name)# Connect to database returnConnexceptException:print (Exception)###################### #进行数据批量插入 ####################### def insertoracle(conn,data,input_file_name):Sheetnumber = getsheetnumber (data) cursor = Conn.cursor ()Try: forXinchRange0, sheetnumber): Templist = Excel_table_byindex (Input_file_name,0, x) Cursor.prepare (' INSERT into temp_table (Importtime, Carrier) VALUES (: 1,:2) ')# Use cursor for various operations, templist values need to correspond to table temp_tableCursor.executemany (None, templist) Conn.commit ()exceptCx_oracle.databaseerror asMsg:print (msg)finally: Cursor.close () Conn.close ()########################## #打开excel文件 ######################## def openxls(path): Try: data = Xlrd.open_workbook (path)returnDataexceptException:print (Exception) def getsheetnumber(data):Sheet_num = Len (Data.sheets ())returnSheet_num###################### #一些数据清洗工作 ######################## def getlinename(str): forXinchLinename:ifXinchStr:returnX def getstationname(str): forXinchStationName:ifXinchStr:returnX######### #将excel中除去表头的一个sheet读出来, returns a list############# def excel_table_byindex(Path,colnameindex = 0, By_index = 0):Today = Time.strftime ('%y%m%d ', Time.localtime (Time.time ())) data = Openxls (path) table = Data.sheets () [By_index] nrows = table.nrows ncols = Table.ncols colnames = table.row_values (colnameindex) list = [] forRowNuminchRange1, nrows): row = Table.row_values (rownum) temp_linename = Getlinename (row[6]) Temp_stationname = Getstationname (row[6])ifRow:app = [Today, str (row[1]), str (row[2]), Temp_stationname,temp_linename]# for I in range (len (colnames)): # App[colnames[i]] = Row[i]List.append (APP)returnList################## #一个可以从文件第二行开始读的办法 ############# def getallstationname(path):Stationname_file = open (path,' R ', encoding=' Utf-8 ')#count = Len (Stationname_file.readlines ()) forLineinchIslice (Stationname_file,1,None): Str_temp = Line.strip (' \ n ')ifStr_temp not inchLinename andStr_temp! ='----' andstr_temp!="': Stationname.append (Str_temp)#################################################################### def getstationnamefromexcel(path):data = Openxls (path) table = Data.sheets () [0] Nrows = table.nrows Ncols = Table.ncols colnames = table.row_values (0) List = [] forRowNuminchRange0, nrows): row = Table.row_values (rownum) [0]ifRow:list.append (Row)returnList################################################################# def main():Username =' xx 'Password =' xx 'IP =' 192.168.1.1 'Service_Name =' IOP ' #获取数据库链接conn = Getconnoracle (username,password,ip,service_name) Input_file_name = (r "E:\code\python\findS\subwayBase\xx.xlsx")#output_file_name = input ("Enter the output file name:")Getallstationname (r "E:\code\python\findS\subwayBase\ site. txt") begin = Datetime.datetime.now () insertoracle (Conn,openxls (input_file_name), input_file_name)# X.fetchone () # c.close () # Close cursor # conn.close () # Close ConnectionEnd = Datetime.datetime.now () print ((End-begin). seconds)if__name__ = =' __main__ ': Main ()
Python3 using Cx_oracle to operate Oracle error issues under Windows
The error message is as follows:
Traceback (most recent):File "e:/code/python/finds/findsubwaybase.py", line134,inch<module> Main ()File "e:/code/python/finds/findsubwaybase.py", line124,inchMain Insertoracle (Conn,openxls (input_file_name), input_file_name)File "e:/code/python/finds/findsubwaybase.py", line +,inchInsertoracle Cursor.executemany (none,templist) Unicodeencodeerror:' ASCII' Codec can' tEncode charactersinchPosition1-6: Ordinal not inch Range( -)ProcessFinished with ExitCode1
When using Python3 's cx_oracle to operate Oracle data, it is unavoidable to encounter Chinese coding problems, of course, Internet search is all python2, the solution is:
#在开头加上import"utf-8" )
The solution in Python3 is: plus the core code
import osos.environ[‘NLS_LANG‘]=‘SIMPLIFIED CHINESE_CHINA.ZHS16GBK‘
OK, is actually set up the client code, reference: Python code OS. Environ detailed
Xlrd manipulating Excel
Demo Code:
#获取一个工作表Table = Data.sheets () [0]#通过索引顺序获取Table = Data.sheet_by_index (0)#通过索引顺序获取Table = Data.sheet_by_name (u ' Sheet1 ')#通过名称获取#获取整行和整列的值 (Array)Table.row_values (i) table.col_values (i)#获取行数和列数nrows = Table.nrowsncols = Table.ncols#循环行列表数据 forIinchRange (nrows):PrintTable.row_values (i)#单元格CELL_A1 = Table.cell (0,0). Valuecell_c4 = Table.cell (2,3). Value#使用行列索引CELL_A1 = Table.row (0)[0].VALUECELL_A2 = Table.col (1)[0].value#简单的写入row =0Col =0# type 0 empty,1 string, 2 number, 3 date, 4 Boolean, 5 errorCType =1Value =' cell value 'XF =0 # Extended FormattingTable.put_cell (Row, col, CType, value, XF) Table.cell (0,0)#单元格的值 'Table.cell (0,0). Value#单元格的值 '
Reference links
[OS.ENVIRON详解]: http://blog.csdn.net/junweifan/article/details/7615591[python编码]:http://www.cnblogs.com/fkissx/p/5417363.html
Highly recommended again, proficient in Oracle+python series: official documentation
Http://www.oracle.com/technetwork/cn/articles/dsl/mastering-oracle-python-1391323-zhs.html
Windows Python3 using CX_ORACLE,XLRD plugin for Excel data cleaning input