Windows Python3 using CX_ORACLE,XLRD plugin for Excel data cleaning input

Source: Internet
Author: User

We are doing data analysis, the process of cleaning, many times will face a variety of data sources, to the different data sources for cleaning, warehousing work. Of course, Python this language, I prefer, the development of high efficiency, basically how to write can run, and the installation of simple configuration, basically a network of the environment Pip install all done, no net, the WHL package copy came over a line of command also resolved ( Windows python3.5 uses PIP to install the WHL package offline.

This blog is for the use of Python3 under the Windows platform (Python2 community will stop support, using 3 is the trend), read the data in the XLS,XLSX format to clean the library to do a small example.

Preliminary Business Process

The entire business process is simple: two big steps

1. 读取xlsx数据进行清洗2. cx_Oracle批量入库


To build a table statement:

create table temp_table(importtime varchar2(128),carrier varchar2(32),);select * from temp_table

An example script:

#-*-Coding:utf-8-*-ImportXlrdImportDatetimeImportCx_oracleImportTime fromItertoolsImportIsliceImportosos.environ[' Nls_lang ']=' simplified Chinese_china. ZHS16GBK 'Linename = [' line Line 1 ',' line Line 2 ']stationname = []####################### #链接数据库相关 ###################################### def getconnoracle(username,password,ip,service_name):    Try: conn = Cx_oracle.connect (username+'/'+password+' @ '+ip+'/'+service_name)# Connect to database        returnConnexceptException:print (Exception)###################### #进行数据批量插入 ####################### def insertoracle(conn,data,input_file_name):Sheetnumber = getsheetnumber (data) cursor = Conn.cursor ()Try: forXinchRange0, sheetnumber): Templist = Excel_table_byindex (Input_file_name,0, x) Cursor.prepare (' INSERT into temp_table (Importtime, Carrier) VALUES (: 1,:2) ')# Use cursor for various operations, templist values need to correspond to table temp_tableCursor.executemany (None, templist) Conn.commit ()exceptCx_oracle.databaseerror asMsg:print (msg)finally: Cursor.close () Conn.close ()########################## #打开excel文件 ######################## def openxls(path):    Try: data = Xlrd.open_workbook (path)returnDataexceptException:print (Exception) def getsheetnumber(data):Sheet_num = Len (Data.sheets ())returnSheet_num###################### #一些数据清洗工作 ######################## def getlinename(str):     forXinchLinename:ifXinchStr:returnX def getstationname(str):     forXinchStationName:ifXinchStr:returnX######### #将excel中除去表头的一个sheet读出来, returns a list############# def excel_table_byindex(Path,colnameindex = 0, By_index = 0):Today = Time.strftime ('%y%m%d ', Time.localtime (Time.time ())) data = Openxls (path) table = Data.sheets () [By_index] nrows = table.nrows ncols = Table.ncols colnames = table.row_values (colnameindex) list = [] forRowNuminchRange1, nrows): row = Table.row_values (rownum) temp_linename = Getlinename (row[6]) Temp_stationname = Getstationname (row[6])ifRow:app = [Today, str (row[1]), str (row[2]), Temp_stationname,temp_linename]# for I in range (len (colnames)):            # App[colnames[i]] = Row[i]List.append (APP)returnList################## #一个可以从文件第二行开始读的办法 ############# def getallstationname(path):Stationname_file = open (path,' R ', encoding=' Utf-8 ')#count = Len (Stationname_file.readlines ())     forLineinchIslice (Stationname_file,1,None): Str_temp = Line.strip (' \ n ')ifStr_temp not inchLinename andStr_temp! ='----' andstr_temp!="': Stationname.append (Str_temp)#################################################################### def getstationnamefromexcel(path):data = Openxls (path) table = Data.sheets () [0] Nrows = table.nrows Ncols = Table.ncols colnames = table.row_values (0) List = [] forRowNuminchRange0, nrows): row = Table.row_values (rownum) [0]ifRow:list.append (Row)returnList################################################################# def main():Username =' xx 'Password =' xx 'IP =' 192.168.1.1 'Service_Name =' IOP '    #获取数据库链接conn = Getconnoracle (username,password,ip,service_name) Input_file_name = (r "E:\code\python\findS\subwayBase\xx.xlsx")#output_file_name = input ("Enter the output file name:")Getallstationname (r "E:\code\python\findS\subwayBase\ site. txt") begin = Datetime.datetime.now () insertoracle (Conn,openxls (input_file_name), input_file_name)# X.fetchone ()    # c.close () # Close cursor    # conn.close () # Close ConnectionEnd = Datetime.datetime.now () print ((End-begin). seconds)if__name__ = =' __main__ ': Main ()
Python3 using Cx_oracle to operate Oracle error issues under Windows

The error message is as follows:

Traceback (most recent):File "e:/code/python/finds/findsubwaybase.py", line134,inch<module> Main ()File "e:/code/python/finds/findsubwaybase.py", line124,inchMain Insertoracle (Conn,openxls (input_file_name), input_file_name)File "e:/code/python/finds/findsubwaybase.py", line +,inchInsertoracle Cursor.executemany (none,templist) Unicodeencodeerror:' ASCII' Codec can' tEncode charactersinchPosition1-6: Ordinal not inch Range( -)ProcessFinished with ExitCode1

When using Python3 's cx_oracle to operate Oracle data, it is unavoidable to encounter Chinese coding problems, of course, Internet search is all python2, the solution is:

#在开头加上import"utf-8" )

The solution in Python3 is: plus the core code

import osos.environ[‘NLS_LANG‘]=‘SIMPLIFIED CHINESE_CHINA.ZHS16GBK‘

OK, is actually set up the client code, reference: Python code OS. Environ detailed

Xlrd manipulating Excel

Demo Code:

#获取一个工作表Table = Data.sheets () [0]#通过索引顺序获取Table = Data.sheet_by_index (0)#通过索引顺序获取Table = Data.sheet_by_name (u ' Sheet1 ')#通过名称获取#获取整行和整列的值 (Array)Table.row_values (i) table.col_values (i)#获取行数和列数nrows = Table.nrowsncols = Table.ncols#循环行列表数据 forIinchRange (nrows):PrintTable.row_values (i)#单元格CELL_A1 = Table.cell (0,0). Valuecell_c4 = Table.cell (2,3). Value#使用行列索引CELL_A1 = Table.row (0)[0].VALUECELL_A2 = Table.col (1)[0].value#简单的写入row =0Col =0# type 0 empty,1 string, 2 number, 3 date, 4 Boolean, 5 errorCType =1Value =' cell value 'XF =0 # Extended FormattingTable.put_cell (Row, col, CType, value, XF) Table.cell (0,0)#单元格的值 'Table.cell (0,0). Value#单元格的值 '
Reference links
[OS.ENVIRON详解]: http://blog.csdn.net/junweifan/article/details/7615591[python编码]:http://www.cnblogs.com/fkissx/p/5417363.html

Highly recommended again, proficient in Oracle+python series: official documentation
Http://www.oracle.com/technetwork/cn/articles/dsl/mastering-oracle-python-1391323-zhs.html

Windows Python3 using CX_ORACLE,XLRD plugin for Excel data cleaning input

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.