Using Python and openpyxl to process Excel 2007 documents: ideas and tips

Last Update:2017-01-19 Source: Internet

Author: User

Tags printable characters python list

Finding a tool

The first step in any task is to find a handy library to do the work. The Python Excel site lists the packages xlrd, xlwt, and xlutils, but:

They are older, and xlwt does not even support Excel files from version 2007 onward.
Their documentation is not very friendly and you may need to read the source code. My senior schoolmate's task was on a fairly tight schedule and I was at the end of term, so I had no time to read source code.
After another search I found openpyxl. It supports Excel 2007+ files, is still maintained, and its documentation is clear and easy to read. With the tutorial and the API documentation you can get started quickly, so that was my choice.

Installation

This is easy: just run pip install openpyxl.

Because I do not need to handle images, I did not install Pillow.

Some considerations

Each source file is about 1 to 2 MB, which is small, so it can be read straight into memory for processing.
Since this is about processing Excel files, and their whole group obviously works on Windows (all the data is stored in Excel; business people...), the script runs on Windows as well.
This task does not require me to change the existing files at all. All I have to do is read, process, and write to another file.

Learning to use it

Just open cmd and play with the module in the Python shell to get started (I have not installed IPython on Windows, which is a little embarrassing).

For this little script I basically only need to import two things:

from openpyxl import Workbook
from openpyxl import load_workbook

load_workbook, as the name suggests, loads a file into memory. Workbook is the most basic class; it is used to build a file in memory that is finally written to disk.

Work

First I need to load the input file:

inwb = load_workbook(filename)

This returns a Workbook object.

Then I need to create a new file:

outwb = Workbook()

Then, in this new file, use create_sheet to create a few new worksheets, for example:

careersheet = outwb.create_sheet(title='career', index=0)

This inserts a worksheet called career at the head of the workbook (it behaves like a Python list insert).
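
The creation step can be tried in isolation. This is a minimal sketch, assuming Python 3 and a recent openpyxl release, where create_sheet takes the title first and the index second; passing both as keywords avoids depending on the positional order.

```python
from openpyxl import Workbook

# Build a workbook in memory; a new Workbook starts with one default sheet.
outwb = Workbook()

# Keyword arguments avoid depending on the positional order of title/index.
careersheet = outwb.create_sheet(title="career", index=0)

# "career" is now the first sheet, ahead of the default one.
sheet_names = outwb.sheetnames
```

Nothing is written to disk here; the workbook only exists in memory until save is called.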

Next I need to walk through each worksheet of the input file and decide what to do based on the sheet name (for example, if the sheet name is not a number, I do not need to process it). openpyxl supports getting a worksheet by its name, in the same way as a dictionary lookup, and the sheet names of a workbook are returned by get_sheet_names:

for sheetname in inwb.get_sheet_names():
  if not sheetname.isdigit():
    continue
  sheet = inwb[sheetname]
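
The same filtering loop, as a sketch for recent openpyxl releases, where the list of names is available as the sheetnames property. The input workbook and its sheet titles are made up for the demonstration.

```python
from openpyxl import Workbook

# Hypothetical input workbook built in memory for the demonstration.
inwb = Workbook()
inwb.active.title = "2014"
inwb.create_sheet(title="notes")
inwb.create_sheet(title="2015")

processed = []
for sheetname in inwb.sheetnames:
    # Skip sheets whose name is not purely numeric.
    if not sheetname.isdigit():
        continue
    sheet = inwb[sheetname]  # dictionary-style lookup by sheet name
    processed.append(sheet.title)
```

Only the sheets named 2014 and 2015 reach the processing step; notes is skipped.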

After you get the worksheet, you work on it by columns and rows. openpyxl determines the number of rows and columns from the area of the worksheet that actually contains data. Rows and columns are available as sheet.rows and sheet.columns, which can be used like lists. For example, to skip a sheet with fewer than 2 columns of data, you can write:

if len(sheet.columns) < 2:
  continue

To get the first two columns of the worksheet, I can write:

cola, colb = sheet.columns[:2]

Besides using columns and rows, you can also use Excel cell coordinates to get a region, for example:

cells = sheet['A1':'B20']

This is a bit like Excel's own functions: you can pull out a two-dimensional area.
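
A sketch of the column and range access for recent openpyxl releases, assuming sheet.rows and sheet.columns are generators there, so len() and slicing cannot be applied to them directly. It uses max_column and iter_cols instead; the sample data is invented.

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [("a", 1, "x"), ("b", 2, "y"), ("c", 3, "z")]:
    sheet.append(row)

# max_column reports the width of the used area.
has_enough_columns = sheet.max_column >= 2

# iter_cols yields one tuple of cells per column; limit it to the first two.
cola, colb = sheet.iter_cols(min_col=1, max_col=2)
cola_values = [cell.value for cell in cola]

# Slicing by cell coordinates returns a tuple of rows, each a tuple of cells.
cells = sheet["A1":"B2"]
block_values = [[cell.value for cell in row] for row in cells]
```

The coordinate slice is row-major: each inner list of block_values is one row of the region.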

To make processing easier, when I encounter a worksheet without a column C, I want to create an empty column C as long as column A. I can use the sheet.cell method: passing in the cell coordinate and assigning an empty value creates the new column.

alen = len(cola)
for i in range(1, alen + 1):
  sheet.cell('C%s' % (i)).value = None

Note: Excel cell numbering starts at 1.
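
In recent openpyxl releases sheet.cell expects numeric row and column arguments rather than a coordinate string. A minimal sketch of the same empty-column idea under that assumption, with invented sample data:

```python
from openpyxl import Workbook

wb = Workbook()
sheet = wb.active
for row in [("a", 1), ("b", 2), ("c", 3)]:
    sheet.append(row)

# Rows and columns are 1-based; column 3 is column C.
for i in range(1, sheet.max_row + 1):
    sheet.cell(row=i, column=3).value = None

# Touching the cells extends the in-memory used area to column C.
width_after = sheet.max_column
```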

The code above also shows that a cell's value is accessed as cell.value (it can be read or assigned). Its type can be a string, a floating-point number, an integer, or a time (datetime.datetime), and data of the corresponding type is generated in the Excel file.

Once you have the value of each cell, you can process it. openpyxl automatically decodes strings to Unicode, so all strings are of the unicode type.

Besides modifying values one by one with cell.value, you can also append a whole row to the worksheet:

sheet.append([stra, dateb, numc])

Finally, when the new file is complete, just save it with Workbook.save:

outwb.save("test.xlsx")

This overwrites any existing file with that name, even the one you loaded into memory earlier.
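
The append, save, and reload cycle can be sketched end to end. To avoid touching any real file, this example saves into an in-memory buffer instead of a path such as "test.xlsx"; the row contents are invented.

```python
from datetime import datetime
from io import BytesIO

from openpyxl import Workbook, load_workbook

outwb = Workbook()
sheet = outwb.active

# append takes ONE iterable holding the values of the new row.
sheet.append(["name", datetime(2017, 1, 19), 3.5])

# An in-memory buffer stands in for a file path here.
buffer = BytesIO()
outwb.save(buffer)

# Read the workbook back and collect the values of row 1.
buffer.seek(0)
reloaded = load_workbook(buffer)
first_row = [cell.value for cell in reloaded.active[1]]
```

The string, the time, and the number each come back with their own Python type.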

Things to look out for
If you want the index of the current cell within the column object while iterating over the cells of a column, use enumerate:

for idx, cell in enumerate(cola):
  # do something ...

To guard against invisible spaces at both ends of the data (a common pitfall in Excel files), remember to call strip().

If a cell in the worksheet has no data, openpyxl gives it the value None. So if you want to act on the value of a cell, you cannot assume its type in advance. It is best to test first, for example:

if not cell.value:
  continue

If the Excel file contains a lot of noise, for example some sheets hold strings where you expect a cell to contain a time, you can handle it with a statement such as:

if isinstance(cell.value, unicode):
  break
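
The checks above can be gathered into one helper. This is a sketch in Python 3, where text values are str rather than the unicode type; classify_value is a hypothetical name, not part of openpyxl. Testing "is None" instead of plain truthiness keeps a legitimate 0 from being skipped.

```python
from datetime import datetime


def classify_value(value):
    """Label a raw cell value so callers can branch safely."""
    if value is None:
        return "empty"
    if isinstance(value, datetime):
        return "time"
    if isinstance(value, str):
        # Whitespace-only text is treated like an empty cell.
        return "text" if value.strip() else "empty"
    return "number"


values = [None, datetime(2014, 5, 1), " n/a ", "   ", 0]
labels = [classify_value(v) for v in values]
```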

The Windows cmd console does not seem easy to switch to a UTF-8 code page. For Simplified Chinese you can use code page 936 (GBK); print then automatically converts Unicode to GBK when writing to the terminal.
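
When a string may contain characters that the console encoding cannot represent, one option is to replace them before printing. A minimal Python 3 sketch; to_console_text is a hypothetical helper name, and the emoji is just an example of a character outside GBK.

```python
def to_console_text(text, encoding="gbk"):
    """Return text with characters the encoding cannot represent replaced by '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


# An emoji is outside GBK, so it is replaced; Chinese text passes through.
safe = to_console_text("A\U0001F600B")
unchanged = to_console_text("中文")
```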

Some small functions to help with Chinese text
Some of the sheets I process contain characters outside the GBK range, which is troublesome when I need to print information to monitor progress. Fortunately they can be ignored: I simply replace them with a space and then print. Together with a few other separators I wanted to replace anyway, I can write:

# annoying separators
dot = u'\u00b7'
dash = u'\u2014'
emph = u'\u2022'
dot2 = u'\u2027'

seps = (u'.', dot, dash, emph, dot2)

def get_clean_ch_string(chstring):
  """Remove annoying separators from the Chinese string.

  Usage:
    cleanstring = get_clean_ch_string(chstring)
  """
  cleanstring = chstring
  for sep in seps:
    cleanstring = cleanstring.replace(sep, u' ')
  return cleanstring
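
For Python 3, the same cleanup can be sketched with str.translate, which replaces all separators in a single pass. This is an alternative to the replace loop, not the author's code; SEPS mirrors the separator list above.

```python
# Separators to blank out: ASCII dot plus four Unicode dots and dashes.
SEPS = (".", "\u00b7", "\u2014", "\u2022", "\u2027")

# Translation table mapping every separator to a single space.
SEP_TABLE = str.maketrans({sep: " " for sep in SEPS})


def get_clean_ch_string(chstring):
    """Replace every separator in chstring with a space."""
    return chstring.translate(SEP_TABLE)


cleaned = get_clean_ch_string("阿卜杜\u00b7热合曼")
```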

In addition, I have another requirement: turn "English name[space]Chinese name" into English surname, English given name, Chinese surname, and Chinese given name.

First I need to separate the English from the Chinese. My approach is a regular-expression match based on the Unicode ranges of common English and Chinese characters. The patterns for matching English and Chinese are as follows:

# regex pattern matching all ASCII characters
asciipattern = ur'[%s]+' % ''.join(chr(i) for i in range(127))
# regex pattern matching all common Chinese characters and separators
chinesepattern = ur'[\u4e00-\u9fff.%s]+' % (''.join(seps))

English is matched by the range of ASCII printable characters, the range of common Chinese characters is \u4e00-\u9fff, and seps holds the characters outside the GBK range mentioned earlier. Besides the simple split, I also need to handle names that have only a Chinese part or only an English part. The logic is as follows:

import re

def split_name(name):
  """Split [English name, Chinese name].

  If one of them is missing, None will be returned instead.

  Usage:
    engname, chname = split_name(name)
  """
  matches = re.match('(%s)(%s)' % (asciipattern, chinesepattern), name)
  if matches:  # English name + Chinese name
    return matches.group(1).strip(), matches.group(2).strip()
  else:
    matches = re.findall('(%s)' % (chinesepattern), name)
    matches = ''.join(matches).strip()
    if matches:  # Chinese name only
      return None, matches
    else:  # English name only
      matches = re.findall('(%s)' % (asciipattern), name)
      return ''.join(matches).strip(), None
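
A Python 3 sketch of the same splitting logic, for readers who want to run it today. The patterns are written out literally (printable ASCII as \x20-\x7e) instead of being built from chr(); that is my simplification, not the author's code.

```python
import re

# Printable ASCII, including the space.
ASCII_PATTERN = r"[\x20-\x7e]+"
# Common Chinese characters plus the dot-like separators.
CHINESE_PATTERN = "[\u4e00-\u9fff.\u00b7\u2014\u2022\u2027]+"


def split_name(name):
    """Return (English name, Chinese name); a missing part is None."""
    both = re.match("(%s)(%s)" % (ASCII_PATTERN, CHINESE_PATTERN), name)
    if both:  # English name followed by Chinese name
        return both.group(1).strip(), both.group(2).strip()
    chinese = "".join(re.findall(CHINESE_PATTERN, name)).strip()
    if chinese:  # Chinese name only
        return None, chinese
    english = "".join(re.findall(ASCII_PATTERN, name)).strip()
    return english, None  # English name only


full = split_name("John Smith 张三")
chinese_only = split_name("张三")
english_only = split_name("John Smith")
```

One limitation carried over from the design: the ASCII dot belongs to both character classes, so an English name containing a dot, such as "J. Smith", is split at the dot.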

After getting the Chinese name, I need to split it into surname and given name. Because the task does not require the split to be very precise, I follow the common conventions of Chinese names: for a name of two or three characters, the first character is the surname; for a name of four characters, the first two characters are the surname; for a name with a separator (ethnic-minority names), the part before the separator is the surname (the earlier get_clean_ch_string function is used to remove the separator); and a longer name without a separator is assumed to be one name and is returned whole. (Note that in English, first name refers to the given name and last name refers to the surname.)

def split_ch_name(chname):
  """Split the Chinese name into first name and last name.

  * If the name is XY or XYZ, X will be returned as the last name.
  * If the name is WXYZ, WX will be returned as the last name.
  * If the name is ...WXYZ, the whole name will be returned as the last name.
  * If the name is ..ABC * XYZ..., the part before the separator will be
    returned as the last name.

  Usage:
    chfirstname, chlastname = split_ch_name(chname)
  """
  if len(chname) < 4:  # XY or XYZ
    chlastname = chname[0]
    chfirstname = chname[1:]
  elif len(chname) == 4:  # WXYZ
    chlastname = chname[:2]
    chfirstname = chname[2:]
  else:  # longer
    cleanname = get_clean_ch_string(chname)
    nameparts = cleanname.split()
    print u' '.join(nameparts)
    if len(nameparts) < 2:  # ...WXYZ
      return None, nameparts[0]
    chlastname, chfirstname = nameparts[:2]  # ..ABC * XYZ...
  return chfirstname, chlastname
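
A self-contained Python 3 port of these rules, as a sketch. The separator cleanup is inlined so the block runs on its own, and the debugging print is left out. Like the rules it mirrors, it does not handle an empty string.

```python
# Separators replaced by a space before splitting long names.
SEPS = (".", "\u00b7", "\u2014", "\u2022", "\u2027")


def split_ch_name(chname):
    """Return (first name, last name) by the length-based rules."""
    if len(chname) < 4:  # XY or XYZ: first character is the last name
        return chname[1:], chname[0]
    if len(chname) == 4:  # WXYZ: first two characters are the last name
        return chname[2:], chname[:2]
    cleanname = chname
    for sep in SEPS:
        cleanname = cleanname.replace(sep, " ")
    nameparts = cleanname.split()
    if len(nameparts) < 2:  # long name without a separator
        return None, nameparts[0]
    lastname, firstname = nameparts[:2]  # part before the separator is the last name
    return firstname, lastname


two = split_ch_name("张三")
four = split_ch_name("欧阳修文")
separated = split_ch_name("阿卜杜\u00b7热合曼")
unseparated = split_ch_name("阿卜杜热合曼")
```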

Splitting the English name is very simple: split on the space; the first part is the given name and the second part is the surname. Other cases are ignored for now.
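
A minimal Python 3 sketch of that split. split_en_name is a hypothetical name, and the handling of one-word or empty input is my assumption, since the article leaves those cases open; any parts after the second are ignored.

```python
def split_en_name(engname):
    """Return (first name, last name); missing parts are None."""
    parts = engname.split()
    if not parts:
        return None, None
    if len(parts) == 1:
        return parts[0], None
    # First part is the given name, second part is the surname.
    return parts[0], parts[1]


first, last = split_en_name("John Smith")
single = split_en_name("Cher")
```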

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only.
