Comparison of how to read and write Excel files in Python

Source: Internet
Author: User

Although working with data every day, and frequently using Excel to do some simple data processing and display, but has long been careful to avoid using Python directly read and write Excel files. I usually save the data as a tab-separated text file (TSV), import it in Excel, or copy and paste it directly.

The previous time to do a project, but have to use Python directly generated Excel files, and then as the requirements change, but also to the existing Excel file to read. In this process, research and try some tools, also took some detours. The next time you have a similar demand, you don't have to search the sky.

Super Invincible Big PK

I've mostly tried four tools, and I'm not going to give them rankings, because the choices you make are different in different scenarios.

Xlsxwriter

Xlsxwriter is the tool that I finally chose to use for the write operation. As the name implies, it can only be used to write files.

This should be a relatively new project, and the earliest submission on GitHub is in January 2013. Its official documentation declares that it supports:

Advantages

First, the function is relatively strong

In contrast, this is the most powerful tool in addition to Excel itself. I used it for example: font settings, foreground background color, border settings, view zoom (zoom), cell merge, AutoFilter, freeze panes, formula, data validation, cell comment, row height and column width settings, and more.

What surprises me most is that the Excel file generated with the cell comment, whether it's Excel 2007 or Excel 2013, opens normally (as mentioned below, this task cannot be done by Excel itself).

Second, support large file write

If the amount of data is very large, you can enable the constant memory mode, which is a sequential write mode, a row of data is immediately written to a row, not all the data is kept in memory.

Disadvantages

One, does not support read and modify

The author does not intend to do a xlsxreader to provide read operations. Can not be read, there is no way to modify. It can only be used to create a new file. I use XLRD to read the required information and create a new file with Xlsxwriter.

In addition, even if you create to half of the Excel file, it is impossible to read the created content (the information should be in, but there is no corresponding interface). Because its primary method is write rather than set. When you write data in a cell, there is no way to read the information you have written unless you save the relevant content yourself. From this point of view, you can't read---------write-write--------writes written by write-only.

Second, does not support XLS file

XLS is a format used by Office 2013 or earlier, and is a binary-formatted file. XLSX is a compressed package that consists of a series of XML files (the last X represents XML). If you do not want to create a lower version of the XLS file, please go to XLWT Bar.

Third, temporarily do not support pivot table (pivot tables)

A pivot table is a cumbersome thing, and requires a set of data caches in addition to its own complex structure. I made this request to the author, but it's a very difficult function to fully implement, and we look forward to it.

Xlrd&xlwt

My program in the first edition, use XLWT to create an XLS file, and then convert it to an xlsx file through the Microsoft Excel API and write to the Advanced Data Validation (Excel 2007 data Validation than Excel 2003 to be a lot more powerful) and cell comments.

The final version of my program still uses XLRD to read the required information from the existing file.

XLRD&XLWT is primarily for the XLS file format for Office 2013 or earlier versions.

Advantages

First, the XLS format is supported

Xlsxwriter and OPENPYXL do not support the XLS format, from this point of view, XLRD&XLWT still have some irreplaceable.

Disadvantages

First, the support for xlsx is poor

Currently XLRD has been able to read xlsx files with limited support. As for XLWT, I have not tried it.

Second, the support for the change is poor

Xlrd and XLWT are two relatively independent modules, although Xlutils provides a way to help you put xlrd. The book object is copied to the Xlwt.workbook object, but is similar to Xlsxwriter, which simply provides the Write method, which makes it impossible for you to easily get the data that is currently written and make targeted modifications. If you have to do this, you have to keep saving and then use the new xlrd. The book object reads the information you want, or it's more troublesome.

Third, the function is very weak

In addition to the most basic writing data and formulas, XLWT provides very little functionality (Excel 2013 itself supports very few features). The same is true for reading, and a lot of information is lost when read.

Openpyxl

OPENPYXL is a relatively comprehensive tool, able to read and write can be modified, the function can be but also have a great flaw. When I was in the middle version I was going to rely on it completely, but then I found a serious problem and gave it up.

Advantages

One, can read can write can modify

The working mode of OPENPYXL is very different from xlsxwriter and XLWT, it uses getter/setter mode. You can read the contents of a cell at any time and modify it according to its content, and OPENPYXL will help you remember the status of each cell.

In particular, it is important to note that while it supports modifying existing files, because of the limited functionality it supports, the content that it does not support is ignored when the file is read, and is lost when it is written. So be careful when you use it. For example, the following shortcomings mentioned that it could not read the formula, if you modify a file with a formula, after saving, all the formulas are gone.

Second, the function can also be

Overall, it supports functions between Xlsxwriter and XLWT.

Disadvantages

One, does not support XLS

This matter can only let xlrd and XLWT to do.

Second, the reading formula is not supported

This is actually a not very simple thing, although I have not tried, but I believe xlrd also do not do this thing.

Excel cell If it is a formula, it will also save the cache of the formula itself and the result of the operation. Read the cell contents with OPENPYXL, it will not tell you what the cell's formula is, or even tell you that the cell is a formula, it will only get the results of this cache operation. I was going to use it to determine if the cell was using a formula, and then to make a different deal. The result met this problem, finally had to take the other alternative way to do.

Microsoft Excel API

Developers of most Windows environments will choose the Microsoft Excel API. In fact, not just python, almost every language has its own way of using it, because the core logic is entirely provided by Microsft Excel itself. The language-related section is only responsible for communicating with Windows COM components.

In Python, you first need to install Python for Windows extensions (PYWIN32), and you can consult Win32 modules and Python COM for specific documents.

Of course you also have to install a version of Microsoft Office Excel, and its internal DLLs are responsible for the actual operation.

Advantages

One, the biggest advantage: strong No Limit

Because it communicates directly with the Excel process, you can do anything you can do in Excel.

Second, the document is rich

The documentation on MSDN is definitely the best document in the world. Not one.

Third, easy to debug

You can simply use the macro in Excel to debug the effect you want first. Even if you don't know how to do something with a program, you can get the processing code for that action by means of a macro recording.

Disadvantages

One, fatal disadvantage: slow to death

Because of the need to communicate with the Excel process, its efficiency is very low.

If you make the Excel window visible, as the program runs, you can see the changes that each sentence brings, and the contents of the cell change one by one. If there is a lot of data to write, the speed is intolerable.

Second, platform restrictions

There are no methods that can be used on non-Windows systems at this time.

In addition, depending on what the program can do, it relies heavily on the version of Excel that is currently installed on the system. There are significant differences in the functionality of different versions, and there are differences in APIs. It will be more troublesome to use.

Iii. problems caused by Excel's own bugs

I just found one of them, which has nothing to do with Python and can be reproduced manually in Excel. Create a random file in Excel 2007, add a comment to a cell, save it. Change your computer, open it with Excel 2013, and you'll get an error, and the comment disappears.

Also if your program creates an Excel file with comments on a machine with Excel 2007, the file will be opened in Excel 2013 with an error and no comments. The reverse is the same.

About initialization

The specifics of the COM interface for Excel I will not introduce, you need to directly consult the relevant MSDN documentation. Here are just a few special little questions.

There are two common ways to get an Excel object that you can manipulate:

The difference is that the dispatch method tries to find and reuse an existing Excel process (such as an Excel program that you are already running), and Dispatchex must create a new Excel process. In general, the former can be used, but also to save some resources of the cost. But there are some problems, some states are shared within an Excel process, and your actions within other windows of the same process may affect the processing that the Python program is going through, resulting in various errors. For example, when you manually open an Excel window where a cell is in the editing state, most of the operations that Python controls can fail (even if it is working on another file) because two cells cannot be edited at the same time in an Excel process.

In order to avoid trouble, I usually use the Dispatchex method.

about Windows visible

You can make the newly launched Excel process window visible, as you can see by double-clicking the icon on the desktop, and every step that the program controls. You can also do manual operations at the same time, but once you do this, it's likely that your Python program will crash.

Window is not visible also brings some trouble, said earlier, the Excel process launched through Python is no different from the Excel process you open directly from the desktop, in the process of using Excel, we often encounter various pop-up errors, warnings or prompt boxes, These can also be encountered when working with Python. Especially if your program is not fully debugged.

I generally let the program-controlled Excel process be visible during debugging, not visible when formally used, by a command like this (suppose you have a variable called Is_debug that records whether you are currently in debug State):

About saving and overwriting an existing file

The details of opening and saving the file are not here to say more, you can view the relevant API introduction in MSDN, very detailed. Here's a little bit of a save, if the target file already exists. The Excel API Save as method does not seem to provide parameters to determine whether to overwrite the target file with the same name directly, in the window operation, this will pop up a confirmation box to let the user decide. Our program certainly does not want to do this, in fact, if you follow the above to make the window invisible, you will not see the popup window.

You can close the DisplayAlert property so that Excel does not pop up the confirmation window, but overwrites the file with the same name directly.

About ending an Excel process

Process is a kind of resource, we apply for resources, we must release them after we have exhausted them. Especially if you hide the Excel window, the user will not be able to close the process you have opened until they view the system process.

However, an Excel process can open multiple files at the same time, which may be open to other parts of your program, or it may be opened by the user himself. This way you can't end the Excel process at will, or it can affect the actions of other people or programs.

I usually after the completion of my processing (close the Excel file I opened or created), determine whether the current Excel process is still open other documents, if not, will end the process.

About enumeration constants

There are a variety of enumeration constants in the Excel API, and I have not found a way to refer to them directly in Python, and the current approach is to find the values of the constants you need and define them yourself. For example, I used the following enumeration constants:

To know the numeric value of an enumeration constant, you can review the data related to Excel enumerations in MSDN.

"July 31, 2014 Update" Thanks @ Evian reminder, in Python can also directly reference the relevant constants, that is, by win32com.client.constants to get the value of the constant. But here is a bit more tricky, if you directly use dispatch or Dispatchex to get the Excel object, it is impossible to remove the constant value from the constants, you need to manually run Makepy, Or get an Excel object by Win32com.client.gencache.EnsureDispatch:

                

From http://www.raincent.com/content-10-10533-1.html

Comparison of how to read and write Excel files in Python

Contact Us

The content source of this page is from Internet, which doesn't represent Alibaba Cloud's opinion; products and services mentioned on that page don't have any relationship with Alibaba Cloud. If the content of the page makes you feel confusing, please write us an email, we will handle the problem within 5 days after receiving your email.

If you find any instances of plagiarism from the community, please send an email to: info-contact@alibabacloud.com and provide relevant evidence. A staff member will contact you within 5 working days.

A Free Trial That Lets You Build Big!

Start building with 50+ products and up to 12 months usage for Elastic Compute Service

  • Sales Support

    1 on 1 presale consultation

  • After-Sales Support

    24/7 Technical Support 6 Free Tickets per Quarter Faster Response

  • Alibaba Cloud offers highly flexible support services tailored to meet your exact needs.