Some time ago I made a small project and helped a professor in the Marine Department to do a data processing software. The basic function is very simple, that is, filtering and counting some data in Excel, and merging the content of multiple tables. I have no experience in processing Excel data before, and even seldom use Excel myself. I remember that in the last chapter of "Python core programming", I talked about using Win32 COM to operate the office. I didn't quite understand what I was talking about. Google finds many modules that can process excel Data. After comparison, openpyxl is selected. openpyxl is used to process xlsx files generated by Excel2007 and later versions. Unfortunately, xls and xlsx are both in the data I got. However, conversion is not difficult. For the moment, ignore this problem.
The module installation process is very simple. There are simple instructions and API documents on the official website. It is easy to use and can basically meet my needs. For Excel files, all I need is to read the data from the corresponding location and write the data to the corresponding location. In the meantime, data processing can be easily done through python.
1. Type and organization of Excel Data
Openpyxl defines multiple data formats. I only cover the three most important data formats:
NULL: NULL value. It corresponds to None in python, indicating that there is no data in the cell.
Numberic: numeric type, which is processed by floating point. It corresponds to the float in python.
String: string type, corresponding to unicode in python.
Each Excel data file has three levels of objects from top to bottom:
Workbook: each Excel file is a workbook.
Sheet: Each workbook can contain multiple sheet, which corresponds to "sheet1" and "sheet2" in the lower left corner of the Excel file.
Cell: each sheet is a table we usually see. it can contain m rows, n columns, and each row number is a cell.
2. read data from Excel
Read data from an existing xlsx file and follow the three levels of the Excel file to perform the following three steps:
1. Open workbook:
From openpyxl import load_workbook
Wb = load_workbook('file_name.xlsx ') 2. Open the desired sheet:
We can open the first sheet in the workbook as follows:
Ws = wb. get_active_sheet () Variable _ active_sheet_index is used to determine which sheet to obtain. By default, it is set to 0.
You can also use the sheet name to obtain the sheet:
Ws = wb. get_sheet_by_name ("sheet_name") openpyxl does not seem to provide the ability to read sheet by index, but it is always easy to implement it through sheet_name:
Sheet_names = wb. get_sheet_names ()
Ws = wb. get_sheet_by_name (sheet_names [index]) 3. Obtain the value of the corresponding cell:
Openpyxl provides two methods to read cells. One is to use letters to represent column numbers according to the Excel conventions. For example:
C = ws. cell ('a4 '). value can also be read according to the row number column:
D = ws. cell (row = 3, column = 0) 3. Write Data to Excel
Similar to reading, writing an Excel file involves four steps:
1. Create a workbook
Directly create a workbook object:
Wb = Workbook () 2. Create a sheet
By default, the newly created sheet is placed at the bottom. To create a sheet at the top of the list, you can input the parameter 0:
Ws1 = wb. create_sheet () # insert at the end
Ws2 = wb. create_sheet (0) # insert at the first position3. write data
Assign the value of the data to be written to the corresponding cell, if only a table is modified. You can skip the first two steps, but do not overwrite other data:
Ws. cell ('b5 ') = value1
Ws. cell (row = 3, column = 7) = value24. save data
Finally, write the data. Note that if there is a file with the same name under the Save path, the previous file will be overwritten:
Wb.save('file_name.xlsx ')