Using the pandas module in Python to work with Excel files on Windows

Source: Internet
Author: User

Install the Python Environment
Anaconda is a Python distribution that bundles more than 400 commonly used Python libraries, including NumPy and pandas, which are frequently used in data analysis. More importantly, it installs on any major platform with a one-click installer, and the environment is configured automatically with no extra steps. To set up the Python environment, you only need to download the installer from the Anaconda website and double-click it.
Anaconda official download page: https://www.continuum.io/downloads
After the installation is complete, press Windows + R, type cmd, and press Enter to open the Windows command prompt. Then type ipython and press Enter:

Python 2.7.11 |Anaconda 2.5.0 (64-bit)| (default, Jan 29 2016, 14:26:21) [MSC v.1500 64 bit (AMD64)]
Type "copyright", "credits" or "license" for more information.

IPython 4.0.3 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.

If the IPython banner appears as shown above, the installation succeeded.


Use pandas to process Excel
This document uses a file named log.xls as an example of how to process Excel files with pandas.

The file log.xls contains multiple worksheets (sheets). Each sheet has two main columns: Member and activity. Rows with the same member ID are grouped together, and the last row of each group is the summary of the rows before it. Suppose we need to extract the summary information for each member, that is, the last row of each member's consecutive block of rows. How can this be done?

This is where pandas comes in. If it is not already installed (Anaconda includes it), download it from the official page at https://pypi.python.org/pypi/pandas/0.16.2, decompress the archive, change to the decompressed directory on the command line, and run:

python setup.py install 

This installs pandas.

(Note: pandas depends on
• NumPy: 1.6.1 or higher
• python-dateutil: 1.5
• pytz
These three modules may need to be installed first.
Optional dependencies (install as needed):
• SciPy: miscellaneous statistical functions
• PyTables: necessary for HDF5-based storage
• SQLAlchemy: for SQL database support. Version 0.8.1 or higher recommended.
• Matplotlib: for plotting
• Statsmodels
  - Needed for parts of pandas.stats
• openpyxl, xlrd/xlwt
  - openpyxl version 1.6.1 or higher, but lower than 2.0.0
  - Needed for Excel I/O
• XlsxWriter
  - Alternative Excel writer
• Boto: necessary for Amazon S3 access
)
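To see which of the modules listed above are already present, a small check like the following can help. It is only a sketch: it assumes Python 3 (importlib.util.find_spec is not available in the Python 2.7 shown earlier), and the helper names installed and report are made up for this illustration.

```python
import importlib.util

# Import names of the modules listed above. The import name can differ from
# the package name (python-dateutil is imported as "dateutil").
REQUIRED = ["numpy", "dateutil", "pytz"]
EXCEL_IO = ["openpyxl", "xlrd", "xlwt", "xlsxwriter"]


def installed(modules):
    """Map each module name to True if it can be found, False otherwise."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


def report():
    """Print one line per module saying whether it was found."""
    for group, modules in (("required", REQUIRED), ("Excel I/O", EXCEL_IO)):
        for name, ok in installed(modules).items():
            print("{:10s} {:12s} {}".format(group, name, "found" if ok else "missing"))


report()
```

Any module reported as missing can then be installed with the package manager of your Python distribution.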

Import the pandas module:

import pandas as pd

The import statement loads the pandas module, and pd is a short alias for it.

Read the excel file to be processed:

df = pd.read_excel('log.xls')

The read_excel function reads an Excel file; replace 'log.xls' with the path to your own file. The data is returned as a pandas DataFrame, a column-oriented two-dimensional table structure with column labels and row labels. From this point on, operations on the Excel file become operations on the DataFrame. If the Excel file contains multiple sheets, you can select one:

df = pd.read_excel('log.xls', sheetname=1)

The sheetname parameter selects the sheet by position, counting from 0. The value 1 above therefore selects the second sheet. In pandas 0.21 and later this parameter is spelled sheet_name.
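For readers on a recent pandas release, sheet selection might look like the sketch below. It assumes pandas 0.21 or later (where the keyword is sheet_name) and an installed Excel engine such as xlrd for .xls files or openpyxl for .xlsx files. The helper names read_sheet and read_all_sheets are hypothetical.

```python
import pandas as pd


def read_sheet(path, index=0):
    """Read one worksheet by zero-based position (0 is the first sheet)."""
    # sheet_name is the keyword in pandas 0.21 and later; the older release
    # used in this article spells it sheetname.
    return pd.read_excel(path, sheet_name=index)


def read_all_sheets(path):
    """Read every worksheet; returns a dict mapping sheet name to DataFrame."""
    return pd.read_excel(path, sheet_name=None)


# Usage (assumes log.xls is in the current directory):
#   second = read_sheet("log.xls", 1)
#   for name, frame in read_all_sheets("log.xls").items():
#       print(name, frame.shape)
```

Passing sheet_name=None is convenient when every sheet needs the same processing, since the result can be looped over directly.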

After reading the data, you can first view the header information and the data type of each column:

df.dtypes
The output is as follows:
Member                                   object
Unnamed: 1                              float64
Unnamed: 2                              float64
Unnamed: 3                              float64
Unnamed: 4                              float64
Unnamed: 5                              float64
internal and external activity types     object
Unnamed: 7                               object
activity                                 object
dtype: object
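The Unnamed: n labels in this output are generated by pandas for header cells that are blank. The sketch below shows the same kind of inspection on a small stand-in table, since the real log.xls data is not available here; the values are invented for illustration.

```python
import pandas as pd

# Hypothetical stand-in for one sheet of log.xls.
df = pd.DataFrame({
    "Member": ["A", "A", "B"],
    "Unnamed: 1": [1.0, 2.0, 3.0],
    "activity": ["x", "y", "z"],
})

print(df.columns.tolist())  # column labels taken from the header row
print(df.dtypes)            # one dtype per column

# Columns with a numeric dtype, worth knowing before doing arithmetic on them.
numeric = [c for c in df.columns if pd.api.types.is_numeric_dtype(df[c])]
print(numeric)
```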

Extract the last row of data that appears consecutively for each member:

 new_df = df.drop_duplicates(subset='Member', keep='last')

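The preceding statement removes duplicate rows based on the Member column and, among rows with the same Member value, keeps only the last one. Because each member's rows are grouped together, this yields the summary row of each member. The result is returned as a new, filtered DataFrame. The keep parameter is available in pandas 0.17.0 and later.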
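One detail is worth illustrating: drop_duplicates keeps the last occurrence of each Member value in the whole table, which equals "the last row of each consecutive block" only when a member's rows are contiguous, as they are in log.xls. The sketch below uses invented data and two hypothetical helpers. The second helper, based on shift, is an alternative technique that is not part of the original article, for the case where a member could appear in several separate blocks.

```python
import pandas as pd


def last_row_per_member(df, key="Member"):
    """Keep the final occurrence of each key value in the whole table."""
    return df.drop_duplicates(subset=key, keep="last")


def last_row_per_run(df, key="Member"):
    """Keep the final row of each consecutive run of equal key values."""
    # A row ends a run when the next row has a different key,
    # or when there is no next row.
    ends_run = df[key] != df[key].shift(-1)
    return df[ends_run]


# Hypothetical data: each member's rows are contiguous and the last row of
# each block plays the role of the summary row.
df = pd.DataFrame({
    "Member": ["A", "A", "A", "B", "B", "C"],
    "activity": [1, 2, 3, 10, 20, 5],
})

print(last_row_per_member(df))
print(last_row_per_run(df))
```

On contiguous data like this, both helpers select the same rows. They differ only when the same member shows up again later in the sheet.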

Next, save the processed result as an excel file:

out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()

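Here output.xls is the name of the file to write; you can choose any name. The to_excel call writes the DataFrame content to the writer, and save() writes the file to disk. In pandas 2.0 and later, save() has been removed and .xls output is no longer supported, so call close() (or use the writer in a with block) and write to an .xlsx file instead.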

A new file now appears in the current directory, and you can open it directly in Excel.
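On a recent pandas release the same export step might be written with a context manager, as sketched below. This assumes an .xlsx engine such as openpyxl or XlsxWriter is installed. The helper name save_frame, the default path, and the sheet name are hypothetical.

```python
import pandas as pd


def save_frame(df, path="output.xlsx", sheet="Sheet1"):
    """Write df to an Excel workbook and return the path that was written."""
    # The with block closes the writer on exit, which saves the file to disk.
    with pd.ExcelWriter(path) as writer:
        # index=False leaves out the row labels; drop it to keep them,
        # as the article's to_excel call does.
        df.to_excel(writer, sheet_name=sheet, index=False)
    return path


# Usage (new_df is the filtered DataFrame from the previous step):
#   save_frame(new_df, "output.xlsx")
```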

pandas provides many other APIs as well. For a specific task, consult the API documentation to find the appropriate function.

Appendix: a complete example

# coding=utf-8
import pandas as pd

# Read the second sheet in the Excel file
# (pandas 0.21 and later spell this keyword sheet_name)
df = pd.read_excel('log.xls', sheetname=1)

# View the data type of each column
print(df.dtypes)

# View the data in the Member column
print(df['Member'])

# Optional: create a new column whose value in each row combines
# that row's Member and activity values
# df['activity_2'] = df['Member'] + df['activity']

# Remove redundant rows based on the Member field, keeping the last one
new_df = df.drop_duplicates(subset='Member', keep='last')

# Export the result
out = pd.ExcelWriter('output.xls')
new_df.to_excel(out)
out.save()
