Reprinted: Http://www.ibm.com/developerworks/cn/linux/sdk/perl/culture-8/Spreadsheet::WriteExcel and Spreadsheet:: Parseexcel
In 2000, Takanori Kawai and John McNamara wrote theSpreadsheet::WriteExcelmodules andSpreadsheet::ParseExcelposted them on CPAN, which made it possible (albeit not easy) to extract data from an Excel file on any platform. two modules.
As we'll see later, if you're using Windows, there'sWin32::OLEstill a simpler and more reliable solution, and theSpreadsheet::WriteExcelmodule recommends using it forWin32::OLEmore powerful data and worksheet manipulation.Win32::OLEwith the ActiveState Perl Toolkit, you can use OLE to drive many other Windows applications. Note that to use this module, you still need to install and register an Excel engine on the machine (usually installed with Excel itself).
There are thousands of applications that need to parse Excel data, but here are a few examples: export Excel to CSV, interact with spreadsheets stored on shared drives, move financial data to a database to form reports, and analyze data without providing any other format.
To demonstrate the examples given here, you must install Perl 5.6.0 on your system. Your system is best for the most recent (2000 or later) mainstream UNIX installation (Linux, Solaris, and BSD). Although these examples are available in previous versions of Perl and Unxi and other operating systems, you should take into account that you will be faced with situations where they do not work as exercises.
Windows Sample: Parsing
This section applies only to Windows machines. All other sections apply to Linux.
Before you proceed, install ActiveState perl (use version 628 here) or the ActiveState Komodo IDE to Edit and debug Perl. Komodo provides a free license for home users, and you can get it in about a few minutes. (for download sites, see resources later in this article.) )
ItSpreadsheet::ParseExcelis difficult to install and module using the ActiveState PPM Package ManagerSpreadsheet::WriteExcel. PPM has no history, it is difficult to set options, help will roll out of the screen and default is to ignore dependencies and install. You can enter "ppm" from the command line and issue the following command to invoke PPM:
Listing 1: Installing the Excel module's PPM command
ppm> install OLE::Storage_Lite
ppm> install Spreadsheet::ParseExcel
ppm> install Spreadsheet::WriteExcel
In this case, the installation of the module will fail becauseIO::Scalarit is not yet available, so you may want to discard the PPM problem lookup and turn to the built-inWin32::OLEmodule. However, when you read this article, ActiveState may have released a fix for the issue.
With ActiveStateWin32::OLE, you can use the code listed below to dump a worksheet unit by cell:
List 2:win32excel.pl
#!/usr/bin/perl -w
use strict;
use Win32::OLE qw(in with);
use Win32::OLE::Const ‘Microsoft Excel‘;
$Win32::OLE::Warn = 3; # die on errors...
# get already active Excel application or open new
my $Excel = Win32::OLE->GetActiveObject(‘Excel.Application‘)
|| Win32::OLE->new(‘Excel.Application‘, ‘Quit‘);
# open Excel file
my $Book = $Excel->Workbooks->Open("c:/komodo projects/test.xls");
# You can dynamically obtain the number of worksheets, rows, and columns
# through the Excel OLE interface. Excel‘s Visual Basic Editor has more
# information on the Excel OLE interface. Here we just use the first
# worksheet, rows 1 through 4 and columns 1 through 3.
# select worksheet number 1 (you can also select a worksheet by name)
my $Sheet = $Book->Worksheets(1);
foreach my $row (1..4)
{
foreach my $col (1..3)
{
# skip empty cells
next unless defined $Sheet->Cells($row,$col)->{‘Value‘};
# print out the contents of a cell
printf "At ($row, $col) the value is %s and the formula is %s\n",
$Sheet->Cells($row,$col)->{‘Value‘},
$Sheet->Cells($row,$col)->{‘Formula‘};
}
}
# clean up after ourselves
$Book->Close;
Note that you can easily assign values to cells in the following ways:
$sheet->cells ($row, $col)->{' Value '} = 1;
Back to top of page
Linux Example: parsing
This section applies to UNIX, especially for Linux. It is not tested in Windows.
It's hard to give aSpreadsheet::ParseExcelbetter example of Linux parsing than the example provided in the module documentation, so I'll show that example and explain how it works.
Download parse-excel.pl
List 3:parse-excel.pl
#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
my $oExcel = new Spreadsheet::ParseExcel;
die "You must provide a filename to $0 to be parsed as an Excel file" unless @ARGV;
my $oBook = $oExcel->Parse($ARGV[0]);
my($iR, $iC, $oWkS, $oWkC);
print "FILE :", $oBook->{File} , "\n";
print "COUNT :", $oBook->{SheetCount} , "\n";
print "AUTHOR:", $oBook->{Author} , "\n"
if defined $oBook->{Author};
for(my $iSheet=0; $iSheet < $oBook->{SheetCount} ; $iSheet++)
{
$oWkS = $oBook->{Worksheet}[$iSheet];
print "--------- SHEET:", $oWkS->{Name}, "\n";
for(my $iR = $oWkS->{MinRow} ;
defined $oWkS->{MaxRow} && $iR <= $oWkS->{MaxRow} ;
$iR++)
{
for(my $iC = $oWkS->{MinCol} ;
defined $oWkS->{MaxCol} && $iC <= $oWkS->{MaxCol} ;
$iC++)
{
$oWkC = $oWkS->{Cells}[$iR][$iC];
print "( $iR , $iC ) =>", $oWkC->Value, "\n" if($oWkC);
}
}
}
This example is tested in Excel 97. If it doesn't work, try converting it to Excel 97 format.Spreadsheet::ParseExcelThe Perldoc page also claims Excel 95 and 2000 compatibility.
The spreadsheet is parsed into a top-level object named $oBook. $oBook features that have auxiliary programs, such as "File", "Sheetcount", and "Author".Spreadsheet::ParseExcelThese features are documented in the workbook section of the Perldoc page.
The workbook contains several worksheets: iterate over them by using the workbook Sheetcount attribute. Each worksheet has a minrow and mincol along with the corresponding MaxRow and Maxcol attributes that can be used to determine the scope that the workbook can access.Spreadsheet::ParseExcelThese features are documented in the worksheet section of the Perldoc page.
Cells can be obtained from the worksheet through the cell attribute, which is how the $oWkC object is obtained in Listing 3.Spreadsheet::ParseExcelThe cell attribute is documented in the cell section of the Perldoc page. According to the documentation, there seems to be no way to get the formulas listed in a particular cell.
Back to top of page
Linux Example: Write
This section applies to UNIX, especially for Linux. It is not tested in Windows.
Spreadsheet::WriteExcelThere are many sample scripts in the Examples directory, and these scripts can often be found under/usr/lib/perl5/site_perl/5.6.0/spreadsheet/writeexcel/examples. It may be installed elsewhere, and if that directory is not found, contact your local Perl administrator.
The bad news is thatSpreadsheet::WriteExcelyou cannot write to an existing Excel file. You must use your ownSpreadsheet::ParseExcelimport data from an existing Excel file. The good news is that it isSpreadsheet::WriteExcelcompatible with Excel 5 until Excel 2000.
Here is a program that shows how to extract, modify (all numbers multiplied by 2) data from an Excel file, and write data to a new Excel file. Only data is preserved, formatting and any attributes are not preserved. The formula is discarded.
Download excel-x2.pl
List 4:excel-x2.pl
#!/usr/bin/perl -w
use strict;
use Spreadsheet::ParseExcel;
use Spreadsheet::WriteExcel;
use Data::Dumper;
# cobbled together from examples for the Spreadsheet::ParseExcel and
# Spreadsheet::WriteExcel modules
my $sourcename = shift @ARGV;
my $destname = shift @ARGV or
die "invocation: $0 <source file> <destination file>";
my $source_excel = new Spreadsheet::ParseExcel;
my $source_book = $source_excel->Parse($sourcename)
or die "Could not open source Excel file $sourcename: $!";
my $storage_book;
foreach my $source_sheet_number (0 .. $source_book->{SheetCount}-1)
{
my $source_sheet = $source_book->{Worksheet}[$source_sheet_number];
print "--------- SHEET:", $source_sheet->{Name}, "\n";
# sanity checking on the source file: rows and columns should be sensible
next unless defined $source_sheet->{MaxRow};
next unless $source_sheet->{MinRow} <= $source_sheet->{MaxRow};
next unless defined $source_sheet->{MaxCol};
next unless $source_sheet->{MinCol} <= $source_sheet->{MaxCol};
foreach my $row_index ($source_sheet->{MinRow} ..
$source_sheet->{MaxRow})
{
foreach my $col_index ($source_sheet->{MinCol} ..
$source_sheet->{MaxCol})
{
my $source_cell = $source_sheet->{Cells}[$row_index][$col_index];
if ($source_cell)
{
print "( $row_index , $col_index ) =>", $source_cell->Value, "\n";
if ($source_cell->{Type} eq ‘Numeric‘)
{
$storage_book->{$source_sheet->{Name}}->{$row_index}-
>{$col_index} = $source_cell->Value*2;
}
else
{
$storage_book->{$source_sheet->{Name}}->{$row_index}-
>{$col_index} = $source_cell->Value;
} # end of if/else
} # end of source_cell check
} # foreach col_index
} # foreach row_index
} # foreach source_sheet_number
print "Perl recognized the following data (sheet/row/column order):\n";
print Dumper $storage_book;
my $dest_book = Spreadsheet::WriteExcel->new("$destname")
or die "Could not create a new Excel file in $destname: $!";
print "\n\nSaving recognized data in $destname...";
foreach my $sheet (keys %$storage_book)
{
my $dest_sheet = $dest_book->addworksheet($sheet);
foreach my $row (keys %{$storage_book->{$sheet}})
{
foreach my $col (keys %{$storage_book->{$sheet}->{$row}})
{
$dest_sheet->write($row, $col, $storage_book->{$sheet}->{$row}->{$col});
} # foreach column
} # foreach row
} # foreach sheet
$dest_book->close();
print "done!\n";
It is important to note that the data extraction and storage parts of the program must be separated. They could have been done at the same time, but by separating them, it was easy to bug fix and improve.
A much better solution for the above problem may be implemented through theXML::ExcelCPAN module, but you must write a special converter to convert the XML back to Excel. If you want to import data in that way, you can alsoDBD::Exceluse the DBI interface through the module. Finally,Spreadsheet::ParseExcelwithSpreadsheet::ParseExcel::SaveParsera module, it claims to be able to convert between two Excel files, but there are no documents and examples. My site (see Resources) shows anSaveParserexample of use. Warning: It's an experimental procedure and it's very easy to go wrong.
Back to top of page
Conclusion
If you're using a Windows machine, stick withWin32::OLEthe module unless you have Excel at all on your machine. WhileSpreadsheet::WriteExcelSpreadsheet::ParseExcelthe functionality of the module is constantly being perfected, it is the easiestWin32::OLEway to get Excel data today.
In UNIX, especially Linux, useSpreadsheet::WriteExcelandSpreadsheet::ParseExcelmodules to programmatically access Excel data. But warning: they are quite immature modules, and if you need stability, they may not be right for you.
You can also consider packages such as Gnumeric and StarOffice (see Resources), which are available for free, and they provide a complete GUI interface and the ability to import/export Excel files. They are useful if you do not need programmatic access to Excel data. I've used both of these applications and I've found them to be good for everyday work.
Read and write Excel files in Perl