New Functions in XLPARSER4
by Scott Klement - November 25th, 2008
(Index of Scott's articles on Excel, RPG and Java)
 

Q: I've been trying your XLPARSER4 utility to read Excel spreadsheets, but I have a problem. My subprocedures don't get called when a cell is empty. Since some empty cells occur at the end of a row, how can I tell when a row is complete?

A: XLPARSER4 skips empty cells because they don't exist in the Excel file. Excel keeps the size of the XLS file to a minimum by not storing empty cells on disk. You're right that this behavior makes it challenging to keep track of when a new row begins or ends, because it's possible that the last cell in a row will be empty and therefore you can't use the last cell in the row to indicate that the entire row has been read.

I've had that problem myself, so I decided to add some new functions to XLPARSER4 to help you detect when an entire row is complete.

If you've been using my XLPARSER4 utility to read a spreadsheet, you know that you provide it with three subprocedures, one to be called when it finds a cell containing character data, one to be called when it finds a cell containing numeric data, and one to be called when it finds a cell with a formula in it. If you haven't used the tool previously or would like to refresh your memory, please read the most recent article about XLPARSER4.

In this new update to XLPARSER4, you can provide two additional subprocedures to XLPARSER4. The XLPARSER4 service program calls these routines when a new row begins or ends, respectively.

For example, your program might start like this:

      xlparse_notify(  %paddr(clear_struct)
                    :  %paddr(print_struct) );

      xlparse_workbook( '/tmp/november_sales.xls'
                      : %paddr(Numeric) 
                      : %paddr(Character)
                      : %paddr(Formula) );

The xlparse_notify() subprocedure is the one I added for this article. It accepts two parameters: a subprocedure to call before each row begins, and a subprocedure to call when each row is complete.

In the above example, the clear_struct() subprocedure is called when a row begins. That procedure clears the contents of a data structure that will eventually contain one row-full of data from the spreadsheet. Because this data structure will be cleared before every row, any empty cells will be empty in the data structure. Clear_struct() will be called only for rows that have data in the spreadsheet.

The print_struct() subprocedure will be called when all the cells have been loaded for a particular row. It will be called only for rows that have data in the spreadsheet. In this example, print_struct() will be used to print the contents of the data structure that contains one row.

The Numeric, Character, and Formula subprocedures will be called when cells that contain Numeric, Character, and Formula data, respectively, are found. They will be called repeatedly in a loop for every cell found in the spreadsheet file. In the sample program, they're used to load data into the data structure that will eventually be printed by the print_struct() routine.

You can download the updated copy of XLPARSER4 as well as the sample program described in this article and all the sample programs from the previous articles from the following link:
http://www.pentontech.com/IBMContent/Documents/article/57503_795_XlParse.zip