New XLPARSER4 Tool Supports XLSX Format
by Scott Klement - March 11th, 2008
(Index of Scott's articles on Excel, RPG and Java)
 
In the February 25, 2010, issue of System iNetwork Programming Tips, I presented version 3.6 of Apache's POI utility and explained how I had updated my HSSFR4 service program to work with it so that you can create, load, or update both of Excel's native formats, XLS and XLSX. In today's article, I also update XLPARSER4 to use POI 3.6 with XLSX support.

The XLPARSER4 tool is based on the EventModel from the POI project, whereas HSSFR4 is based on the UserModel. What this means is that XLPARSER4 can extract all the values from cells very quickly and with a low memory footprint, compared to HSSFR4. This ability is perfect when data from a spreadsheet is to be loaded into a physical file-or any other task in which the spreadsheet is only read and not written. By contrast, HSSFR4 is slower and requires more memory, but it offers the ability to write the data back to the original spreadsheet.

Requirements

The system requirements for running XLPARSER4 are the same as those of HSSFR4, namely:

  • IBM i release 5.4 or higher ILE RPG Compiler (licpgm 57xx-WDS, option 31)
  • IBM Developer Kit for Java (licpgm 57xx-JV1, *base)
  • Java 5.0 or newer (at least one of the following licpgms 57xx-JV1, opt 7, 8, 9, 10, 11 or 12)
  • System Openness Includes (licpgm 57xx-SS1, opt 13)
  • QShell (57xx-SS1, opt 30) not required but highly recommended
  • Apache POI version 3.5 or higher (3.6 recommended)
  • The XLPARSER4 RPG code (link is at the end of this article)

With those options installed, you'll be able to read the "Binary Excel" (.XLS) format of Excel document that was available in previous releases of POI. To take things a step further and use the OOXML support (.XLSX), you'll need the following additional Java JAR files available:

  • dom4j-1.6.1.jar from DOM4J (version 1.6.1 or higher)
  • jsr173_1.0_api.jar from XmlBeans 2.5.0 (or higher)
  • xbean.jar from XmlBeans 2.5.0 (or higher)
These tools are available at no charge from Apache.

Obtaining the JAR Files

The POI code is open-source (no charge) software. The extra JAR files that it depends on are also open-source packages.

Many of these projects provide things in both a "Binary" and "Source" distribution. Binary means that it does not contain the source code, rather it contains precompiled Java code, ready for you to use. Source means that you get the source code, and it's up to you to compile it. I recommend the "Binary" option unless you are comfortable with compiling Java projects.

Typically, the downloads will be ZIP files from the respective project's home page. You'll want to download the ZIP file from the appropriate site and then extract the ZIP file to get the JAR file(s) from it.

You should then upload the JAR file to the appropriate directory in your IFS.

At the time that I'm writing this (March 2010), these are the links for the downloads of the various JAR files. However, as time goes by, I'm sure the active version numbers will change, as will the exact links for downloading them. Here are links to the respective project's home pages, just in case:

From the downloads for these projects, you'll need to unzip the main files and get the appropriate JAR files to put into your IFS. Here are the JAR files you'll need, and the location I suggest using in the IFS:
      Required for  
Tool JAR file IFS Directory HSSF XSSF Description
POI 3.6 poi-3.6-20091214.jar /java/poi3.6 X X Main POI code
POI 3.6 poi-ooxml-3.6-20091214.jar /java/poi3.6   X Additional POI code for XSSF
POI 3.6 poi-ooxml-schemas-3.6-20091214.jar /java/poi3.6   X Schemas for XSSF code
DOM4J 1.6.1 dom4j-1.6.1.jar /java/dom4j   X XML Parser (DOM, SAX, JAXP) for Java
XmlBeans 2.5.0 xbean.jar /java/xmlbeans   X XMLBeans Utility
XmlBeans 2.5.0 jsr173_1.0_api.jar /java/xmlbeans   X API needed for XMLBeans to parse XML
XlParse 3.6 xlparse.jar /java/xlparse3.6 X X The Java portion of Scott's XLPARSE utility
(Included in the code download for this article.)

Note: Many of the JAR file names above contain version numbers. If you download a different version of the tool, please be aware that the name of the JAR file will differ slightly.

The XmlBeans project contains additional JAR files that I did not list above. These files are not required by POI at this time.

Setting Up Your CLASSPATH

To use XLPARSER4 you'll need the preceding JAR files. You may be wondering, what are JAR files, and how will your RPG program use them?

JAR stands for Java Archive. It's a way of distributing Java objects so that they can be attached to people's programs and used. Java code can call other Java code, of course-and ILE RPG can also call Java code directly. This handy combination means that someone (who may know nothing about RPG or IBM i) can write some nice utility code, and you can grab that code and use it from your RPG programs. Pretty neat, isn't it?

When Java code is compiled, it's compiled to disk objects called "class files." In order to run this Java code, a list of locations to search for class files must be put into a variable named CLASSPATH. Each entry in a CLASSPATH can be one of two things:

  1. an IFS folder containing one or more Java class files
  2. a JAR file (which also will contain one or more Java class files)

A class path never contains the names of the class files themselves; it always contains the names of "containers" that can have many class files within them. This is similar to the way a library list works in the traditional environment. A library list is a list of libraries (but not the objects within them) that is searched for objects. A class path contains a list of IFS directories or JAR files that contain class files.

For the purposes of this article, everything in the class path is a JAR file.

Here's an example command in a CL program that sets the CLASSPATH. I prefer to run this when I sign on to my IBM i system:

 ADDENVVAR ENVVAR(CLASSPATH) +
           VALUE('/java/xlparse3.6/xlparse.jar+
                :/java/poi3.6/poi-3.6-20091214.jar+
                :/java/poi3.6/poi-ooxml-3.6-20091214.jar+
                :/java/poi3.6/poi-ooxml-schemas-3.6-20091214.jar+
                :/java/dom4j/dom4j-1.6.1.jar+
                :/java/xmlbeans/jsr173_1.0_api.jar+
                :/java/xmlbeans/xbean.jar') +
           LEVEL(*JOB) +
           REPLACE(*YES)

Each JAR file (or directory) in the CLASSPATH is separated from the others by a colon. In the preceding example, you'll notice that all the JAR files required by this tool have been listed in the CLASSPATH. When I run an RPG program that needs to parse Excel, it'll know where to find the appropriate Java code because it's in my CLASSPATH.

Note: The name CLASSPATH must be in all uppercase letters, and this variable must be set before any program in the job uses Java. This variable is only read the first time Java is used within a given job.

Environments

Oftentimes, you'll want to test programs in different environments. Maybe you'll want to have one environment using an older version of POI, and another using a new version of POI. For a discussion of how to set up separate environments for each version, please see the article about HSSF ("New POI Release Brings Excel's XLSX Support to RPG"). The concepts are the same for XLPARSER4.

Updating the XLPARSER4 RPG Code

The RPG code needed to parse Excel spreadsheets, as well as demonstrations of its use, are included in the code download at the end of this article. Please read the README file included with that code for instructions relating to how that code should be installed on your IBM i system.

Trying It Out

Once you have the appropriate code installed, it's finally time to try parsing an Excel spreadsheet from RPG. The basic premise is that the XLPARSER4 tool will open up an Excel file, and each tab (or "sheet") from top to bottom, left to right.

This process has not changed from previous versions of XLPARSER4. The only difference between this version and previous versions is that XLPARSER4 can handle either an XLSX file or an XLS file. (Previous versions only handled XLS.)

For the sake of example, let's say you have a spreadsheet that looks like this:
The whole Excel document is referred to as a "workbook." Each tab in the workbook is referred to as a "sheet." XLPARSER4 will start with the first sheet, titled "Sales" and will then read all the cells in that sheet. It will start with row 1 and will read any columns that contain data. (In this case, only column A in row 1 contains data, and that's the title "Acme Widgets, Inc Quarterly Sales."

XLPARSER4 will then proceed to read rows 2, 3, 4, 5, etc. For each column it finds in any cell that contains data, it will determine whether that cell is a character cell (such as the string "Part number") or a numeric cell (such as the quarterly sales figures). Anytime it finds a cell containing data, it will call a subprocedure of your choosing and will pass parameters to that procedure. The parameters will contain the sheet name, column number, row number, and cell value. The column and row numbers will be numbered from 0, so col=0, row=0 is Excel's cell A1. col=1, row=14 is Excel's cell B15, and so forth.

Your subprocedure is responsible for storing the data in an array or a file or a data structure or whatever is appropriate for the work that you are doing. Your subprocedure will be called repeatedly by XLPARSER4, once for each cell that contains data in the spreadsheet.

For a demonstration of this process, please see the XLPDEMO program provided in the code download, and take a look at the previous articles I've written about XLPARSER4, listed here:

Note: Even though the concepts explained in the preceding articles apply to the current version of XLPARSER4, I strongly recommend that you download the latest copy of the code from this article instead of the previous ones. Today's code download should have all the sample programs demonstrated in the earlier articles.

Known problems

Currently, I am having trouble getting formula support to work properly with XLSX files in XLPARSER4. Once I discover the problem, this will be resolved in a future release.

Code download

You can download the updated version of XLPARSER4 from the following link:
scottklement.com/poi



Related links