EASY400 EZ4PARSE - XML Parser for IBM i

	ez4Parse easy400 parser
	Public Source XML Parser

	by Giovanni B. Perotti (Italy)

1-About it

This open-source utility for IBM System i provides a command, XMLPARSE, that loads the data from an XML stream file into a database file member.

2-Prerequisites

OS/400 release V7R1 or later.
The ILE RPG IV compiler (product 57xxWDS, option 31) is required both during installation and to run command XMLPARSE.
A basic knowledge of XML.

3-Installation

Download file ez4parse.zip from the Easy400 download page and unzip it.
Follow the ez4parse.txt instructions to upload and to restore library EZ4PARSE.
On the IBM System i
- sign on with a class *SECOFR user profile
- run the following procedure:
  STRREXPRC SRCMBR(INSTALL) SRCFILE(EZ4PARSE/QREXSRC)
  It does the following:
  - creates objects in library EZ4PARSE
  - restores IFS directory /ez4parse

4-Updates

To learn about the latest updates to this tool, press this link.
To find the release date of the ez4PARSE version you have installed, run command EZ4PARSE/RELEASED on your IBM i system.

5-Command XMLPARSE

This is the core XML conversion command of this utility.
This basic command requires that the names of the XML elements match the field names of the receiving database file.
On the next page you can find other commands for more difficult cases:

Command CRTTODBF creates a receiving database file whose field names are the same as the XML element names
Command RNMXMLFLDS duplicates the XML file while renaming its XML elements
Command SUPERPARSE automatically guides you through the commands of this utility.

                       Parse a XML stream file (XMLPARSE)
 
 Type choices, press Enter.

 XML stream file  . . . . . . . . STMF       >                                             
                                                                                
 Path to the XML elements . . . . XMLPTH       *AUTO                                       
                                                                                           
                                                                                           
                                                                                           
 Number of segments in XMLPTH . . XMLPTHNBR    2             1-99
 Target database file . . . . . . TOFILE                     Name
   library  . . . . . . . . . . .                *CURLIB     Name, *CURLIB
 Target member  . . . . . . . . . TOMBR        *FILE         Name, *FILE
 Replace or add records . . . . . MBROPT       *REPLACE      *ADD, *REPLACE
 Take default parsing options . . DFTOPTIONS > *NO           *YES, *NO
  1.CCSID parsing option  . . . . CCSID        *BEST         *BEST, *JOB, *UCS2
  2.TRIM parsing option . . . . . TRIM         *ALL          *ALL, *NONE
  3.ALLOWMISSING parsing option   ALWMISSING   *YES          *YES, *NO
  4.ALLOWEXTRA parsing option . . ALWEXTRA     *YES          *YES, *NO
                                                                                    Bottom

Figure 1 - Command XMLPARSE

This command gets data from an XML stream file and either adds records to a given database file member or replaces its existing records.

Note: Before running this command, you must make sure that the data elements in the XML files match the field names in the TOFILE database record format.

Command parameters:

XML stream file (STMF):
Path and name of the XML stream file to be parsed.
Path to the XML elements (XMLPTH):
- This parameter is used to build the so-called "path option" used by the XML parser program.
  The "path option" specifies how to locate the desired XML elements within the XML document.
  It gives the path to the element as it appears in the XML document, with element names separated by forward slashes.
  EXAMPLE - In the case where the XML stream file is
  <?xml version="1.0" encoding="iso-8859-1" ?>
  <RowSet>
  <Row number ="1"><SEQ>1</SEQ><DWNS>6049</DWNS><ITEM>cgidev2.zip</ITEM></Row>
  <Row number ="2"><SEQ>2</SEQ><DWNS>886</DWNS><ITEM>cbldev2.zip</ITEM></Row>
  </RowSet>
  the value of the "path option" should be "RowSet/Row". This allows the parser program to understand that the names of the XML data elements (which should match the field names in the database file record format) are:
  SEQ, DWNS, and ITEM.
  Conclusion: in such a case, one should specify XMLPTH('RowSet/Row') .
  It must be stressed that this parameter is case sensitive.
- A quicker way to specify the "path option" is to take the default XMLPTH(*AUTO).
  If you do so, you must also specify the number of path segments in the next parameter XMLPTHNBR (its default value is 2).
  By doing this, the parser program will examine the XML stream file and make up the "path option" value by itself.
Number of segments in XMLPTH (XMLPTHNBR):
Number of segments for the "path option" when XMLPTH(*AUTO) is specified.
EXAMPLE - In the case where the XML stream file is
<?xml version="1.0" encoding="iso-8859-1" ?>
<RowSet>
<Row number ="1"><SEQ>1</SEQ><DWNS>6049</DWNS><ITEM>cgidev2.zip</ITEM></Row>
<Row number ="2"><SEQ>2</SEQ><DWNS>886</DWNS><ITEM>cbldev2.zip</ITEM></Row>
</RowSet>
- if you specify XMLPTH(*AUTO) XMLPTHNBR(2), the "path option" is computed as 'rowset/row', which is correct.
- if you instead were to specify XMLPTH(*AUTO) XMLPTHNBR(3), the "path option" would be computed as 'rowset/row/seq', which is not correct.
Conclusion: in most cases XMLPTH(*AUTO) XMLPTHNBR(2) does the job.
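The effect of XMLPTHNBR can be illustrated with a short Python sketch. It is not the code used by the utility: it only assumes that the path is built from the root element followed by the first child element at each level, with names folded to lower case as in the results quoted above. The function name auto_path is hypothetical.

```python
import xml.etree.ElementTree as ET

SAMPLE = b"""<?xml version="1.0" encoding="iso-8859-1" ?>
<RowSet>
<Row number="1"><SEQ>1</SEQ><DWNS>6049</DWNS><ITEM>cgidev2.zip</ITEM></Row>
<Row number="2"><SEQ>2</SEQ><DWNS>886</DWNS><ITEM>cbldev2.zip</ITEM></Row>
</RowSet>"""


def auto_path(xml_bytes, segments):
    """Join the names of the first `segments` nested elements, in lower case.

    Assumption: at each level the first child element is followed.
    """
    node = ET.fromstring(xml_bytes)
    names = [node.tag.lower()]
    while len(names) < segments:
        children = list(node)
        if not children:
            break  # the document is shallower than the requested depth
        node = children[0]
        names.append(node.tag.lower())
    return "/".join(names)


print(auto_path(SAMPLE, 2))  # rowset/row
print(auto_path(SAMPLE, 3))  # rowset/row/seq
```

With 3 segments the path reaches a data element (SEQ) instead of stopping at the record element (Row), which is why XMLPTHNBR(3) is wrong for this document.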
Target database file (TOFILE):
Specifies the qualified name of the database file that receives the records parsed from the XML file.
Note: the field names in the database file record must match the names of the data elements in the XML file.
Target member (TOMBR):
Specifies the name of the database file member that receives the records parsed from the XML file.
You may specify TOMBR(*FILE) if the member name is the same as the file name.
Replace or add records (MBROPT):
Specifies whether the new records replace or are added to the existing records.
Take default parsing options (DFTOPTIONS):
Specifies whether the default values should be taken for the XML parsing options.
Select:
- *YES to take the default values for the parsing options. This choice works well in most cases, when the XML structure exactly matches the database record layout.
- *NO to specify different parsing options. The following parsing options are then displayed, so that you can make your choices:
  - CCSID parsing option (CCSID):
    Specifies the CCSID to be used for processing the XML stream file.
    - *BEST indicates that the document should be processed in the CCSID that will best preserve the data in the document. If the document is in the job CCSID or an ASCII CCSID related to the job CCSID, the document will be processed in the job CCSID. Otherwise, the document will be processed in UCS-2 and the data will be converted to the job CCSID before it is assigned to variables with a data type other than UCS-2.
    - *JOB indicates that the document should be processed in the job CCSID. The data will be converted to UCS-2 when it is assigned to UCS-2 variables.
    - *UCS2 indicates that the document should be processed in UCS-2. The data will be converted to the job CCSID when it is assigned to variables with a data type other than UCS-2.
  - TRIM parsing option (TRIM):
    Specifies whether whitespace (blanks, newlines, tabs etc.) should be trimmed from text data before the data is assigned to RPG variables (record fields).
    - *ALL indicates that before text content is assigned to the RPG character or UCS-2 variable, the following steps will be done:
      1. Leading and trailing whitespace will be trimmed completely from text content
      2. Strings of interior whitespace in the text content will be reduced to a single blank.
    - *NONE indicates that no whitespace will be trimmed from text content. This option will have the best performance, but it should only be used if the whitespace is wanted, or if the XML data is known to contain no unwanted whitespace.
  - ALLOWMISSING parsing option (ALWMISSING):
    - When the XML stream file does not contain XML elements for all the fields of the database record format, you can use ALWMISSING(*YES) to indicate that this should not be considered an error.
    - If expected XML data is not found, and ALWMISSING(*NO) is specified, the operation will fail with status 00353 (XML does not match RPG variable).
  - ALLOWEXTRA parsing option (ALWEXTRA):
    - When the XML document contains XML elements that do not match any field in the database record format, you can use ALWEXTRA(*YES) to indicate that this should not be considered an error.
    - If unexpected XML data is found, and ALWEXTRA(*NO) is specified, the operation will fail with status 00353 (XML does not match RPG variable).
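Before choosing ALWMISSING and ALWEXTRA, it can help to compare the XML element names with the database field names. The Python sketch below does this outside the utility and makes several assumptions: the field list is typed by hand (on a real system it could be taken from the output of DSPFFD), names are compared without regard to case, and only child elements of the record element are examined, not attributes. The helper name compare_names is hypothetical.

```python
import xml.etree.ElementTree as ET


def compare_names(xml_text, record_tag, field_names):
    """Return (missing, extra) for the given record element.

    missing: field names for which no record supplies an element
    extra:   element names that match no field name
    """
    root = ET.fromstring(xml_text)
    fields = {name.upper() for name in field_names}
    found = set()
    for record in root.iter(record_tag):
        found.update(child.tag.upper() for child in record)
    missing = sorted(fields - found)  # relevant to ALWMISSING
    extra = sorted(found - fields)    # relevant to ALWEXTRA
    return missing, extra


xml_text = """<RowSet>
<Row number="1"><SEQ>1</SEQ><DWNS>6049</DWNS><ITEM>cgidev2.zip</ITEM></Row>
<Row number="2"><SEQ>2</SEQ><DWNS>886</DWNS><ITEM>cbldev2.zip</ITEM></Row>
</RowSet>"""

# Hypothetical target file with fields SEQ, ITEM and SIZE.
missing, extra = compare_names(xml_text, "Row", ["SEQ", "ITEM", "SIZE"])
print(missing)  # ['SIZE']
print(extra)    # ['DWNS']
```

In this example both lists are non-empty, so ALWMISSING(*NO) or ALWEXTRA(*NO) would be expected to end with status 00353.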

For more information and examples about "XML parsing options", see this PDF.
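The two steps listed above for TRIM(*ALL) can be mimicked in a few lines of Python. This is only an illustration of the rule, not the parser's implementation; the function name trim_all is hypothetical, and Python's str.split() treats a slightly wider set of characters as whitespace than XML does.

```python
def trim_all(text):
    """Strip leading and trailing whitespace and reduce each interior
    run of whitespace to a single blank."""
    return " ".join(text.split())


print(repr(trim_all("  JEU DE \n\t SOUFFLETS   ")))  # 'JEU DE SOUFFLETS'
```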

6-How it works

The following describes how program XMLPARSE does its job:

Calls program CRTPARSEP to create, in library QTEMP, an XML parser program that accesses the specified database file member.
Program CRTPARSEP does the following:
1. Calls system API QUSLRCD to retrieve the name of the target database file record format
2. Copies to a QTEMP source file the source of an ILE-RPG template XML parser program, customizing its output file name and record format name
3. Compiles this customized QTEMP XML parser program
Calls the customized QTEMP XML parser program, passing all the parameters received from command XMLPARSE.

Note: The QTEMP ILE-RPG XML parser program takes advantage of the XML-INTO operation code, available since V6R1.

7-Restrictions

Command XMLPARSE does not support XML repeated data fields and data subfields.
For instance, in the following example:

<?xml version="1.0" encoding="UTF-8" ?>
<CrossList>
 <product>
  <markerCode>SKF</markerCode>
  <mpiReference>SKF VKJP 2053</mpiReference>
  <tecDocReference>VKJP 2053</tecDocReference>
  <description>JEU DE SOUFFLETS DE DIRECTION  OPEL, VAUXHALL</description>
  <grossPrice>16.00</grossPrice>
  <discount>50.00</discount>
  <netPrice>8.00</netPrice>
  <deposit>.00</deposit>
  <packageAmount>1</packageAmount>
  <stock>
   <country>BE</country>
   <deliveryDate/>
   <parts>0</parts>
  </stock>
  <stock>
   <country>NL</country>
   <deliveryDate/>
   <parts>0</parts>
  </stock>
  <stock>
   <country>FR</country>
   <deliveryDate/>
   <parts>0</parts>
  </stock>
 </product>
</CrossList>

Figure 2 - XML with repeated fields and data subfields

- Data field <stock> is a repeated data field (it appears 3 times in the "record" structure <product>).
- Data field <stock> is made of 3 data subfields (<country>, <deliveryDate> and <parts>).

When such cases occur, XMLPARSE may not fail outright, but the data from those XML data fields is not transferred correctly.
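Because the failure can be silent, it may be worth checking a document for these two cases before running XMLPARSE. The following Python sketch is one possible pre-check, run outside the utility; the function name find_unsupported is hypothetical, and the sample is a shortened version of Figure 2.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def find_unsupported(xml_text, record_tag):
    """Return (repeated, nested) child element names found in any record.

    repeated: names that occur more than once within a single record
    nested:   names of children that themselves contain child elements
    """
    root = ET.fromstring(xml_text)
    repeated, nested = set(), set()
    for record in root.iter(record_tag):
        counts = Counter(child.tag for child in record)
        repeated.update(tag for tag, n in counts.items() if n > 1)
        nested.update(child.tag for child in record if len(child) > 0)
    return sorted(repeated), sorted(nested)


xml_text = """<CrossList>
 <product>
  <markerCode>SKF</markerCode>
  <stock><country>BE</country><deliveryDate/><parts>0</parts></stock>
  <stock><country>NL</country><deliveryDate/><parts>0</parts></stock>
 </product>
</CrossList>"""

print(find_unsupported(xml_text, "product"))  # (['stock'], ['stock'])
```

An empty result for both lists suggests that the record structure is flat enough for XMLPARSE.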

8-Error codes returned from XML-INTO

Error code	Message text	Comments
00351	Error in XML parsing	See "XML parser error message RNX0351, detailed error meanings", below.
00352	Invalid XML option
00353	XML document does not match RPG variable	XML data element names do not match database record field names. Checks: display the XML stream file and the database record field names and look for mismatches. ALWMISSING(*YES) or ALWEXTRA(*YES) may help.
00354	Error preparing for XML parsing

• XML parser error message RNX0351, detailed error meanings:
If the XML parser detects an error in the XML document during parsing, message RNX0351 will be issued.
From the message, you can get the specific error code associated with the error, as well as the offset in the document where the error was discovered.
The following table shows the meaning of each parser error code:

XML Parser Error Code	Description
1	The parser found an invalid character while scanning white space outside element content.
2	The parser found an invalid start of a processing instruction, element, comment, or document type declaration outside element content.
3	The parser found a duplicate attribute name.
4	The parser found the markup character '<' in an attribute value.
5	The start and end tag names of an element did not match.
6	The parser found an invalid character in element content.
7	The parser found an invalid start of an element, comment, processing instruction, or CDATA section in element content.
8	Either the parser found in element content the CDATA closing character sequence ']]>' without the matching opening character sequence '<![CDATA[', or a data element failed to be converted to a numeric value.
9	The parser found an invalid character in a comment.
10	The parser found in a comment the character sequence '--' (two hyphens) not followed by '>'.
11	The parser found an invalid character in a processing instruction data segment.
12	A processing instruction target name was 'xml' in lowercase, uppercase or mixed case.
13	The parser found an invalid digit in a hexadecimal character reference (of the form &#xdddd;, for example &#x0eb1;).
14	The parser found an invalid digit in a decimal character reference (of the form &#dddd;).
15	A character reference did not refer to a legal XML character.
16	The parser found an invalid character in an entity reference name.
17	The parser found an invalid character in an attribute value.
18	The parser found a possible invalid start of a document type declaration.
19	The parser found a second document type declaration.
20	An element name was not specified correctly. The first character was not a letter, '_', or ':', or the parser found an invalid character either in or following the element name.
21	An attribute was not specified correctly. The first character of the attribute name was not a letter, '_', or ':', or a character other than '=' was found following the attribute name, or one of the delimiters of the value was not correct, or an invalid character was found in or following the name.
22	An empty element tag was not terminated by a '>' following the '/'.
23	The element end tag was not specified correctly. The first character was not a letter, '_', or ':', or the tag was not terminated by '>'.
24	The parser found an invalid start of a comment or CDATA section in element content.
25	A processing instruction target name was not specified correctly. The first character of the processing instruction target name was not a letter, '_', or ':', or the parser found an invalid character in or following the processing instruction target name.
26	A processing instruction was not terminated by the closing character sequence '?>'.
27	The parser found an invalid character following '&' in a character reference or entity reference.
28	The version information was not present in the XML declaration.
29	The 'version' in the XML declaration was not specified correctly. 'version' was not followed by '=', or the value was missing or improperly delimited, or the value specified a bad character, or the start and end delimiters did not match, or the parser found an invalid character following the version information value closing delimiter in the XML declaration.
30	The parser found an invalid attribute instead of the optional encoding declaration in the XML declaration.
31	The encoding declaration value in the XML declaration was missing or incorrect. The value did not begin with lowercase or uppercase A through Z, or 'encoding' was not followed by '=', or the value was missing or improperly delimited or it specified a bad character, or the start and end delimiters did not match, or the parser found an invalid character following the closing delimiter.
32	The parser found an invalid attribute instead of the optional standalone declaration in the XML declaration.
33	The 'standalone' attribute in the XML declaration was not specified correctly. 'standalone' was not followed by a '=', or the value was either missing or improperly delimited, or the value was neither 'yes' nor 'no', or the value specified a bad character, or the start and end delimiters did not match, or the parser found an invalid character following the closing delimiter.
34	The XML declaration was not terminated by the proper character sequence '?>', or contained an invalid attribute.
35	The parser found the start of a document type declaration after the end of the root element.
36	The parser found the start of an element after the end of the root element.
300	The parser reached the end of the document before the document was complete.
301	The %HANDLER procedure for XML-INTO or XML-SAX returned a non-zero value, causing the XML parsing to end.
302	The parser does not support the requested CCSID value or the first character of the XML document was not '<'.
303	The document was too large for the parser to handle. The parser attempted to parse the incomplete document, but the data at the end of the document was necessary for the parsing to complete.
500-999	Internal error in the external parser. Please report the error to your service representative.
10001-19999	Internal error in the parser. Please report the error to your service representative.
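Many of the errors in this table are well-formedness errors that can be caught before the stream file reaches XMLPARSE. As a rough sketch, the Python function below checks a document with Python's own XML parser and reports where the first error was found. Python's parser uses its own messages and codes, not the RNX0351 codes listed above, so the result only helps locate the problem. The function name check_well_formed is hypothetical.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_bytes):
    """Return None if the document parses, else (message, line, column)."""
    try:
        ET.fromstring(xml_bytes)
    except ET.ParseError as err:
        line, column = err.position
        return str(err), line, column
    return None


good = b"<RowSet><Row><SEQ>1</SEQ></Row></RowSet>"
bad = b"<RowSet><Row><SEQ>1</SEQ></RowSet>"  # the </Row> end tag is missing

print(check_well_formed(good))  # None
print(check_well_formed(bad))   # a (message, line, column) tuple
```

The second document corresponds to the situation described for parser error code 5, where the start and end tag names of an element do not match.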