Web Serving Performance

In this page we have copied Chapter 6 of SC41-0607-02 AS/400 Performance Capabilities Reference - Version 4, Release 4.
We believe that reading this chapter, especially the section Web Server Performance Tips and Techniques, will contribute to a better understanding of the factors that may influence system capacity in a Web serving environment.

Chapter 6. Web Serving Performance

Performance information for Web serving on the AS/400 and various types of web server transactions will be discussed in this section.

There are many factors that can impact overall performance (e.g., end-user response time, throughput) in the complex Internet environment, some of which are listed below:

Web browser
- processing speed of the client system
- performance characteristics of the web browser
- client application performance characteristics
Communication network
- speed of the communication links
- capacity of any proxy servers
- congestion of network resources
AS/400 Web server
- AS/400 processor speed
- utilization of key AS/400 resources (CPU, IOPs, memory, disk)
- Web server performance characteristics
- application (e.g. servlet) performance characteristics

The primary focus of this section will be to discuss the performance characteristics of the AS/400 as a server in a Web serving environment, providing capacity planning information and recommendations for best performance. Please refer to "Chapter 5. Communications performance" for related information.

Data accesses across the Internet differ distinctly from accesses across 'traditional' communications networks. The additional resources to support Internet transactions by the CPU, IOPs, and line are significant and must be considered in capacity planning. Typically, in a traditional network:

there is a request and response (between client and server)
connections/sessions are maintained between transactions
networks are tuned to use large frames

For web transactions there are a dozen or more line transmissions (including acknowledgements) per transaction:

a connection is established/closed for each transaction
there is a request and response (between client and server)
networks typically have small frame (MTU) sizes
one user transaction may contain separate internet transactions
secure transactions are frequent and consume more resource

The information that follows is based on performance measurements and analysis done in the internal IBM performance lab. The raw data is not provided here, but the highlights, general conclusions, and recommendations are included. Results listed here do not represent any particular customer environment. Actual performance may vary significantly from what is provided here. Note that these workloads, along with other published benchmark data (from other sources), are always measured in best-case environments (e.g., local LAN, large MTU sizes). Real Internet networks typically have higher contention, MTU size limitations, and intermediate network servers (e.g., proxy, SOCKS).

6.1 Web serving with the HTTP Server

The Hypertext Transfer Protocol (HTTP) server allows AS/400 systems attached to a TCP/IP network to provide objects to any Web browser. At a high level, the connection is made, the request is received and processed, the data is sent to the browser, and the connection is ended. The HTTP server jobs and the communications router tasks are the primary job/tasks involved (there is not a separate user job for each attached user).

Workload Description and Data Interpretation: The workload is a program that runs on a client workstation. The program simulates multiple Web browser clients and repetitively issues 'URL requests' to the AS/400 Web server. The number of simulated clients can be adjusted to vary the offered load. Each of the transaction types listed in the tables serves about 1000 bytes:

Static Page: serves a static page via the HTTP server. This information can be accessed from the web server's cache of specified IFS files.
CGI (HTML): invokes a CGI program that accesses data from IFS and serves a simple HTML page via the HTTP server. This runs in a named activation group.
CGI (SQL): invokes a CGI program that performs a simple SQL request and serves the result via HTTP server. This runs in a named activation group.
Persistent CGI: invokes a CGI program that receives a handle supplied by the browser, accesses data from IFS and serves a simple HTML page via the HTTP server.
Net.Data (HTML): invokes the Net.Data program that serves a simple HTML page via the HTTP server.
Net.Data (SQL): invokes the Net.Data program that performs a simple SQL request and serves the result via HTTP server.
Servlet: invokes a Java servlet that accesses data from IFS and serves a simple HTML page via the HTTP server.

Each of the above can be served in a secure or non-secure fashion.
"Relative CPU time" is the average AS/400 CPU time to process the transaction for each specific scenario.
"AS/400 Capacity (hits/sec/CPW)" is the capacity metric used to estimate the capacity of any AS/400 model.
Note that transaction/sec/CPW can be used interchangeably with hits/sec/CPW. An example exists in the conclusions.
"Secure:Nonsecure time ratio" indicates the extra CPU processing required to execute a given transaction in a secure mode.

The CGI programs were compiled using a "named" activation group. For more information on program activation groups, refer to AS/400 ILE concepts, SC41-5606.

Table 6.1 V4R4 AS/400 Web serving Capacity Planning
Transaction type	Nonsecure		Secure
	Capacity metric: hits/sec/CPW	Relative CPU time	Capacity metric: hits/sec/CPW	Secure:Nonsecure CPU time ratio
Static page (cached)	1.86	0.6 x	0.58	3.2
Static page (not cached)	1.18	1.0 x	0.48	2.5
CGI (HTML)	0.44	2.7 x	0.28	1.6
CGI (SQL)	0.43	2.7 x	0.28	1.5
Persistent CGI	0.44	2.7 x	0.25	1.8
Net.Data (HTML)	0.24	4.9 x	0.19	1.3
Net.Data (SQL)	0.15	7.9 x	0.13	1.2
Servlet	0.40	2.9 x	0.28	1.4
Note:
- IBM HTTP Server for AS/400: V4R4; 100 Mbps Ethernet; with TCPONLY(*YES)
- Based on measurements from AS/400 Model 720-2062
- Static page caching done with IBM HTTP Server (WRKHTTPCFG)
- All requests cached for Net.Commerce
- 1KB data served for each of the transaction types
- Data assumes no access logging
- CGI programs compiled with "named" activation group
- Secure measurements done with Secure Sockets Layer (SSL) with 40-bit RC4 encryption
- transactions/second/CPW can be used interchangeably with hits/sec/CPW
- CPW is the "Relative System Performance Metric", found in Chapter 2, "AS/400 System Capacities and CPW"
- Web server capacities may not scale exactly by CPW; therefore, results may differ significantly from those listed here
- NA = Not available
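
The derived columns of Table 6.1 appear to follow from the capacity metrics themselves: the relative CPU time matches the nonsecure capacity of the uncached static page divided by the nonsecure capacity of the row, and the Secure:Nonsecure ratio matches the nonsecure capacity divided by the secure capacity. This reading is an interpretation, not something the chapter states. The Python sketch below, whose names are all hypothetical, recomputes both columns so the result can be compared with the published values.

```python
# Capacity metrics (hits/sec/CPW) copied from Table 6.1 as (nonsecure, secure).
CAPACITY = {
    "Static page (cached)": (1.86, 0.58),
    "Static page (not cached)": (1.18, 0.48),
    "CGI (HTML)": (0.44, 0.28),
    "CGI (SQL)": (0.43, 0.28),
    "Persistent CGI": (0.44, 0.25),
    "Net.Data (HTML)": (0.24, 0.19),
    "Net.Data (SQL)": (0.15, 0.13),
    "Servlet": (0.40, 0.28),
}

# Assumption: the "1.0 x" row (uncached static page) is the baseline.
BASELINE = CAPACITY["Static page (not cached)"][0]


def relative_cpu_time(transaction_type):
    """CPU time per hit relative to a nonsecure, uncached static page."""
    nonsecure, _ = CAPACITY[transaction_type]
    return BASELINE / nonsecure


def secure_ratio(transaction_type):
    """CPU time of the secure transaction relative to the nonsecure one."""
    nonsecure, secure = CAPACITY[transaction_type]
    return nonsecure / secure


for name in CAPACITY:
    print(f"{name:26s} {relative_cpu_time(name):5.2f} x  {secure_ratio(name):5.2f}")
```

The recomputed values agree with the table to within rounding, which suggests that either capacity column is enough to recover the other figures.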

Figure 6.1 AS/400 Web Serving V4R4 Relative Capacities

Web Server Performance Tips and Techniques:

V4R4 provides a performance improvement of up to 70% over that of V4R3 (with similar hardware). This is mostly due to improvements in the IBM HTTP Server and TCP/IP performance. For static pages that are not cached, V4R4 provides up to 7% more capacity. For static pages that are cached, V4R4 provides up to 20% more capacity. For CGI and Net.Data transactions, V4R4 provides up to 70% more capacity.

V4R3 provided a performance improvement in capacity of up to 65% over that of V4R2 (with similar hardware). This is mostly due to the improved efficiency of the IBM HTTP Server over that of the ICS/400 of V4R2. For static pages that are not cached, V4R3 provides up to 20% more capacity. For static pages that are cached, V4R3 provides up to 65% more capacity. There were also significant improvements for Net.Data and CGIs with named activations in V4R3.
Web Server Capacity (Example Calculations): Throughput for web serving is typically discussed in terms of the number of hits/second or transactions/second. Typically the CPU will be the resource that determines overall system capacity. If the IOPs become the resource that limits system throughput, then the number of IOPs supporting the load could be increased. For system configurations where the CPU is the limiting resource, Table 6.1 above can be used for capacity planning. Use these high-level estimates with caution. They do not take the place of a complete capacity planning session with actual measurements of your particular environment. Remember that these example transactions are fairly trivial. Actual customer transactions may be significantly more complex and therefore consume additional CPU resources. Scaling issues for the server, the application, and the database also may come into consideration when using N-way processors with higher projected capabilities.
Example 1: Estimating the capacity for a given model and transaction type: Estimate the system capacity by multiplying the CPW (relative system performance metric) for the AS/400 model by the appropriate hits/sec/CPW value (the capacity metric provided in the table):
Capacity = CPW * hits/sec/CPW
For example, a 170-2386 rated at 460 CPW doing web serving with CGI programs would have a capacity of 202 trans/sec (460 x 0.44 = 202). This assumes that the entire capacity of the system would be allocated for Web serving. If other work is also on the system, you must pro-rate the CPU allocation. For example, if only 25% of the CPU is allocated for Web serving, then you would have a web serving throughput of about 50 trans/sec (460 x 0.25 x 0.44 = 50).
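
The calculation in Example 1 can be sketched in a few lines of Python. The function name and the cpu_share parameter are hypothetical and are used only for illustration; the CPW rating and the hits/sec/CPW value come from the example and from Table 6.1.

```python
def estimated_capacity(cpw, hits_per_sec_per_cpw, cpu_share=1.0):
    """Estimate Web serving throughput (hits/sec) for an AS/400 model.

    cpw                  -- CPW rating of the model
    hits_per_sec_per_cpw -- capacity metric from Table 6.1
    cpu_share            -- fraction of the CPU allocated to Web serving
    """
    if not 0.0 < cpu_share <= 1.0:
        raise ValueError("cpu_share must be greater than 0 and at most 1")
    return cpw * cpu_share * hits_per_sec_per_cpw


# Model 170-2386 (460 CPW) serving nonsecure CGI pages (0.44 hits/sec/CPW).
full = estimated_capacity(460, 0.44)
quarter = estimated_capacity(460, 0.44, cpu_share=0.25)
print(f"whole system: {full:.1f} hits/sec, 25% of CPU: {quarter:.1f} hits/sec")
```

The results are about 202.4 and 50.6 hits/sec, which the text rounds to 202 and 50. As with the table itself, treat these as planning estimates rather than measured values.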
Example 2: Estimating how many CPWs are required for a given web transaction load: Characterize the transaction make-up of the estimated workload and the required transaction rate (in transactions/sec). Estimate the CPWs required to support a given load by dividing the required transaction rate by the appropriate hit/sec/CPW value (the capacity metric provided in the table).
Required CPWs = transaction rate / hits/sec/CPW .
For example, in order to support 175 CGI trans/sec, 398 CPWs would be required (175/0.44 = 398 CPWs). If a mixed load is being assessed, then calculate the required CPWs for each of the components and add them up. Select an AS/400 model that fits and allow enough room for future growth.
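
Example 2, including the mixed-load case, might be sketched as follows in Python. The dictionary keys and the function name are hypothetical, the capacity metrics are the nonsecure values from Table 6.1, and the mixed load shown is an invented illustration rather than a measured workload.

```python
# Nonsecure capacity metrics (hits/sec/CPW) taken from Table 6.1.
HITS_PER_SEC_PER_CPW = {
    "static_cached": 1.86,
    "cgi": 0.44,
    "netdata_sql": 0.15,
}


def required_cpw(load):
    """Sum the CPWs needed for each component of a transaction mix.

    load maps a transaction type to its required rate in transactions/sec.
    """
    return sum(rate / HITS_PER_SEC_PER_CPW[kind] for kind, rate in load.items())


# The single-type case from the text: 175 CGI transactions/sec.
print(round(required_cpw({"cgi": 175})))

# A hypothetical mixed load: each component is sized separately, then summed.
mixed = {"static_cached": 100, "cgi": 50, "netdata_sql": 10}
print(round(required_cpw(mixed)))
```

The first call gives the 398 CPWs quoted in the text. The result is a lower bound for model selection, since it leaves no room for growth or for other work on the system.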
Net.Data:
- Net.Data is more disk I/O intensive than typical HTTP transactions. Therefore more HTTP server jobs may be needed to provide the optimal level of system throughput.
- A Net.Data SQL macro is slower than an SQL CGI.bin. This is because the Net.Data SQL macro is interpreted while the SQL CGI.bin is compiled code. There are, however, functional advantages in using an SQL macro:
  - direct reuse of existing SQL statements (no programming required)
  - provides the built-in ability to format SQL results
  - provides the ability to store SQL results in a table and pass the results to a different language environment (e.g., REXX).
CGI and persistent CGI: Significant (perhaps as much as 6x) performance benefits can be realized by compiling into a "named" versus a "new" activation group. It is essential for good performance that CGI-based applications use named activation groups. Refer to the AS/400 ILE concepts for more details on activation groups.
Persistent CGI is specific to applications needing to keep state information across web transactions. Don't confuse persistent CGI with a way to improve the performance of your CGI program. You'll notice in the earlier table that the performance of CGI is nearly identical to that of the persistent CGI due to the advantages gained by running in a "named" activation group.
Web Server Cache for IFS files: Serving static pages that are cached can increase Web server capacity by about 50%. Ensure that highly used files are selected to be in the cache (WRKHTTPCFG).
Page size: The data in the table assume about 1K bytes being served. If the pages are larger, more bytes are processed, CPU processing per transaction significantly increases, and therefore the transaction capacity metrics would be reduced.
Response Time (general): User response time is made up of Web browser (client workstation) time, network time, and server time. A problem in any one of these areas may cause a significant performance problem for an end-user. To an end-user, it may seem apparent that any performance problem would be attributable to the server, even though the problem may lie elsewhere.

It is common for pages that are being served to have imbedded images (e.g., GIFs). Each of these separate Internet transactions adds to the response time since they are treated as independent HTTP requests and can be retrieved from various servers (some browsers can retrieve multiple URLs concurrently).
HTTP and TCP/IP Configuration Tips:
1. The number of HTTP server jobs: The CHGHTTPA command has parameters that specify the minimum and maximum number of server jobs. This is a system-wide value. The WRKHTTPCFG command can also specify similar values (MaxActiveThreads and MinActiveThreads). These values override the values that are set via CHGHTTPA and apply to a given configuration. The reason for having multiple server jobs is that when one server job is waiting for a disk or communication I/O to complete, a different server job can process another user's request. Also, for N-way systems, each CPU may simultaneously process server jobs. The system will automatically adjust the number of servers that are needed (within the bounds of the specified minimum and maximum).
  
  The values specified are the number of "child or worker" threads. Typically, 5 server threads are adequate for smaller systems (100 CPWs or less). For larger systems dedicated to HTTP serving, increasing the number of servers to 10 or more may provide better performance. A starting point for the maximum number of threads can be the CPW value divided by 20. Try not to have more than is needed as this may cause unnecessary system activity.
2. The maximum frame size parameter (MAXFRAME on LIND) can be increased from 1994 bytes for TRLAN (or other values for other protocols) to its maximum of 16393 to allow for larger transmissions. Typically documents are larger than 1994 bytes.
3. The maximum transmission unit (MTU) size parameter (CFGTCP command) for both the route and the interface affects the actual size of the line flows. Increasing these values from 576 bytes to a larger size (up to 16388) will most likely reduce the overall number of transmissions and, therefore, increase the potential capacity of the CPU and the IOP.
  
  Similar parameters also exist on the Web browser. The negotiated value will be the minimum of the server and browser (and perhaps any bridges/routers), so increase them all.
4. Increasing the TCP/IP buffer size (TCPRCVBUF and TCPSNDBUF on the CHGTCPA or CFGTCP command) from 8K bytes to 64K bytes may increase the performance when sending larger amounts of data. If data coming into the server is simply requests, increasing TCPRCVBUF may not provide any benefit.
5. Secure Web serving: Secure Web serving involves additional overhead to the server. Additional line flows occur (fixed overhead) and the data is encrypted (variable overhead proportional to the number of bytes). Note the capacity factors in the tables above comparing non-secure and secure serving. For simple transactions (e.g., static page serving) the impact of secure serving is 2x or more based on the number of bytes served. For complex transactions (e.g., CGI or Net.Data) the overhead is in the range of 15-40%.
6. E-Business applications typically yield a variety of complex transactions. These transactions have sub-transactions made of static pages, CGI, Net.Data, etc. Capacity planning for these is more complex and warrants a careful analysis of the make-up of the transactions. The data from the tables can assist with this analysis.
7. Error and Access Logging: Having logging turned on causes a small amount of system overhead (CPU time, extra I/O). Turn logging off for best capacity. Use the WRKHTTPCFG command to make these changes.
8. Name Server Access: For each Internet transaction, the server accesses the name server for information (IP address and name translations). These accesses cause significant overhead (CPU time, comm I/O) and greatly reduce system capacity. These accesses can be eliminated by using the WRKHTTPCFG command and adding the line "DNSLookup Off".
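
The sizing guidance in tip 1 (about 5 server threads for systems of 100 CPWs or less, and CPW divided by 20 as a starting point for the maximum) can be expressed as a small Python sketch. The function name is hypothetical, and treating 5 as a floor for every system is an assumption that combines the two statements; the chapter does not define it that way. The result is only a starting value for the maximum thread setting and should be tuned against measurements.

```python
def suggested_max_threads(cpw, floor=5):
    """Starting point for the maximum number of HTTP server threads.

    Uses the CPW / 20 rule of thumb from tip 1.
    Assumption: never suggest fewer than `floor` threads.
    """
    if cpw <= 0:
        raise ValueError("cpw must be positive")
    return max(floor, cpw // 20)


for cpw in (50, 100, 460, 1000):
    print(f"{cpw:5d} CPW -> start with a maximum of {suggested_max_threads(cpw)} threads")
```

Keeping the value close to what the load needs follows the advice above that extra server threads may cause unnecessary system activity.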
HTTP Server Memory Requirements: Follow the faulting threshold guidelines suggested in the work management guide by observing/adjusting the memory in both the machine pool and the pool that the HTTP servers run in (WRKSYSSTS).
AS/400 model selection: Use the information provided in this section along with the characterization of your HTTP load environment in a capacity planning exercise (perhaps with BEST/1) to choose the appropriate AS/400 model. All the tasks, jobs and threads associated with HTTP serving are 'non-interactive', so AS/400e servers or AS/400 Advanced Servers would provide the best price/performance (unless other interactive work is present on the system).
File System Considerations: Web serving performance varies significantly based on which file system is used. Each file system has different overheads and performance characteristics. Note that serving from the ROOT or QOPENSYS directories provides the best system capacity. If Web page development is done from another directory, consider copying the data to a higher-performing file system for production use.

The web serving performance of the non-thread-safe file systems is significantly lower than that of the root directory. Using QDLS or QSYS may decrease capacity by 2-5 times. For a more detailed discussion of IFS performance, please refer to the V4R2 version of this document.
File Size Considerations: The connect and disconnect costs are similar regardless of size, but the costs for the transmission of data with TCP/IP and for the IFS access vary with size. As file size increases, the IOP is more efficient because it is able to sustain a higher aggregate data rate. However, larger files require more data frames, causing the hits/second capacity of the IOP to go down accordingly.
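
To illustrate how file size and MTU size interact, the following Python sketch estimates the number of data frames needed to send a file. It rests on simplifying assumptions that are not taken from this chapter: IPv4 and TCP headers of 20 bytes each with no options, no HTTP response headers, and no counting of acknowledgements or connection set-up flows. The function name is hypothetical; 576 and 16388 are the MTU values mentioned in the configuration tips, and 1500 is the common Ethernet MTU.

```python
import math

# Assumption: 20-byte IPv4 header plus 20-byte TCP header, no options.
IP_TCP_HEADER_BYTES = 40


def data_frames(file_bytes, mtu):
    """Number of data-carrying frames needed to send file_bytes at a given MTU."""
    payload = mtu - IP_TCP_HEADER_BYTES
    if payload <= 0:
        raise ValueError("MTU is too small to carry any data")
    return math.ceil(file_bytes / payload)


for mtu in (576, 1500, 16388):
    small = data_frames(1_000, mtu)
    large = data_frames(100_000, mtu)
    print(f"MTU {mtu:5d}: 1 KB page = {small} frame(s), 100 KB file = {large} frames")
```

Under these assumptions a 1 KB page needs two data frames at an MTU of 576 and one at the larger sizes, while a 100 KB file drops from 187 frames to 7 as the MTU grows from 576 to 16388.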
Communications/LAN IOPs: Since there are a dozen line flows or more per transaction, the Web serving environment utilizes the IOP more than other communications environments. Use the performance monitor (STRPRFMON) and the component report (PRTCPTRPT) to measure IOP utilization. Attempt to keep the average IOP utilization at 60% or less for best performance.

IOP capacity depends on file size and MTU size (make sure you increase the maximum MTU size parameter). Additional information on communications/LAN IOP performance can be found in section LAN of this manual.

The 2619 or the 2617 LAN IOPs have a capacity of roughly 70 hits/sec when serving small (e.g., 1K byte) nonsecure pages (keep in mind that each hit contains a dozen or so line flows). Ethernet or TRLAN IOPs from V4R1 or later have capacities in the 100-130 hits/sec range. If 100Mb Ethernet is used and the TCPONLY parameter in the LIND has a value of *YES, then capacities up to 250 hits/sec may be seen.

On larger AS/400 models, the comm/LAN IOP may become the bottleneck before the CPU does. If additional capacity is needed, multiple IOPs (with unique IP addresses) could be configured. The overall workload would have to be 'manually' balanced by Web browsers requesting documents from a set of interfaces. The load can also be balanced across multiple IP addresses by using DNS (domain name server).
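
As a rough illustration of the IOP guidance above, the Python sketch below estimates how many LAN IOPs a target hit rate would need if each IOP is held to the 60% average utilization guideline. It assumes that IOP utilization grows linearly with the hit rate and that the load is spread evenly across the IOPs, neither of which is stated in this chapter. The function name and the target rate of 400 hits/sec are hypothetical; the per-IOP capacities are the approximate figures quoted for small nonsecure pages.

```python
import math


def iops_needed(target_hits_per_sec, iop_capacity_hits_per_sec, max_utilization=0.60):
    """Estimate the number of LAN IOPs needed for a target hit rate.

    Assumption: utilization scales linearly with hits/sec and the load
    is balanced evenly across all IOPs.
    """
    usable = iop_capacity_hits_per_sec * max_utilization
    return math.ceil(target_hits_per_sec / usable)


# Approximate per-IOP capacities for small (1K byte) nonsecure pages.
for label, capacity in (("2619/2617", 70),
                        ("V4R1 or later Ethernet/TRLAN", 100),
                        ("100Mb Ethernet, TCPONLY(*YES)", 250)):
    print(f"{label:30s}: {iops_needed(400, capacity)} IOP(s) for 400 hits/sec")
```

Larger pages, secure serving, or smaller MTU sizes would lower the per-IOP capacity, so measured IOP utilization from the component report should take precedence over this estimate.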