Effective AS/400 CL Error Handling

by Dan Riehl

Copyright 1995,1996 Powertech Toolworks, Inc.

No portion of this document may be reproduced in any form unless it carries the copyright notice intact. As long as the copyright notice is included, you are free to reproduce the document for your own personal use. Powertech Toolworks, Inc assumes no liability for inconsistencies or errors found, or arising from the use, or misuse of this educational material.

I hadn't even finished my first cup of coffee when the payroll manager called to say he had a message on his screen that read "CPF0001 received by PAY200C at 400. (C D I R)." He said it was the third time that morning he had received this message. Each time, he simply typed I on the line at the bottom of the display, and everything seemed to be back to normal. After all, he knew that I meant "ignore" and that the problem would probably disappear after a while. The payroll had to be done, and there was no time to wait on computer problems.

Is this scenario familiar? Do your users ever see the Display Program Messages display that lets them "C D I R" (Cancel, Dump, Ignore, Retry) your programs? If so, your programs are out of control. You have yielded to your users the responsibility of deciding how your programs proceed when an unexpected error condition occurs. The potential for severe damage to your files should not be underestimated.

When I began writing code for the S/38, users routinely saw the Display Program Messages display, and they would call each time to ask me how to proceed. Since then, I've gotten better at preventing this problem. Now I write programs that expect the unexpected. My main tool in this endeavor is the CL error-handling routine.

In this column, I examine a few standard error-handling routines popular with CL programmers and explain how they work and the strengths and weaknesses of each. As we progress, you'll see there is no one best routine for all situations. But first, a short discussion of error-handling concepts will provide some foundation.

Error-Handling Concepts

When an IBM program detects a severe error, it sends an *ESCAPE message to the calling program's message queue. Before the escape message is sent, one or more *DIAG (diagnostic) messages may also be sent to the previous program's message queue. There is no guarantee that diagnostic messages will be sent; usually, none are. The escape message sometimes very clearly explains the error that occurred (e.g., Object FILE1 in library LIBRARY1 not found); other times, the wording is ambiguous (e.g., Error occurred while processing). Diagnostic messages usually accompany the more ambiguous escape messages to provide more information about the cause of the error.

When an escape message is sent to a program's message queue, that program can trap the error by using the MONMSG (Monitor Message) command. But MONMSG can't intercept the diagnostic messages that sometimes precede the escape message. To have your programs receive both the *DIAG and *ESCAPE messages sent to the program message queue, you can use the RCVMSG (Receive Message) command.
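
As a minimal sketch of the two commands working together, the fragment below monitors one command for one escape message and then uses RCVMSG to pull that message from the program message queue. The object name LIBRARY1/FILE1 and the choice of CPF9801 (the object-not-found escape message sent by CHKOBJ) are illustrative assumptions, and the sketch assumes library LIBRARY1 itself exists.

    PGM
    DCL     &msgid   *CHAR 7
    DCL     &msg     *CHAR 256
    /* CHKOBJ sends escape message CPF9801 if the object is missing */
    CHKOBJ  OBJ(LIBRARY1/FILE1) OBJTYPE(*FILE)
    MONMSG  CPF9801 EXEC(DO)
            /* MONMSG trapped the escape; RCVMSG retrieves its ID */
            /* and first-level text and removes it from the queue */
            RCVMSG  MSGTYPE(*EXCP) MSGID(&msgid) MSG(&msg)
            /* Hypothetical recovery would go here */
    ENDDO
    RETURN
    ENDPGM

MONMSG decides whether the program regains control; RCVMSG is what actually lets the program look at the message that was sent.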

Your CL programs should try to emulate the error handling performed by most IBM-supplied programs. Your programs should send an escape message when they encounter a severe error, and at times they should precede the escape message with diagnostic messages. The result will be a consistent, familiar approach to error handling whether the error occurs in an IBM-supplied program or in one of your own.
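
A rough sketch of that convention follows: the program sends a diagnostic message that names the specific cause, then an escape message that ends the program and signals the caller. The parameter &lib, the variable &text, and the message wording are hypothetical; CPF9898 in QCPFMSG is used here only because its text consists of the substitution data you supply.

    PGM     PARM(&lib)
    DCL     &lib     *CHAR 10
    DCL     &text    *CHAR 100
    CHKOBJ  OBJ(QSYS/&lib) OBJTYPE(*LIB)
    MONMSG  CPF9801 EXEC(DO)
            /* Diagnostic first: the specific cause of the failure */
            CHGVAR     &text ('Library' *BCAT &lib *BCAT 'was not found')
            SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG)  +
                       MSGDTA(&text) MSGTYPE(*DIAG)
            /* Escape last: ends this program and notifies the caller */
            SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG)  +
                       MSGDTA('Payroll setup ended in error') +
                       MSGTYPE(*ESCAPE)
    ENDDO
    RETURN
    ENDPGM

Because SNDPGMMSG defaults to the previous program's message queue, the caller sees the same diagnostic-then-escape sequence it would get from an IBM-supplied command.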

The Ostrich Method

One way to deal with errors is to have your programs ignore them. I call this the Ostrich Method -- or the Sergeant Schultz Method after the Hogan's Heroes character whose trademark reaction to anything out of the ordinary was "I know nothing . . . I see nothing." The Ostrich Method is very easy to implement in CL. You simply code a program-level message monitor that ignores all errors:


    PGM
    .
    .   DCL statements
    .
    MONMSG CPF0000   /* The Ostrich Method */
    .
    .   Program code
    .
    RETURN
    ENDPGM

Notice that the MONMSG command does not contain the EXEC parameter, thus in effect telling CL just to ignore all errors. "I know nothing." With this method, you'll seldom have to worry about your users seeing the Display Program Messages screen. Neither will you need to worry about impressing your boss with your witty anecdotes. Instead, you'll be able to explore new career paths; presidential politics might be just your calling.

Ignoring all error messages is inherently dangerous, because you're never quite sure what your program is doing. The Ostrich Method is the worst possible method. Keep your head out of the sand; you need to know what your programs are doing!

The Gossip Method

Figure 1 shows a method of error handling that, in my experience, is the most widely used among AS/400 programmers. In this method, the program-level MONMSG command passes control to label ERROR if a CPF escape message is sent to the program's message queue. This example uses many sound error-handling principles.

Figure 1...The Gossip Method

--------------------------------------------------------------------


        PGM
        DCL   &msgid     *CHAR 7
        DCL   &msgdta    *CHAR 256
        DCL   &msgf      *CHAR 10
        DCL   &msgflib   *CHAR 10
        DCL   &errorsw   *LGL
        MONMSG     CPF0000 EXEC(GOTO ERROR)
        .
        .    (Include normal processing here)
        .
        RETURN     /* Normal End of Program */

        /* Guard against looping within the error routine */
ERROR:  IF   &errorsw                     +
            (SNDPGMMSG MSGID(CPF9999)     +
                       MSGF(QCPFMSG)      +
                       MSGTYPE(*ESCAPE))
        CHGVAR     &errorsw    '1'

        /* Forward every diagnostic message to the caller */
ERROR2: RCVMSG     MSGTYPE(*DIAG)         +
                   MSGDTA(&msgdta)        +
                   MSGID(&msgid)          +
                   MSGF(&msgf)            +
                   SNDMSGFLIB(&msgflib)
        IF         (&msgid *EQ ' ')       +
                   GOTO ERROR3
        SNDPGMMSG  MSGID(&msgid)          +
                   MSGF(&msgflib/&msgf)   +
                   MSGDTA(&msgdta)        +
                   MSGTYPE(*DIAG)
        GOTO       ERROR2

        /* Resend the escape message that caused the failure */
ERROR3: RCVMSG     MSGTYPE(*EXCP)         +
                   MSGDTA(&msgdta)        +
                   MSGID(&msgid)          +
                   MSGF(&msgf)            +
                   SNDMSGFLIB(&msgflib)
        SNDPGMMSG  MSGID(&msgid)          +
                   MSGF(&msgflib/&msgf)   +
                   MSGDTA(&msgdta)        +
                   MSGTYPE(*ESCAPE)
        ENDPGM

------------------------------------------------------------------------

The commands at label ERROR ensure that the program is not looping within the error routine. The code at ERROR2 receives all diagnostic messages from the message queue and forwards them to the calling program. At ERROR3, the *ESCAPE message that caused the error routine to be executed is received and re-sent to the previous program's message queue, immediately ending the program.

But this technique also has its problems. Like a compulsive gossip passing along every minor and irrelevant piece of information, the program resends all diagnostic messages to the previous program's message queue. The routine treats all diagnostic messages as if they were associated with the escape message that caused the program to fail, but it actually has no way of knowing whether the messages are relevant. Usually, they aren't; many diagnostic messages have nothing to do with the reason the program failed. For example, they might be in the queue as the result of a problem with a previous command under the control of a command-level MONMSG. (Although it is good practice to remove irrelevant messages from the program's message queue when they are generated, most programs don't. A future column will cover the topic of keeping your program message queues cleaned up by removing these gossipy or "noise" messages.)
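
As one hedged illustration of removing a noise message at the point where it is generated, the sketch below deletes a work file that may not exist and, when the expected escape message arrives, removes it from the program message queue right away. The file name QTEMP/WORK1 is hypothetical, and the sketch removes only the handled escape message; any diagnostic messages a command sends would need separate treatment.

    PGM
    /* Delete a work file that may not exist yet */
    DLTF    FILE(QTEMP/WORK1)
    MONMSG  CPF2105 EXEC(DO)
            /* Expected condition: remove the handled escape message */
            /* so that a later error routine never sees it           */
            RCVMSG  MSGTYPE(*EXCP) RMV(*YES)
    ENDDO
    RETURN
    ENDPGM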

Another problem with this method is that it cannot reliably determine whether more diagnostic messages remain in the queue. Not all diagnostic messages are predefined in a message file, and those that are not have a blank message ID. The command IF (&msgid *EQ ' ') at label ERROR2 is supposed to determine whether a diagnostic message was received. But when the routine receives a diagnostic message that has no message ID, it erroneously concludes there are no more diagnostic messages.

For this routine to correctly identify all diagnostic messages, it must check the message return type (RTNTYPE) rather than the message ID (diagnostic messages have the return type 02). Also remember that the message may or may not have a message ID, so you must handle both possibilities when you resend the message. To eliminate the problem, you can replace routine ERROR2 with the following code:


ERROR2: RCVMSG MSGTYPE(*DIAG) MSG(&msg) +
               MSGDTA(&msgdta)          +
               MSGID(&msgid)            +
               RTNTYPE(&rtntype)        +
               MSGF(&msgf)              +
               SNDMSGFLIB(&msgflib)

        IF     (&rtntype *NE '02')      +
               GOTO ERROR3

        IF     (&msgid *EQ ' ')   DO
               SNDPGMMSG MSG(&msg)      +
                         MSGTYPE(*DIAG)
        ENDDO
        ELSE   DO
               SNDPGMMSG MSGID(&msgid)        +
                         MSGF(&msgflib/&msgf) +
                         MSGDTA(&msgdta)      +
                         MSGTYPE(*DIAG)
        ENDDO

        GOTO     ERROR2

You'll also need to declare variables &msg (*CHAR 256) and &rtntype (*CHAR 2).

One problem still remains: how to resend only those diagnostic messages related to the error that caused the program to fail. There is no bulletproof way to determine which diagnostic messages are associated with an escape message, but two methods can help. The first is the QUSRTOOL error handler CLPSTDERR. This routine grabs the escape message that caused the program to fail and then receives the previous message in the program's queue. If the previous message is a diagnostic message, the routine assumes it is associated with the escape message and resends it, followed by the escape message. If you want to resend diagnostic messages, this is a good way to do it. For more information about the QUSRTOOL error-handling routine, see source member CLPSTDERR in QUSRTOOL/QATTINFO.
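
The approach just described might be sketched as follows. This is not the CLPSTDERR source; it is only an illustration of the idea, and every variable name in it is hypothetical. It assumes that the message key returned by KEYVAR on the first RCVMSG can be passed to a second RCVMSG with MSGTYPE(*PRV) to read the message sent immediately before the escape message.

        PGM
        DCL     &key      *CHAR 4
        DCL     &rtntype  *CHAR 2
        DCL     &emsgid   *CHAR 7
        DCL     &emsgdta  *CHAR 256
        DCL     &emsgf    *CHAR 10
        DCL     &emsgflib *CHAR 10
        DCL     &dmsg     *CHAR 256
        DCL     &dmsgid   *CHAR 7
        DCL     &dmsgdta  *CHAR 256
        DCL     &dmsgf    *CHAR 10
        DCL     &dmsgflib *CHAR 10
        MONMSG  CPF0000 EXEC(GOTO ERROR)

        /* Normal processing goes here */

        RETURN     /* Normal end of program */

        /* Receive the escape message and save its message key */
ERROR:  RCVMSG     MSGTYPE(*EXCP) RMV(*NO) KEYVAR(&key)   +
                   MSGDTA(&emsgdta) MSGID(&emsgid)        +
                   MSGF(&emsgf) SNDMSGFLIB(&emsgflib)
        MONMSG     CPF0000

        /* Look at the message just before the escape message */
        RCVMSG     MSGTYPE(*PRV) MSGKEY(&key) RMV(*NO)    +
                   MSG(&dmsg) MSGDTA(&dmsgdta)            +
                   MSGID(&dmsgid) RTNTYPE(&rtntype)       +
                   MSGF(&dmsgf) SNDMSGFLIB(&dmsgflib)
        MONMSG     CPF0000

        /* Resend it only if it is a diagnostic (return type 02) */
        IF         (&rtntype *EQ '02')   DO
           IF      (&dmsgid *EQ ' ')   DO
                   SNDPGMMSG MSG(&dmsg) MSGTYPE(*DIAG)
                   MONMSG    CPF0000
           ENDDO
           ELSE    DO
                   SNDPGMMSG MSGID(&dmsgid)               +
                             MSGF(&dmsgflib/&dmsgf)       +
                             MSGDTA(&dmsgdta)             +
                             MSGTYPE(*DIAG)
                   MONMSG    CPF0000
           ENDDO
        ENDDO

        /* Finally resend the escape message to the caller */
        SNDPGMMSG  MSGID(&emsgid) MSGF(&emsgflib/&emsgf)  +
                   MSGDTA(&emsgdta) MSGTYPE(*ESCAPE)
        MONMSG     CPF0000
        ENDPGM

The command-level MONMSG after each command inside the handler keeps a failure there from sending control back to label ERROR through the program-level monitor.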

A Simple Solution

Another way to handle the diagnostic message association problem is by not forwarding diagnostic messages. Although this approach might seem simplistic, in most situations it can be the best solution: Because you can't reliably find out which diagnostic messages are associated with the program's failure, why take a chance on misleading the user with "noise" messages? If the escape message is too ambiguous to be useful, you can still view the diagnostic messages from the job log.

Figure 2 shows sample code that takes this approach. When control is passed to label ERROR, the escape message that caused the problem is received and re-sent to the previous program's message queue. This is the standard routine I use. It's simple and effective.

Figure 2...A Simple Solution

---------------------------------------------------------------------------------------


        PGM
        DCL     &msgid   *CHAR 7
        DCL     &msgf    *CHAR 10
        DCL     &msgflib *CHAR 10
        DCL     &msgdta  *CHAR 100
        MONMSG  CPF0000 EXEC(GOTO ERROR)
        .
        .    (Include normal processing here)
        .
        RETURN     /* Normal end of program */

ERROR:  RCVMSG     MSGTYPE(*LAST)         +
                   MSGDTA(&msgdta)        +
                   MSGID(&msgid)          +
                   MSGF(&msgf)            +
                   SNDMSGFLIB(&msgflib)
        MONMSG     CPF0000  /* Just in case */
        SNDPGMMSG  MSGID(&msgid)          +
                   MSGF(&msgflib/&msgf)   +
                   MSGDTA(&msgdta)        +
                   MSGTYPE(*ESCAPE)
        MONMSG     CPF0000  /* Just in case */
        ENDPGM

---------------------------------------------------------------------------------

Standard, Yes; Routine, No!

Although it's important to have a standard error handler you can include in your programs, the act of including it should not become a routine to be performed without question. Many of your programs will have special error-handling requirements, and you should accommodate those whenever you can. For instance, when you're performing arithmetic operations in CL programs or employing user-written commands that contain return variables, your program-level MONMSG should include message ID MCH0000:

MONMSG (CPF0000 MCH0000) EXEC(GOTO ERROR)
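
To see why the MCH prefix matters, consider this small sketch, in which the variable names and values are hypothetical. Dividing by zero in a CHGVAR expression produces a machine-level escape message (MCH1211) that a monitor for CPF0000 alone would not catch.

    PGM
    DCL     &total   *DEC (7 0) VALUE(1200)
    DCL     &count   *DEC (5 0) VALUE(0)
    DCL     &average *DEC (7 0)
    /* &count is zero, so this CHGVAR raises escape message MCH1211 */
    CHGVAR  &average (&total / &count)
    /* Command-level monitor: fall back to an average of zero */
    MONMSG  MCH1211 EXEC(CHGVAR &average 0)
    RETURN
    ENDPGM

Here a command-level MONMSG handles the one arithmetic error the program expects; the generic MCH0000 in the program-level monitor covers the ones it does not.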

Likewise, some licensed program products send escape messages that don't have the CPF prefix (e.g., some OfficeVision/400 routines send OFCxxxx escape messages). In such cases, your program-level MONMSG should include those escape messages.

If your program executes commands that you know send a diagnostic message before the escape message, you should always resend the diagnostic message. For example, when the CPYF (Copy File) command generates an escape message, the real cause of the error appears in the diagnostic message rather than in the escape message.

No single error-handling routine is the best for all situations. However, you should definitely stay away from the Ostrich and Gossip Methods. Let the operations your program performs dictate the error handler you use: Sometimes you'll be able to use a standard error handler; other times you'll want to provide error-handling code specific to the application.

This article first appeared in the July 1995 issue of News/400 Magazine.