by Dan Riehl
Copyright 1995, 1996 Powertech Toolworks, Inc.
No portion of this document may be reproduced in any form unless it carries the copyright notice intact. As long as
the copyright notice is included, you are free to reproduce the
document for your own personal use. Powertech Toolworks, Inc. assumes no
liability for inconsistencies or errors found in, or arising from
the use or misuse of, this educational material.
I hadn't even finished my first cup of coffee when the payroll
manager called to say he had a message on his screen that read
"CPF0001 received by PAY200C at 400. (C D I R)." He
said it was the third time that morning he had received this message.
Each time, he simply typed "I" on the line at the bottom of the
display, and everything seemed to be back to normal. After all,
he knew that "I" meant "ignore" and that the problem would
probably disappear after a while. The payroll had to be done,
and there was no time to wait on computer problems.
Is this scenario familiar? Do your users ever see the Display
Program Messages display that lets them "C D I R" (Cancel,
Dump, Ignore, Retry) your programs? If so, your programs are out
of control. You have yielded to your users the responsibility
of deciding how your programs proceed when an unexpected error
condition occurs. The potential for severe damage to your files
cannot be overstated.
When I began writing code for the S/38, users routinely saw the
Display Program Messages display, and they would call each time
to ask me how to proceed. Since then, I've gotten better at preventing
this problem. Now I write programs that expect the unexpected.
My main tool in this endeavor is the CL error-handling routine.
In this column, I examine a few standard error-handling routines
popular with CL programmers, explaining how each works along with its
strengths and weaknesses. As we progress, you'll see there
is no one best routine for all situations. But first, a short
discussion of error-handling concepts will provide some foundation.
When an IBM program detects a severe error, it sends an *ESCAPE
message to the calling program's message queue. Before the escape
message is sent, one or more *DIAG (diagnostic) messages may also
be sent to the previous program's message queue. There is no guarantee
that diagnostic messages will be sent; usually, none are. The
escape message sometimes very clearly explains the error that
occurred (e.g., Object FILE1 in library LIBRARY1 not found); other
times, the wording is ambiguous (e.g., Error occurred while processing).
Diagnostic messages usually accompany the more ambiguous escape
messages to provide more information about the cause of the error.
When an escape message is sent to a program's message queue,
that program can trap the error by using the MONMSG (Monitor Message)
command. But MONMSG can't intercept the diagnostic messages that
sometimes precede the escape message. To have your programs receive
both the *DIAG and *ESCAPE messages sent to the program message
queue, you can use the RCVMSG (Receive Message) command.
Your CL programs should try to emulate the error handling performed
by most IBM-supplied programs. Your programs should send an escape
message when they encounter a severe error, and at times they
should precede the escape message with diagnostic messages. The
result will be a consistent, familiar approach to error handling
whether the error occurs in an IBM-supplied program or in one
of your own.
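In CL, an application program can follow the same pattern with SNDPGMMSG: send the diagnostic message first, then the escape message, both to the caller. The sketch below assumes the general-purpose message CPF9898 in QCPFMSG, whose text is essentially just the message data you supply followed by a period; the library PAYLIB and the file PAYMAST are hypothetical names used only for illustration.

```cl
PGM
  /* Hypothetical object: file PAYMAST in library PAYLIB.            */
  CHKOBJ     OBJ(PAYLIB/PAYMAST) OBJTYPE(*FILE)
  MONMSG     MSGID(CPF9801) EXEC(DO)
     /* Diagnostic first: the specific cause, for the caller's log.  */
     SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                  MSGDTA('Payroll master file PAYMAST not found in PAYLIB') +
                  TOPGMQ(*PRV) MSGTYPE(*DIAG)
     /* Escape last: ends this program and tells the caller it failed. */
     SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                  MSGDTA('Payroll update ended; see previous messages') +
                  TOPGMQ(*PRV) MSGTYPE(*ESCAPE)
  ENDDO
  /* ... normal processing ... */
ENDPGM
```

The caller then sees exactly what it would see from an IBM-supplied command: a diagnostic explaining the cause, followed by an escape message it can monitor for.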
One way to deal with errors is to have your programs ignore them.
I call this the Ostrich Method, or the Sergeant Schultz Method
after the Hogan's Heroes character whose trademark reaction to
anything out of the ordinary was "I know nothing . . . I
see nothing." The Ostrich Method is very easy to implement
in CL. You simply code a program-level message monitor that ignores
all errors.
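In CL, the whole Ostrich Method comes down to a single program-level MONMSG with no EXEC parameter, placed after the declarations and before the first executable command. A minimal sketch follows; the library PAYLIB and the objects PAYWORK and PAY200R are hypothetical names chosen for illustration.

```cl
PGM
  /* Ostrich Method: a program-level MONMSG with no EXEC parameter. */
  /* Because it precedes every other command, it silently swallows  */
  /* every CPF-prefixed escape message sent to this program.        */
  MONMSG     MSGID(CPF0000)

  /* Hypothetical commands: if either one fails, the program simply */
  /* carries on and nobody is told.                                 */
  CLRPFM     FILE(PAYLIB/PAYWORK)
  CALL       PGM(PAYLIB/PAY200R)
ENDPGM
```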
Ignoring all error messages is inherently dangerous, because
you're never quite sure what your program is doing. The Ostrich
Method is the worst possible method. Keep your head out of the
sand; you need to know what your programs are doing!
Figure 1 shows a method of error handling that, in my experience,
is the most widely used among AS/400 programmers. In this method,
the program-level MONMSG command passes control to label ERROR
if a CPF escape message is sent to the program's message queue.
This example uses many sound error-handling principles.
Figure 1...The Gossip Method
The commands at label ERROR ensure that the program is not looping
within the error routine. The code at ERROR2 receives all diagnostic
messages from the message queue and forwards them to the calling
program. At ERROR3, the *ESCAPE message that caused the error
routine to be executed is received and re-sent to the previous
program's message queue, immediately ending the program.
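A routine matching this description might look like the following sketch. The labels ERROR, ERROR2, and ERROR3 and the blank &MSGID test come from the text; the variable names and lengths, and the use of the function-check message CPF9999 as the loop guard, are assumptions.

```cl
PGM
  DCL        VAR(&ERRORSW) TYPE(*LGL)
  DCL        VAR(&MSGID)   TYPE(*CHAR) LEN(7)
  DCL        VAR(&MSGDTA)  TYPE(*CHAR) LEN(256)
  DCL        VAR(&MSGF)    TYPE(*CHAR) LEN(10)
  DCL        VAR(&MSGFLIB) TYPE(*CHAR) LEN(10)

  /* Program-level monitor: any CPF escape transfers control to ERROR. */
  MONMSG     MSGID(CPF0000) EXEC(GOTO CMDLBL(ERROR))

  /* ... normal processing ... */
  RETURN

  /* ERROR: if we are already in the error routine, give up with a     */
  /* function check instead of looping.                                */
ERROR:     IF         COND(&ERRORSW) THEN(SNDPGMMSG MSGID(CPF9999) +
                        MSGF(QCPFMSG) MSGTYPE(*ESCAPE))
           CHGVAR     VAR(&ERRORSW) VALUE('1')

  /* ERROR2: forward every *DIAG in the queue to the caller.           */
  /* Flaw: a diagnostic without a message ID also leaves &MSGID blank, */
  /* so it is mistaken for "no more diagnostics".                      */
ERROR2:    RCVMSG     MSGTYPE(*DIAG) MSGDTA(&MSGDTA) MSGID(&MSGID) +
                        MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
           IF         COND(&MSGID *EQ ' ') THEN(GOTO CMDLBL(ERROR3))
           SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                        MSGDTA(&MSGDTA) MSGTYPE(*DIAG)
           GOTO       CMDLBL(ERROR2)

  /* ERROR3: resend the escape message; *ESCAPE ends this program.     */
ERROR3:    RCVMSG     MSGTYPE(*EXCP) MSGDTA(&MSGDTA) MSGID(&MSGID) +
                        MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
           SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                        MSGDTA(&MSGDTA) MSGTYPE(*ESCAPE)
ENDPGM
```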
But this technique also has its problems. Like a compulsive gossip
passing along every minor and irrelevant piece of information,
the program resends all diagnostic messages to the previous program's
message queue. The routine treats all diagnostic messages as if
they were associated with the escape message that caused the program
to fail, but it actually has no way of knowing whether the messages
are relevant. Usually, they aren't; many diagnostic messages have
nothing to do with the reason the program failed. For example,
they might be in the queue as the result of a problem with a previous
command under the control of a command-level MONMSG. (Although
it is good practice to remove irrelevant messages from the program's
message queue when they are generated, most programs don't. A
future column will cover the topic of keeping your program message
queues cleaned up by removing these gossipy or "noise"
messages.)
Another problem with this method is that it cannot reliably determine
whether more diagnostic messages remain in the queue. Not all
diagnostic messages are predefined in a message file, and the
message ID of those that are not contains blanks. The command
IF (&msgid = ' ') in label ERROR2 is supposed to determine
whether a diagnostic message was received. But when the routine
receives a diagnostic message that has no message ID, it mistakes the
blank ID for an empty queue and stops, so any remaining diagnostic
messages are never forwarded.
For this routine to correctly identify all diagnostic messages,
it must check the message return type (RTNTYPE) rather than the
message ID (diagnostic messages have the return type 02). Also
remember that the message may or may not have a message ID, so
you must handle both possibilities when you resend the message.
To eliminate the problem, rewrite routine ERROR2 so that it loops on
the return type and resends each message in whichever form it arrived.
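A sketch of such a replacement ERROR2 follows. It assumes the variables the original routine already uses (&MSGID, &MSGDTA, &MSGF, &MSGFLIB) are declared and that the exit label is still ERROR3; &RTNTYPE and &MSG are new, and their names and lengths are assumptions.

```cl
  /* Add to the program's declarations (new variables):                 */
  DCL        VAR(&RTNTYPE) TYPE(*CHAR) LEN(2)
  DCL        VAR(&MSG)     TYPE(*CHAR) LEN(512)

  /* Replacement ERROR2: loop on the return type, not the message ID.   */
ERROR2:    RCVMSG     MSGTYPE(*DIAG) MSG(&MSG) MSGDTA(&MSGDTA) +
                        MSGID(&MSGID) RTNTYPE(&RTNTYPE) +
                        MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
           /* '02' means diagnostic; anything else (blanks when no more */
           /* *DIAG messages remain) ends the loop.                      */
           IF         COND(&RTNTYPE *NE '02') THEN(GOTO CMDLBL(ERROR3))
           IF         COND(&MSGID *EQ ' ') THEN(DO)
              /* No message ID: an impromptu message, so resend its text. */
              SNDPGMMSG  MSG(&MSG) MSGTYPE(*DIAG)
           ENDDO
           ELSE       CMD(SNDPGMMSG MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                        MSGDTA(&MSGDTA) MSGTYPE(*DIAG))
           GOTO       CMDLBL(ERROR2)
```

Because the loop now ends only when the return type stops being 02, a diagnostic without a message ID is resent as text instead of being mistaken for the end of the queue.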
One problem still remains: how to resend only those diagnostic
messages related to the error that caused the program to fail.
There is no bulletproof way to determine which diagnostic messages
are associated with an escape message, but two methods can help.
The first is the QUSRTOOL error handler CLPSTDERR. This routine grabs
the escape message that caused the program to fail and then receives
the previous message in the program's queue. If the previous message
is a diagnostic message, the routine assumes it is associated
with the escape message and resends it, followed by the escape
message. If you want to resend diagnostic messages, this is a
good way to do it. For more information about the QUSRTOOL error-handling
routine, see source member CLPSTDERR in QUSRTOOL/QATTINFO.
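The sketch below is not the QUSRTOOL source; it illustrates the same idea with RCVMSG's KEYVAR and MSGKEY parameters: keep the escape message in the queue, then use its key with MSGTYPE(*PRV) to examine only the message immediately before it. Variable names and lengths are assumptions.

```cl
PGM
  DCL        VAR(&ERRORSW)  TYPE(*LGL)
  DCL        VAR(&KEY)      TYPE(*CHAR) LEN(4)
  DCL        VAR(&MSGID)    TYPE(*CHAR) LEN(7)
  DCL        VAR(&MSGDTA)   TYPE(*CHAR) LEN(256)
  DCL        VAR(&MSGF)     TYPE(*CHAR) LEN(10)
  DCL        VAR(&MSGFLIB)  TYPE(*CHAR) LEN(10)
  DCL        VAR(&RTNTYPE)  TYPE(*CHAR) LEN(2)
  DCL        VAR(&DIAGID)   TYPE(*CHAR) LEN(7)
  DCL        VAR(&DIAGDTA)  TYPE(*CHAR) LEN(256)
  DCL        VAR(&DIAGF)    TYPE(*CHAR) LEN(10)
  DCL        VAR(&DIAGFLIB) TYPE(*CHAR) LEN(10)

  MONMSG     MSGID(CPF0000) EXEC(GOTO CMDLBL(ERROR))
  /* ... normal processing ... */
  RETURN

ERROR:     IF         COND(&ERRORSW) THEN(SNDPGMMSG MSGID(CPF9999) +
                        MSGF(QCPFMSG) MSGTYPE(*ESCAPE))
           CHGVAR     VAR(&ERRORSW) VALUE('1')
           /* Receive the escape but keep it (RMV(*NO)) so its key can  */
           /* serve as the starting point for MSGTYPE(*PRV).            */
           RCVMSG     MSGTYPE(*EXCP) RMV(*NO) KEYVAR(&KEY) +
                        MSGDTA(&MSGDTA) MSGID(&MSGID) +
                        MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
           /* Look only at the message immediately before the escape.  */
           RCVMSG     MSGTYPE(*PRV) MSGKEY(&KEY) RMV(*NO) +
                        RTNTYPE(&RTNTYPE) MSGDTA(&DIAGDTA) +
                        MSGID(&DIAGID) MSGF(&DIAGF) SNDMSGFLIB(&DIAGFLIB)
           /* Assume a directly preceding diagnostic belongs to it.    */
           IF         COND((&RTNTYPE *EQ '02') *AND (&DIAGID *NE ' ')) +
                        THEN(SNDPGMMSG MSGID(&DIAGID) +
                        MSGF(&DIAGFLIB/&DIAGF) MSGDTA(&DIAGDTA) +
                        MSGTYPE(*DIAG))
           /* Resend the escape last; this ends the program.           */
           SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                        MSGDTA(&MSGDTA) MSGTYPE(*ESCAPE)
ENDPGM
```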
The second method sidesteps the question: resend only the escape
message and leave the diagnostic messages behind. Figure 2 shows
sample code that takes this approach. When control is passed to
label ERROR, the escape message that caused the problem is received
and re-sent to the previous program's message queue. This is the
standard routine I use. It's simple and effective.
Figure 2...A Simple Solution
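A routine of this kind might look like the following sketch. The text confirms only the receive-and-resend step at label ERROR; the loop guard (borrowed from the Gossip routine's idea of not looping inside the error handler), the variable names, and their lengths are assumptions.

```cl
PGM
  DCL        VAR(&ERRORSW) TYPE(*LGL)
  DCL        VAR(&MSGID)   TYPE(*CHAR) LEN(7)
  DCL        VAR(&MSGDTA)  TYPE(*CHAR) LEN(256)
  DCL        VAR(&MSGF)    TYPE(*CHAR) LEN(10)
  DCL        VAR(&MSGFLIB) TYPE(*CHAR) LEN(10)

  MONMSG     MSGID(CPF0000) EXEC(GOTO CMDLBL(ERROR))
  /* ... normal processing ... */
  RETURN

  /* Guard against looping if a command in the routine itself fails. */
ERROR:     IF         COND(&ERRORSW) THEN(SNDPGMMSG MSGID(CPF9999) +
                        MSGF(QCPFMSG) MSGTYPE(*ESCAPE))
           CHGVAR     VAR(&ERRORSW) VALUE('1')
           /* Receive the escape message that triggered the monitor... */
           RCVMSG     MSGTYPE(*EXCP) MSGDTA(&MSGDTA) MSGID(&MSGID) +
                        MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
           /* ...and resend it unchanged; *ESCAPE ends the program.    */
           SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                        MSGDTA(&MSGDTA) MSGTYPE(*ESCAPE)
ENDPGM
```

Diagnostic messages are deliberately left in this program's queue, so none of the "noise" reaches the caller.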
Although it's important to have a standard error handler you can
include in your programs, the act of including it should not become
a routine to be performed without question. Many of your programs
will have special error-handling requirements, and you should
accommodate those whenever you can. For instance, when you're
performing arithmetic operations in CL programs or employing user-written
commands that contain return variables, your program-level MONMSG
should include message ID MCH0000 in addition to CPF0000.
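A minimal sketch, assuming a hypothetical averaging calculation whose divisor happens to be zero; the variable names, values, and the CPF9898 message text are illustrative only.

```cl
PGM
  DCL        VAR(&TOTAL) TYPE(*DEC) LEN(9 2) VALUE(1000)
  DCL        VAR(&COUNT) TYPE(*DEC) LEN(5 0) VALUE(0)
  DCL        VAR(&AVG)   TYPE(*DEC) LEN(9 2)

  /* Monitor machine (MCH) escape messages as well as CPF ones.     */
  MONMSG     MSGID(CPF0000 MCH0000) EXEC(GOTO CMDLBL(ERROR))

  /* &COUNT is zero, so this division fails with an MCH-prefixed    */
  /* escape message rather than a CPF-prefixed one; listing MCH0000 */
  /* lets the program-level monitor trap it directly.               */
  CHGVAR     VAR(&AVG) VALUE(&TOTAL / &COUNT)
  RETURN

ERROR:     SNDPGMMSG  MSGID(CPF9898) MSGF(QCPFMSG) +
                        MSGDTA('Average calculation failed') +
                        MSGTYPE(*ESCAPE)
ENDPGM
```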
Likewise, some licensed program products send escape messages
that don't have the CPF prefix (e.g., some OfficeVision/400 routines
send OFCxxxx escape messages). In such cases, your program-level
MONMSG should include those escape messages.
If your program executes commands that you know send a diagnostic
message before the escape message, you should always resend the
diagnostic message. For example, when the CPYF (Copy File) command
generates an escape message, the real cause of the error appears
in the diagnostic message rather than in the escape message.
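One way to do this for a single CPYF, sketched under stated assumptions: the program first clears its own message queue with RMVMSG CLEAR(*ALL) so that any diagnostic present after the copy can only have come from CPYF, then monitors for CPF2817 (Copy command ended because of error). The file names are hypothetical, and the sketch forwards only the first diagnostic; if CPYF sends several, a loop that tests RTNTYPE, as described earlier, would be needed.

```cl
PGM
  DCL        VAR(&MSGID)    TYPE(*CHAR) LEN(7)
  DCL        VAR(&MSGDTA)   TYPE(*CHAR) LEN(256)
  DCL        VAR(&MSGF)     TYPE(*CHAR) LEN(10)
  DCL        VAR(&MSGFLIB)  TYPE(*CHAR) LEN(10)
  DCL        VAR(&DIAGID)   TYPE(*CHAR) LEN(7)
  DCL        VAR(&DIAGDTA)  TYPE(*CHAR) LEN(256)
  DCL        VAR(&DIAGF)    TYPE(*CHAR) LEN(10)
  DCL        VAR(&DIAGFLIB) TYPE(*CHAR) LEN(10)

  /* Remove old "noise" so any *DIAG found later came from CPYF.   */
  RMVMSG     CLEAR(*ALL)

  /* Hypothetical files. CPF2817 says only that the copy ended;    */
  /* the reason is in the diagnostic CPYF sent just before it.     */
  CPYF       FROMFILE(PAYLIB/PAYTRANS) TOFILE(PAYLIB/PAYHIST) +
               MBROPT(*ADD)
  MONMSG     MSGID(CPF2817) EXEC(DO)
     RCVMSG     MSGTYPE(*EXCP) MSGDTA(&MSGDTA) MSGID(&MSGID) +
                  MSGF(&MSGF) SNDMSGFLIB(&MSGFLIB)
     RCVMSG     MSGTYPE(*DIAG) MSGDTA(&DIAGDTA) MSGID(&DIAGID) +
                  MSGF(&DIAGF) SNDMSGFLIB(&DIAGFLIB)
     /* Resend the cause first, then the escape, which ends the program. */
     IF         COND(&DIAGID *NE ' ') THEN(SNDPGMMSG MSGID(&DIAGID) +
                  MSGF(&DIAGFLIB/&DIAGF) MSGDTA(&DIAGDTA) +
                  MSGTYPE(*DIAG))
     SNDPGMMSG  MSGID(&MSGID) MSGF(&MSGFLIB/&MSGF) +
                  MSGDTA(&MSGDTA) MSGTYPE(*ESCAPE)
  ENDDO
ENDPGM
```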
No single error-handling routine is the best for all situations.
However, you should definitely stay away from the Ostrich and
Gossip Methods. Let the operations your program performs dictate
the error handler you use: Sometimes you'll be able to use a standard
error handler; other times you'll want to provide error-handling
code specific to the application.
This article first appeared in the July 1995 issue of News/400
Magazine.