Mercurial > ~dholland > hg > ag > index.cgi

\chapter{Exploring Your Grammar III: Diagnostic Windows}

If AnaGram finds problems with your grammar, it creates a number of
windows to help you identify and correct the problems.  These windows
are the \agwindow{Warnings} window, the \agwindow{Conflicts} window,
the \agwindow{Resolved Conflicts} window, and the \agwindow{Keyword
Anomalies} window.  To expand upon these windows, AnaGram also
provides, using the \agmenu{Auxiliary Windows} menu, the following
supporting windows: \agwindow{Conflict Trace}, \agwindow{Keyword
Anomaly Trace}, \agwindow{Reduction Trace}, \agwindow{Rule
Derivation}, \agwindow{Token Derivation} and \agwindow{Problem
States}.  This chapter describes these windows and their use.


\section{The Warnings Window}
\index{Warnings}\index{Window}

If while analyzing your syntax file, AnaGram finds something
suspicious, it issues a warning, regardless of the severity of the
problem.  The \agwindow{Warnings} window will pop up automatically
when the analysis has been completed.  If the warning is for a syntax
error in your input file, you will have to fix it, because AnaGram
cannot successfully interpret your syntax file.  Otherwise, no matter
how serious the warnings may be, AnaGram will create a parser if it is
instructed to do so.  Thus, if you decide the warnings are of no
consequence, you may choose to ignore them.  Note that AnaGram will
resolve shift-reduce conflicts by choosing the shift action.  It will
resolve reduce-reduce conflicts by choosing the rule that occurred
first in the grammar.

If you have \index{Syntax error}\index{Errors}syntax errors, AnaGram
will synchronize the cursor in the syntax file window with the cursor
in the \agwindow{Warnings} window so that as you move through the
warning messages, the cursor in the syntax file window will move to
the point of the error.

AnaGram's warning messages are listed, in alphabetical order, in
Appendix B, together with explanations of the messages and suggestions
as to what to do about them.


\section{The Conflicts Window}
\index{Conflicts}\index{Window}

The \agwindow{Conflicts} window lists the conflicts which AnaGram
found in your grammar.  The table identifies the parser states in
which it found conflicts.  For each conflict state it identifies the
reducing token for which it had more than one option, and the marked
rules which characterize each such option.  If AnaGram was able to
resolve a conflict by using \agparam{precedence} or \agparam{sticky}
declarations, the conflict will be listed in the \index{Resolved
Conflicts}\index{Window}\agwindow{Resolved Conflicts} window instead.

The \agwindow{Conflicts} window and the \agwindow{Resolved Conflicts}
window each require several lines to display a conflict.  For each
conflict, the first, or header, line identifies the conflict state and
the conflict token.  The conflict token is a terminal token for which
there is more than one parser action consistent with the rules of the
grammar.  If both shift and reduce actions are consistent with the
grammar, the conflict is a \agterm{shift-reduce} conflict.  If more
than one reduce action is consistent with the grammar, the conflict is
said to be a \agterm{reduce-reduce} conflict.

Following the header line, AnaGram lists the grammar rules which the
conflict token could satisfy.  The rules which correspond to shift
actions are listed first, followed by those which correspond to reduce
actions.  Rules which correspond to shift actions have not yet been
completed and have a marked token, shown in a distinctive type face,
which represents the next token expected.  For such rules, the
conflict token is an instance of the marked token and can be shifted
into the input buffer.  If the rule has been completely matched there
is no marked token and the conflict token can reduce the rule.  In the
following discussion, such a rule is referred to as a \agterm{conflict
rule}.  If there is a more than one conflict rule, the conflict is
called a \agterm{reduce-reduce} conflict; otherwise it is a
\agterm{shift-reduce} conflict.  Although it is theoretically possible
for a conflict to be both a shift-reduce conflict and a reduce-reduce
conflict, it is not a common event.

The \agmenu{Auxiliary Windows} menu for the \agwindow{Conflicts} window
provides eleven auxiliary windows to help you understand the
circumstances of a conflict.  Five of these windows address the
conflict directly.  These windows are the \agwindow{Conflict Trace}, the
\agwindow{Reduction Trace}, the \agwindow{Rule Derivation}, the
\agwindow{Token Derivation}, and the \agwindow{Problem States}
windows.  These windows are keyed to the conflict state, the conflict
token, and the conflict rule.  If the cursor bar is not highlighting a
conflict rule, AnaGram scans down the \agwindow{Conflicts} window for
the first conflict rule for the selected state and token.  The use of
these windows in analyzing conflicts is described below.

The other windows in the \agmenu{Auxiliary Windows} menu have been
described in Chapter 6.  These are the \agwindow{Expansion Chain}
window, the \agwindow{Rule Context} window, the \agwindow{Reduction
States} window, the \agwindow{State Definition} window, the
\agwindow{State Expansion} window and the \agwindow{Token Usage}
window.  The \agwindow{Rule Context} window is keyed to the particular
rule highlighted by the cursor.  The
\index{Expansion Chain}\index{Window}\agwindow{Expansion Chain} window
is keyed to the highlighted rule.  If the highlighted rule is a
characteristic rule no expansion chain is possible, so the option will
be greyed out.  The \agwindow{Reduction States} window is keyed to the
highlighted rule if it is a conflict rule, otherwise to the first
conflict rule following the highlighted line.  The \agwindow{State
Definition} and \agwindow{State Expansion} windows are keyed to the
conflict state.  The \agwindow{Token Usage} window is keyed to the
conflict token.


\section{The Resolved Conflicts Window}
\index{Resolved Conflicts}\index{Window}

AnaGram creates the \agwindow{Resolved Conflicts} window only when the
grammar it is analyzing has conflicts and when some of those conflicts
have been resolved by precedence rules (see Chapter 9) or by use of
the \agparam{sticky} directive.  In this case, it will list, in the
\agwindow{Resolved Conflicts} window, the conflicts that were resolved
and the way in which they were resolved.  The layout of the
\agwindow{Resolved Conflicts} window is identical to that of the
\agwindow{Conflicts} window with one exception.  The rules which your
parser will observe, selected in accordance with your \agparam{sticky}
declarations or precedence rules, are marked with asterisks.  Those
which your parser will ignore will not be so marked.  The
\agwindow{Auxiliary Windows} available from the \agwindow{Resolved
Conflicts} window are the same as those available from the
\agwindow{Conflicts} window.


\section{Conflicts}
\index{Conflicts}

Chapter 4 presented a brief introduction to the problem of conflicts
in a grammar.  There, conflicts were simply classified as shift-reduce
conflicts or as reduce-reduce conflicts.  This section examines
conflicts in somewhat greater detail.  It looks at the causes of
conflicts, how to identify the cause of a conflict and how to deal
with it.

Conflicts occur when, in a particular state, a given token, the
\index{Conflict token}\index{Token}\agterm{conflict token}, reduces
more than one rule, or reduces a rule when the token could also be
shifted into the input buffer in accordance with another rule.

Note that there are no shift-shift conflicts, that is, a particular
state could have several
\index{Characteristic rules}\index{Rules}characteristic rules
accepting the same token as a shift token.  A conflict is always
associated with reduction.  This is because LALR parsers keep track of
multiple possibilities in the input stream as long as the input tokens
only cause shifts.  At the point where a reduction occurs the parser
must be able to identify exactly one of the possibilities as
consistent with the input.

\subsection{Shift-Reduce Conflicts}
\index{Shift-reduce conflict}\index{Conflicts}

Unless a conflict is deliberate, the fact that the conflict token
reduces a particular rule is usually a surprise.  Although some
conflicts arise because an LALR-1 parser generator cannot look far
enough ahead, most shift-reduce conflicts result from true
\index{Ambiguities}ambiguities in your grammar.  For example, an
erroneous Pascal grammar contained the following productions, which
illustrate one of the more common classes of shift-reduce conflicts:

\begin{indentingcode}{0.4in}
parameter list
  -> expression
  -> parameter list, expression
\end{indentingcode}

Pascal actually requires that the expressions in a parameter list be
separated by commas.  In other words, the second rule should read:

\begin{indentingcode}{0.4in}
  -> parameter list, comma, expression
\end{indentingcode}

Lacking a requirement for a comma, the erroneous rules specify that a
parameter list is simply a sequence of expressions with nothing
separating them.  For example, using this definition,

\begin{indentingcode}{0.4in}
(a + b) - (c * d)
\end{indentingcode}

can be interpreted either as one expression, if the minus sign is
interpreted as a subtraction operator, or as a sequence of two
expressions if it is interpreted as a unary minus sign.

When AnaGram analyzed the grammar with the erroneous rule it found a
conflict in the state where the minus sign was expected.  If the minus
sign is the subtraction operator, it should be shifted onto the input
stack.  On the other hand, if the minus sign is a unary minus sign,
the sum \agcode{(a+b)} should be reduced to \agcode{expression}, and
then to \agcode{parameter list}.  Thus, this ambiguity gives rise to a
shift-reduce conflict, where it is impossible to determine a unique
parser action.

\subsection{Identifying Ambiguities}
\index{Ambiguities}

Unfortunately, shift-reduce conflicts usually show up in parser states
which are far removed from the true source of the problem.  When your
grammar has a shift-reduce conflict of the sort illustrated above,
there may be no readily apparent link to the erroneous
production.  AnaGram provides several options to help you establish the
link, all available using the \agmenu{Auxiliary Windows} menu from the
\agwindow{Conflicts} window.  These options comprise two pre-built
grammar traces, the \agwindow{Conflict Trace} and the
\agwindow{Reduction Trace}; two special windows called \agwindow{Rule
Derivation} and \agwindow{Token Derivation}; the \agwindow{Expansion
Chain} window and a special version of the \agwindow{Reduction States}
window called the \agwindow{Problem States} window.  The four trace
and derivation windows are coordinated, so that AnaGram actually
computes a mutually consistent set of four whenever you ask for one of
them.

The \agwindow{Conflict} and \agwindow{Reduction Traces} are simply
pre-initialized \agwindow{Grammar Traces} which you may use to become
more familiar with the circumstances of the conflict.  Bear in mind,
however, that unless you have set the
\index{Traditional engine}\index{Configuration switches}
\agparam{traditional engine} configuration switch,
the traces will use compound actions, which in some cases will use the
default resolution of conflicts.  Therefore, unless you set
\agparam{traditional engine} it is not always possible to work through
an alternate resolution of a conflict.

The
\index{Conflict Trace}\index{Trace}\index{Window}\agwindow{Conflict Trace}
shows you one possible sequence of input which will lead to the
conflict state.  Unless a conflict results from a typo or other
clerical error, it is usually a surprise and not something you
anticipated.  The \agwindow{Conflict Trace} helps you to understand
the context of the problem.  It is often helpful to look at the
\agwindow{Rule Stack} for the \agwindow{Conflict Trace} to see what
productions are involved in the conflict.  You can inspect the
\agwindow{State Definition} window for the conflict state to see why
the conflict token can be shifted.

\index{Reduction Trace}\index{Trace}\index{Window}
The \agwindow{Reduction Trace}, for a particular conflict rule,
carries forward a \agwindow{Conflict Trace} by selecting the reduce
action.  In other words, for a parser in the conflict state, the
\agwindow{Reduction Trace} shows the result of using the conflict
token as a reducing token.  The trace shows the result of taking all
possible reductions until a state is reached where the lookahead token
will shift into the input buffer.  If you compare the
\agwindow{Reduction Trace} to the \agwindow{Conflict Trace} you will
notice that they start off with identical tokens.  At some point they
diverge.  All of the tokens in the \agwindow{Conflict Trace} following
this point have reduced to the token in the \agwindow{Reduction
Trace}.  Usually there are no further tokens in the
\agwindow{Reduction Trace}, but if there are, you will notice that
they are all zero-length tokens.  That is, they all have a null
production, so that they come for free, as it were.  If you press
Enter, to display the \agwindow{Select Input} window for the
\agwindow{Reduction Trace}, you will see that the conflict token is
acceptable input.

The \agwindow{Reduction Trace} illustrates the ambiguity in your
grammar that caused the conflict: One of the acceptable input tokens
in the final state of this trace cannot be distinguished from its
predecessor.

The \index{Expansion Chain}\index{Window}\agwindow{Expansion Chain}
window is extremely useful for indicating why a particular grammar
rule is an expansion rule in a particular state.  Sometimes the chain
of substitutions to produce a given rule is quite long, and changes to
a grammar can have unexpected effects.  The \agwindow{Expansion Chain}
window enables you to see how a rule in a conflict state got to be
there.

\index{Problem States}\index{Window}The \agwindow{Problem States}
window for a conflict rule is essentially a trimmed
\agwindow{Reduction States} window which shows only those reduction
states in which the conflict token is acceptable input.  Often it is
sufficient to look at the \agwindow{Problem States} window to see the
source of the problem.  The last state in the \agwindow{Reduction
Trace} window, will be, of course, one of the states in the
\agwindow{Problem States} window.

\subsection{Rule and Token Derivation Windows}

If after inspecting the \agwindow{Problem States}, \agwindow{Conflict
Trace} and the \agwindow{Reduction Trace}, the conflict is still a
mystery, you may pinpoint the problem by looking at the
\agwindow{Rule} and \agwindow{Token Derivation} windows for the
conflict.  Both windows contain a sequence of rules, and both begin
with the same rule, the rule which is the root cause of the conflict.
This rule is one of the characteristic rules of the last state in the
\agwindow{Reduction Trace} and is the rule which produces the ambiguity.

\index{Rule Derivation}\index{Window}In both windows, each rule,
except the last line of the \agwindow{Rule Derivation}, contains a
marked token in a distinctive typeface.  The essence of the ambiguity
is that it is impossible to tell where in the input stream the marked
token in the top line of the \agwindow{Rule Derivation} ends and the
marked token in the top line of the \agwindow{Token Derivation}
begins.

In both derivation windows, each line after the first consists of a
rule produced by the marked token on the previous line.  In the
\agwindow{Rule Derivation}, except on the first line, the marked token
is either the last token in the rule, or all subsequent tokens have
null productions.  The last rule in the derivation window is the
conflict rule you selected in the \agwindow{Conflicts} window.  Thus
the \agwindow{Rule Derivation} window shows you how the rule involved
in the conflict derives from the root.

\index{Window}\index{Token Derivation}In the \agwindow{Token
Derivation} window, except on the first line, the marked token is
either the first token in the rule, or all preceding tokens have null
productions.  The marked token in the last line of the list is the
conflict token.

The \agwindow{Rule Derivation} and \agwindow{Token Derivation} windows
each have five auxiliary windows.  The \agwindow{Rule Context} window
is keyed simply to the highlighted rule.  The other four windows, the
\agwindow{Expansion Rules} window, the \agwindow{Productions} window,
the \agwindow{Set Elements} window, and the \agwindow{Token Usage}
window are keyed to the marked token. Remember that there is no marked
token on the last line of the \agwindow{Rule Derivation} window.

\subsection{Resolving Ambiguities}
\index{Ambiguities}

The shift-reduce conflict discussed earlier stemmed from an oversight
in writing a grammar: the grammar specified a list of items, in this
case expressions, which could not be distinguished without a separator
of some sort.  This conflict can be resolved simply by rewriting the
grammar to require a comma separating the expressions.

A similar problem comes from specifying a list of lists without any
separator.  Sometimes such a specification is made inadvertently.  The
erroneous Pascal grammar cited above had such a conflict.  Here are
the productions that caused trouble:

\begin{indentingcode}{0.4in}
declaration part
  -> declaration section
  -> declaration part, declaration section

declaration section
  -> proc and func declaration part

proc and func declaration part
  -> proc or func declaration
  -> proc and func declaration part, proc or func declaration

proc or func declaration
  -> proc declaration
  -> func declaration
\end{indentingcode}

Of course, there are other productions for \agcode{declaration
section} so that this grammar looks perfectly reasonable on the
surface.  A \agcode{declaration part} is a sequence of
\agcode{declaration sections}.  One type of \agcode{declaration
section}, the \agcode{proc and func declaration part}, is a sequence
of procedure or function definitions.  Unfortunately, if you have two
procedure definitions in a row, it could be interpreted as one
\agcode{proc and func declaration part} consisting of two declarations
or two \agcode{proc and func declaration parts} each consisting of one
declaration.  The problem here is that there is a little too much
recursion.  You can fix this problem by replacing

\begin{indentingcode}{0.4in}
declaration section
  -> proc and func declaration part

proc and func declaration part
  -> proc or func declaration
  -> proc and func declaration part, proc or func declaration
\end{indentingcode}

with

\begin{indentingcode}{0.4in}
declaration section
  -> proc or func declaration
\end{indentingcode}

After a little reflection you will see that this simplification, which
eliminates one source of recursion, entails no loss of generality.

In the above examples, the ambiguities were the results of errors in
the specification of the grammar, and could be easily fixed by
correcting the grammar.  Sometimes, however, ambiguities arise which
are not so easily corrected.  Consider, for instance, a simple token
scanner:

\begin{indentingcode}{0.4in}
tokens
  -> [token, white space?...]...

token
  -> name
  -> number
  -> string
  -> character constant
  -> punctuation character
\end{indentingcode}

where \agcode{name}, \agcode{number}, etc., are defined in the
conventional manner.

Such token scanners are normally written using conventional
programming techniques, or using programs such as \agfile{lex} which
are based on the analysis of regular expressions.  Such programs
simply work on a character by character basis and do not see that
``maryjane'' could be interpreted as one name or as two, or as even
more, since the separating white space is declared to be optional.
AnaGram, however, since it takes a broader view, sees that the
specifications are basically ambiguous and complains.  Note, however,
that the default resolution of conflicts, in this case choosing the
shift action, will provide a parser that does precisely what you would
normally intend.  Thus, the major concern is to make sure that the
conflicts caused by this situation do not mask other more serious
conditions.

While it is possible to rewrite this grammar so that it is not
ambiguous, such a grammar would lack clarity and conciseness.  AnaGram
provides two alternatives for dealing with this problem: the
\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar}
declaration and the
\index{Sticky declaration}\index{Declaration}\agparam{sticky}
declaration, which accomplish the desired end using very different
techniques.

In the definition above, a grammar for token, taken all by itself,
would not normally show any ambiguities.  We could write a
\index{Lexical scanner}lexical scanner that would identify a token,
pass it on to the parser, and then go on to the next.  We would never
worry about any ambiguities.  In this situation, it is clear that what
we mean by \agcode{token} is defined entirely by the grammar for
\agcode{token}, not by its usage within the rest of the grammar.  When
we say that we have a sequence of tokens, \agcode{token...}, we
usually mean that, of course, two tokens, such as ``mary'' and
``jane'' have to be separated by punctuation or space to be
distinguished as separate tokens.  The
\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar}
declaration allows you to specify that a particular \index{Nonterminal
token}\index{Token}nonterminal token in your grammar is to be treated
as though it were defined by its own complete grammar, without regard
to its usage in the grander scheme of things.  In the example above,
add the following:

\begin{indentingcode}{0.4in}
[ subgrammar \bra token \ket ]
\end{indentingcode}

Now, when AnaGram conducts its search for reducing tokens, it will
limit its search to those it would find if your grammar consisted only
of the grammar rules necessary to define token.  When it does this, it
never even finds the reducing tokens that appear to cause conflicts.
When you fix a conflict with the \agparam{subgrammar} declaration, the
conflict vanishes entirely.  It is not recorded in the
\agwindow{Resolved Conflicts} window.  For this reason, the
\agparam{subgrammar} declaration should be used only when you are
absolutely sure of the nature of the problem you are trying to solve.

A somewhat more precise mechanism for dealing with the conflicts
described above is to declare name and number as \index{Sticky
declaration}\index{Declaration}\agparam{sticky}:

\begin{indentingcode}{0.4in}
[ sticky \bra name, number \ket ]
\end{indentingcode}

This means that whenever the last token in your parser's input buffer
is \agcode{name} or \agcode{number}, it will shift in any acceptable
input character, and will disregard conflicting reduce options.
AnaGram will continue to identify conflicts in your grammar due to
this cause, but will record them in the \agwindow{Resolved Conflicts}
window.  Any other conflicts you may have will still show up in the
\agwindow{Conflicts} window.

Sometimes it may be desirable to introduce ambiguities deliberately
into a grammar.  For example, although it is easy enough to write
normal grammar rules for the conventional expression logic used in
programming languages, it is possible to produce a somewhat more
compact, faster running parser by writing an ambiguous grammar and
using AnaGram's precedence declarations to resolve the ambiguities.
An example of this technique is given in Chapter 9.

\subsection{Null Productions}

\index{Null productions}\index{Production}Null productions can be a
fertile source of conflicts.  Fortunately, many of them can be easily
resolved by simply rewriting a few rules.  Consider, for example, the
following:

\begin{indentingcode}{0.4in}
name
  -> capital letter, letter..., white space

initial
  -> capital letter, period, white space

full name
  -> name, initial?, name
\end{indentingcode}

In this example, \agcode{initial?} is a virtual production which
AnaGram replaces with
% not ``with''...

\begin{indentingcode}{0.4in}
initial?
  ->
  -> initial
\end{indentingcode}

so \agcode{initial?} involves a null production.

Recall that LALR-1 parsers cannot look ahead more than one character.
The problem with this grammar, then, is that given the input ``John
Q'', there is no way to tell whether the ``Q'' is an initial or the
first letter of the last name.  Is this ``John Q. Public'' or ``John
Quincy''? Adding another production for ``full name'' so that
``initial'' need not be marked as optional allows the parser to defer
its decision:

\begin{indentingcode}{0.4in}
full name
  -> name, name
  -> name, initial, name
\end{indentingcode}

Notice that the technique here is to \emph{avoid a reduction} until
the parser has enough information to proceed.

\subsection{Reduce-Reduce Conflicts}

\index{Reduce-reduce conflicts}\index{Conflicts}Reduce-reduce
conflicts are usually caused by the limited lookahead of LR-1 parsing.
Very occasionally, you might find a grammar with a reduce-reduce
conflict that is caused by the LALR-1 approximation to LR-1 parsing
that AnaGram supports.  Such conflicts are quite easy to resolve,
however, by almost trivial rewriting.

Reduce-reduce conflicts commonly involve productions that are
intrinsically indistinguishable, for instance:

\begin{indentingcode}{0.4in}
tweedledee
  -> tiddly, wink

tweedledum
  -> tiddly, wink
\end{indentingcode}

However, it may be possible to distinguish \agcode{tweedledee} from
\agcode{tweedledum} by context.  Suppose

\begin{indentingcode}{0.4in}
gibber
  -> tweedledee, apples
  -> tweedledum, oranges
\end{indentingcode}

Now if you can distinguish \agcode{apples} from \agcode{oranges} on
the basis of the very first character, your parser will be able to
tell \agcode{tweedledee} from \agcode{tweedledum}.  If, on the other
hand, it takes more than one input character to tell whether you have
\agcode{apples} or \agcode{oranges}, you have a reduce-reduce conflict.

This example, of course, is rather far-fetched, since the potential
confusion between \agcode{tweedledee} and \agcode{tweedledum} is
obvious.  More commonly, the definitions and usage of
\agcode{tweedledee} and \agcode{tweedledum} are physically far apart
in the grammar.  At one place in a grammar:

\begin{indentingcode}{0.4in}
balderdash
  -> tweedledee, apples
\end{indentingcode}

Elsewhere, seemingly unrelated:

\begin{indentingcode}{0.4in}
buncombe
  -> tweedledum, oranges
\end{indentingcode}

In a dark corner:

\begin{indentingcode}{0.4in}
gossip
  -> buncombe, slander
\end{indentingcode}

And in yet another dark corner:

\begin{indentingcode}{0.4in}
small talk
  -> gossip
  -> rumors
  -> balderdash
\end{indentingcode}

Once again, if \agcode{apples} and \agcode{oranges} cannot be
distinguished by the very first character, you have a reduce-reduce
conflict that is somewhat harder to find than the one discussed above,
simply because the trail is somewhat more complicated to follow.

\subsection{Identifying Reduce-Reduce Conflicts}
\index{Reduce-reduce conflicts}\index{Conflicts}

Although the machinery described above for identifying shift-reduce
conflicts can be used to investigate reduce-reduce conflicts, many
yield to comparison of the \agwindow{Reduction States} windows for the
two rules.  For the two rules to be distinguishable, no token which is
acceptable input for any of the reduction states belonging to one rule
can be acceptable input for any reduction state belonging to the
other.

Therefore, when you have a reduce-reduce conflict, there is an overlap
among the tokens in the reduction states for one rule and the tokens
in the reduction states for the other rule.  Often, however, there are
many reduction states which are irrelevant as far as a particular
conflict is concerned.  So AnaGram provides a special window, the
\agwindow{Problem States} window, in the \agmenu{Auxiliary Windows}
menu for the \agwindow{Conflicts} and \agwindow{Resolved Conflicts}
windows.  The \agwindow{Problem States} window for a particular rule
lists those, and only those, reduction states for the rule for which
the conflict token is acceptable input.  This usually reduces the size
of the table and often makes the source of the problem immediately
obvious.

\subsection{Resolving Reduce-Reduce Conflicts}

If you build a parser for a grammar that includes a reduce-reduce
conflict, AnaGram will arbitrarily reduce the first rule it
encountered while reading your grammar, that is, the rule with the
smaller rule number.  In some circumstances this default may be
acceptable.  In other circumstances more may be required.

Often, it is possible to get rid of reduce-reduce conflicts by
rewriting your grammar so that tokens that were once reducing tokens
for a rule become part of the rule itself.  Thus, in the previous
example, you could write:

\begin{indentingcode}{0.4in}
tweedledee
  -> tiddly, wink, apples

tweedledum
  -> tiddly, wink, oranges
\end{indentingcode}

In other words, if \agcode{apples} always follows \agcode{tweedledee},
you can move \agcode{apples} ``down'' in the grammar by making it part
of \agcode{tweedledee}.  Moving tokens ``down'' in this manner always
reduces the likelihood of conflicts.  But what if \agcode{apples}
doesn't always follow \agcode{tweedledee}? Then it is sometimes
worthwhile to make two versions of \agcode{tweedledee}:

\begin{indentingcode}{0.4in}
simple tweedledee
  -> tiddly, wink

complex tweedledee
  -> tiddly, wink, apples
\end{indentingcode}

and in each circumstance use whichever is appropriate.  As with the
earlier shift-reduce example, the technique is to avoid any reduction
until the tokens can be distinguished.

Sometimes, however, reduce-reduce conflicts reflect a language design
that really demands greater lookahead than can be accommodated with an
LR-1 parser.  In these cases you have two options: you can consider a
multi-stage parser or you can implement an ad hoc lookahead procedure.
In the latter case, you can use semantically determined productions to
control your parser depending on the results of your lookahead
procedure.

\subsection{Conflicts due to LALR-1 Simplifications}
\index{LALR simplifications}\index{Conflicts}

LALR-1 \index{Parser generator}parser generators occasionally report a
reduce-reduce conflict where an LR-1 parser would successfully resolve
the conflict.  How significant is this problem? An early prototype
version of AnaGram actually had full LR-1 capability.  Unfortunately
LR-1 parsers are generally much bigger than LALR-1 parsers.  Creating
them takes longer and more resources so that for given computer
resources an LALR-1 parser generator can handle larger, more complex
grammars than an LR-1 parser generator.  To provide a compromise, the
prototype was modified to do an LALR-1 analysis first.  Then, only if
there were reduce-reduce conflicts did it do an LR-1 analysis.  Over a
substantial period of testing, the only reduce-reduce conflicts that
were resolved by the LR-1 analysis were those that were created
deliberately as tests.  Eventually the LR-1 capabilities were removed
from AnaGram since they added complexity without concomitant benefits.
To understand why this is so, it is worthwhile to look at an example.
Let us recall the definitions of \agcode{tweedledee} and
\agcode{tweedledum} made above:

\begin{indentingcode}{0.4in}
tweedledee
  -> tiddly, wink
tweedledum
  -> tiddly, wink
\end{indentingcode}

Remember that we defined gibber thus:

\begin{indentingcode}{0.4in}
gibber
  -> tweedledee, apples
  -> tweedledum, oranges
\end{indentingcode}

Now suppose that we also define jabber in a contrary way:

\begin{indentingcode}{0.4in}
jabber
  -> tweedledee, oranges
  -> tweedledum, apples
\end{indentingcode}

Now suppose we use \agcode{gibber} in one part of our grammar and
\agcode{jabber} in another, completely unrelated, part.  An LR-1
parser generator would succeed in keeping \agcode{tweedledee} and
\agcode{tweedledum} straight in spite of our efforts to confuse it.
An LALR-1 parser generator, however, succumbs to our efforts and
reports a conflict.  A trivial rewrite of the grammar, however,
removes the conflict:

\begin{indentingcode}{0.4in}
gibber tweedledee
  -> tiddly, wink
gibber tweedledum
  -> tiddly, wink

jabber tweedledee
  -> tiddly, wink
jabber tweedledum
  -> tiddly, wink

gibber
  -> gibber tweedledee, apples
  -> gibber tweedledum, oranges

jabber
  -> jabber tweedledee, oranges
  -> jabber tweedledum, apples
\end{indentingcode}

The effect of this rewrite is to accomplish the same separation of
states that an LR-1 parser generator would make.

\section{Keyword Anomalies}
\index{Keyword Anomalies}\index{Anomaly}

In syntax directed parsing, it is assumed that input tokens can be
uniquely identified.  In the case of keywords, however, there is the
possibility that the individual characters making up the keyword, as
well as the keyword taken as a whole, could constitute legitimate
input under some circumstances.  Thus keywords, though a powerful and
useful tool, are not completely consistent with the assumptions that
underlie syntax directed parsing.  This can occasionally give rise to a
type of conflict, diagnosed by AnaGram, called a \agterm{keyword
anomaly}.  AnaGram is quite conservative in its diagnoses, so that many
keyword anomalies it reports are actually innocuous and can be safely
ignored.  Nevertheless, anomalies should be investigated carefully.

AnaGram only recognizes a keyword in those states where the keyword is
allowable input.  If AnaGram recognizes a keyword, the keyword takes
precedence over the sequence of characters that constitute the
keyword.  If there are several allowable keywords, the longest takes
precedence.

Under certain circumstances it is possible for a grammar to have a
state where a particular keyword may or may not be allowable input
depending on how the parser gets to that state.  That is, for some
sequences of tokens which lead to the state, the keyword is allowable,
in others it isn't.  In such a state, the keyword must always cause a
reduction, since if the keyword were to shift, it would always be
allowable input.  When AnaGram finds such a state in your parser, it
checks to see what the parser action would be if the parser ignored
the keyword and interpreted it in terms of its constituent
characters.  If the parser action would be anything other than reducing
the same rule as the keyword or a syntax error, AnaGram will diagnose
a keyword anomaly in this state.

To summarize, the nature of a keyword anomaly is this: It is possible
for your parser to get to the anomalous state by a route for which the
keyword is not allowable.  In the anomalous state, at least the first
constituent character of the keyword is allowable in its own right and
causes a parser action different from that caused by the
keyword.  Nonetheless, if the keyword should appear in the input, your
parser will recognize it, perform at least one reduction action, get
to a state where the keyword is not recognized, and finally, in the
wrong state, will try to interpret the keyword in terms of its
constituent characters.

Now, what do you do about a keyword anomaly? If in the anomalous state
the characters which constitute the keyword would never be
syntactically admissible as individual characters, the keyword anomaly
is innocuous in that the only problem is that the recognition of the
syntax error is somewhat delayed.  If however, the characters which
constitute the keyword are syntactically admissible, you have a
problem that needs to be fixed.

In many programming languages, many keywords are treated as
\agterm{reserved words}, that is, as words that may not be otherwise
used in a conforming program.  Version 1.5 of AnaGram has added a
\index{Reserve keywords}\agparam{reserve keywords} statement that you
can use to specify that the text of a keyword is \emph{never}
admissible as individual characters.  Thus you will never get a
keyword anomaly diagnostic for a keyword that has been listed in a
\index{Reserve keywords}\agparam{reserve keywords} statement.
The \agparam{reserve keywords} statement is described in greater
detail in \S 8.2.28.
% XXX this is the only place in the manual (so far?) that uses the
% \S section-symbol. (Also, this should be a tex crossreference.)

\subsection{The Keyword Anomalies Window}
\index{Keyword Anomalies}\index{Window}

If your grammar has a keyword anomaly, AnaGram will provide a warning
message and will create a \agwindow{Keyword Anomalies} window to
provide details about the anomalies in your grammar.

Each entry in the \agwindow{Keyword Anomalies} window consists of two
lines.  The first line identifies the state at which the anomaly
occurs and the offending keyword.  The second line identifies the rule
which the keyword may erroneously reduce.  The \agmenu{Auxiliary
Windows} menu provides three auxiliary windows keyed directly to the
anomaly to help you determine the nature of the problem: the
\agwindow{Keyword Anomaly Trace} window, the \agwindow{Reduction
Trace} window, and the \agwindow{Rule Derivation} window.  In
addition, three other windows provide supporting information: the
\agwindow{Reduction States} window, the \agwindow{Rule Context} window
and the \agwindow{State Definition} window.  These windows are
discussed in Chapter 6.

The \agwindow{Keyword Anomaly Trace} window, a pre-built
\agwindow{Grammar Trace}, shows a sequence of tokens that illustrates
one way to encounter the anomaly.  The \agwindow{Reduction Trace}
window shows the result of performing the reduction action.  Just as
for conflicts, the true source of the problem is often at some remove
from the actual state in which the problem manifests itself.  One of
the characteristic rules for the last state in the \agwindow{Reduction
Trace} window is, in fact, the true source of the problem.  The
\agwindow{Rule Derivation} window can show you which of the
characteristic rules is the offending party and how the rule in the
anomalous state derives from it.

\subsection{Dealing with Keyword Anomalies}

If your grammar has a keyword anomaly, you should study it carefully.
It may be that the anomaly is completely innocuous, that is, the
characters which constitute the keyword are not otherwise legitimate
input.  In this case you can simply ignore the message.

The basic approach to dealing with keyword anomalies is to revise your
grammar to make a better distinction between the contexts in which the
keyword is acceptable and those in which it may be confused with
otherwise valid input.  Several other techniques may be useful:

For an anomaly to occur, there must be, \emph{in the anomaly state},
an alternative interpretation of some of the initial characters of the
keyword.  If the anomaly occurs because you have several keywords that
begin in a similar manner, you may use the
\index{Distinguish keywords}\index{Keywords}\agparam{distinguish keywords}
statement (see Chapter 8) to keep them properly separated.  Note that
having several keywords that begin with the same characters does
\emph{not} necessarily cause an anomaly.

If the keyword is an alphabetic keyword that can follow conventional
variable name tokens, you may wish to use a \agparam{sticky}
declaration or the
\index{Distinguish lexemes}\index{Configuration switch}
\agparam{distinguish lexemes} switch
to inhibit recognition of the keyword when it is preceded immediately
by a name.

If the keyword can be treated as a reserved word, the \index{Reserve
keywords}\agparam{reserve keywords} statement will eliminate the
anomaly.
author	David A. Holland
date	Sat, 22 Dec 2007 17:52:45 -0500
parents
children