AnaGram interim repo (temporary): doc/manual/xg-iii.tex comparison

comparison doc/manual/xg-iii.tex @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.

author	David A. Holland
date	Sat, 22 Dec 2007 17:52:45 -0500
parents
children

comparison

equal deleted inserted replaced

--1:000000000000
+:13d2b8934445
+\chapter{Exploring Your Grammar III: Diagnostic Windows}
+If AnaGram finds problems with your grammar, it creates a number of
+windows to help you identify and correct the problems.  These windows
+are the \agwindow{Warnings} window, the \agwindow{Conflicts} window,
+the \agwindow{Resolved Conflicts} window, and the \agwindow{Keyword
+Anomalies} window.  To expand upon these windows, AnaGram also
+provides, using the \agmenu{Auxiliary Windows} menu, the following
+supporting windows: \agwindow{Conflict Trace}, \agwindow{Keyword
+Anomaly Trace}, \agwindow{Reduction Trace}, \agwindow{Rule
+Derivation}, \agwindow{Token Derivation} and \agwindow{Problem
+States}.  This chapter describes these windows and their use.
+\section{The Warnings Window}
+\index{Warnings}\index{Window}
+If while analyzing your syntax file, AnaGram finds something
+suspicious, it issues a warning, regardless of the severity of the
+problem.  The \agwindow{Warnings} window will pop up automatically
+when the analysis has been completed.  If the warning is for a syntax
+error in your input file, you will have to fix it, because AnaGram
+cannot successfully interpret your syntax file.  Otherwise, no matter
+how serious the warnings may be, AnaGram will create a parser if it is
+instructed to do so.  Thus, if you decide the warnings are of no
+consequence, you may choose to ignore them.  Note that AnaGram will
+resolve shift-reduce conflicts by choosing the shift action.  It will
+resolve reduce-reduce conflicts by choosing the rule that occurred
+first in the grammar.
+If you have \index{Syntax error}\index{Errors}syntax errors, AnaGram
+will synchronize the cursor in the syntax file window with the cursor
+in the \agwindow{Warnings} window so that as you move through the
+warning messages, the cursor in the syntax file window will move to
+the point of the error.
+AnaGram's warning messages are listed, in alphabetical order, in
+Appendix B, together with explanations of the messages and suggestions
+as to what to do about them.
+\section{The Conflicts Window}
+\index{Conflicts}\index{Window}
+The \agwindow{Conflicts} window lists the conflicts which AnaGram
+found in your grammar.  The table identifies the parser states in
+which it found conflicts.  For each conflict state it identifies the
+reducing token for which it had more than one option, and the marked
+rules which characterize each such option.  If AnaGram was able to
+resolve a conflict by using \agparam{precedence} or \agparam{sticky}
+declarations, the conflict will be listed in the \index{Resolved
+Conflicts}\index{Window}\agwindow{Resolved Conflicts} window instead.
+The \agwindow{Conflicts} window and the \agwindow{Resolved Conflicts}
+window each require several lines to display a conflict.  For each
+conflict, the first, or header, line identifies the conflict state and
+the conflict token.  The conflict token is a terminal token for which
+there is more than one parser action consistent with the rules of the
+grammar.  If both shift and reduce actions are consistent with the
+grammar, the conflict is a \agterm{shift-reduce} conflict.  If more
+than one reduce action is consistent with the grammar, the conflict is
+said to be a \agterm{reduce-reduce} conflict.
+Following the header line, AnaGram lists the grammar rules which the
+conflict token could satisfy.  The rules which correspond to shift
+actions are listed first, followed by those which correspond to reduce
+actions.  Rules which correspond to shift actions have not yet been
+completed and have a marked token, shown in a distinctive type face,
+which represents the next token expected.  For such rules, the
+conflict token is an instance of the marked token and can be shifted
+into the input buffer.  If the rule has been completely matched there
+is no marked token and the conflict token can reduce the rule.  In the
+following discussion, such a rule is referred to as a \agterm{conflict
+rule}.  If there is a more than one conflict rule, the conflict is
+called a \agterm{reduce-reduce} conflict; otherwise it is a
+\agterm{shift-reduce} conflict.  Although it is theoretically possible
+for a conflict to be both a shift-reduce conflict and a reduce-reduce
+conflict, it is not a common event.
+The \agmenu{Auxiliary Windows} menu for the \agwindow{Conflicts} window
+provides eleven auxiliary windows to help you understand the
+circumstances of a conflict.  Five of these windows address the
+conflict directly.  These windows are the \agwindow{Conflict Trace}, the
+\agwindow{Reduction Trace}, the \agwindow{Rule Derivation}, the
+\agwindow{Token Derivation}, and the \agwindow{Problem States}
+windows.  These windows are keyed to the conflict state, the conflict
+token, and the conflict rule.  If the cursor bar is not highlighting a
+conflict rule, AnaGram scans down the \agwindow{Conflicts} window for
+the first conflict rule for the selected state and token.  The use of
+these windows in analyzing conflicts is described below.
+The other windows in the \agmenu{Auxiliary Windows} menu have been
+described in Chapter 6.  These are the \agwindow{Expansion Chain}
+window, the \agwindow{Rule Context} window, the \agwindow{Reduction
+States} window, the \agwindow{State Definition} window, the
+\agwindow{State Expansion} window and the \agwindow{Token Usage}
+window.  The \agwindow{Rule Context} window is keyed to the particular
+rule highlighted by the cursor.  The
+\index{Expansion Chain}\index{Window}\agwindow{Expansion Chain} window
+is keyed to the highlighted rule.  If the highlighted rule is a
+characteristic rule no expansion chain is possible, so the option will
+be greyed out.  The \agwindow{Reduction States} window is keyed to the
+highlighted rule if it is a conflict rule, otherwise to the first
+conflict rule following the highlighted line.  The \agwindow{State
+Definition} and \agwindow{State Expansion} windows are keyed to the
+conflict state.  The \agwindow{Token Usage} window is keyed to the
+conflict token.
+\section{The Resolved Conflicts Window}
+\index{Resolved Conflicts}\index{Window}
+AnaGram creates the \agwindow{Resolved Conflicts} window only when the
+grammar it is analyzing has conflicts and when some of those conflicts
+have been resolved by precedence rules (see Chapter 9) or by use of
+the \agparam{sticky} directive.  In this case, it will list, in the
+\agwindow{Resolved Conflicts} window, the conflicts that were resolved
+and the way in which they were resolved.  The layout of the
+\agwindow{Resolved Conflicts} window is identical to that of the
+\agwindow{Conflicts} window with one exception.  The rules which your
+parser will observe, selected in accordance with your \agparam{sticky}
+declarations or precedence rules, are marked with asterisks.  Those
+which your parser will ignore will not be so marked.  The
+\agwindow{Auxiliary Windows} available from the \agwindow{Resolved
+Conflicts} window are the same as those available from the
+\agwindow{Conflicts} window.
+\section{Conflicts}
+\index{Conflicts}
+Chapter 4 presented a brief introduction to the problem of conflicts
+in a grammar.  There, conflicts were simply classified as shift-reduce
+conflicts or as reduce-reduce conflicts.  This section examines
+conflicts in somewhat greater detail.  It looks at the causes of
+conflicts, how to identify the cause of a conflict and how to deal
+with it.
+Conflicts occur when, in a particular state, a given token, the
+\index{Conflict token}\index{Token}\agterm{conflict token}, reduces
+more than one rule, or reduces a rule when the token could also be
+shifted into the input buffer in accordance with another rule.
+Note that there are no shift-shift conflicts, that is, a particular
+state could have several
+\index{Characteristic rules}\index{Rules}characteristic rules
+accepting the same token as a shift token.  A conflict is always
+associated with reduction.  This is because LALR parsers keep track of
+multiple possibilities in the input stream as long as the input tokens
+only cause shifts.  At the point where a reduction occurs the parser
+must be able to identify exactly one of the possibilities as
+consistent with the input.
+\subsection{Shift-Reduce Conflicts}
+\index{Shift-reduce conflict}\index{Conflicts}
+Unless a conflict is deliberate, the fact that the conflict token
+reduces a particular rule is usually a surprise.  Although some
+conflicts arise because an LALR-1 parser generator cannot look far
+enough ahead, most shift-reduce conflicts result from true
+\index{Ambiguities}ambiguities in your grammar.  For example, an
+erroneous Pascal grammar contained the following productions, which
+illustrate one of the more common classes of shift-reduce conflicts:
+\begin{indentingcode}{0.4in}
+parameter list
+-> expression
+-> parameter list, expression
+\end{indentingcode}
+Pascal actually requires that the expressions in a parameter list be
+separated by commas.  In other words, the second rule should read:
+\begin{indentingcode}{0.4in}
+-> parameter list, comma, expression
+\end{indentingcode}
+Lacking a requirement for a comma, the erroneous rules specify that a
+parameter list is simply a sequence of expressions with nothing
+separating them.  For example, using this definition,
+\begin{indentingcode}{0.4in}
+(a + b) - (c * d)
+\end{indentingcode}
+can be interpreted either as one expression, if the minus sign is
+interpreted as a subtraction operator, or as a sequence of two
+expressions if it is interpreted as a unary minus sign.
+When AnaGram analyzed the grammar with the erroneous rule it found a
+conflict in the state where the minus sign was expected.  If the minus
+sign is the subtraction operator, it should be shifted onto the input
+stack.  On the other hand, if the minus sign is a unary minus sign,
+the sum \agcode{(a+b)} should be reduced to \agcode{expression}, and
+then to \agcode{parameter list}.  Thus, this ambiguity gives rise to a
+shift-reduce conflict, where it is impossible to determine a unique
+parser action.
+\subsection{Identifying Ambiguities}
+\index{Ambiguities}
+Unfortunately, shift-reduce conflicts usually show up in parser states
+which are far removed from the true source of the problem.  When your
+grammar has a shift-reduce conflict of the sort illustrated above,
+there may be no readily apparent link to the erroneous
+production.  AnaGram provides several options to help you establish the
+link, all available using the \agmenu{Auxiliary Windows} menu from the
+\agwindow{Conflicts} window.  These options comprise two pre-built
+grammar traces, the \agwindow{Conflict Trace} and the
+\agwindow{Reduction Trace}; two special windows called \agwindow{Rule
+Derivation} and \agwindow{Token Derivation}; the \agwindow{Expansion
+Chain} window and a special version of the \agwindow{Reduction States}
+window called the \agwindow{Problem States} window.  The four trace
+and derivation windows are coordinated, so that AnaGram actually
+computes a mutually consistent set of four whenever you ask for one of
+them.
+The \agwindow{Conflict} and \agwindow{Reduction Traces} are simply
+pre-initialized \agwindow{Grammar Traces} which you may use to become
+more familiar with the circumstances of the conflict.  Bear in mind,
+however, that unless you have set the
+\index{Traditional engine}\index{Configuration switches}
+\agparam{traditional engine} configuration switch,
+the traces will use compound actions, which in some cases will use the
+default resolution of conflicts.  Therefore, unless you set
+\agparam{traditional engine} it is not always possible to work through
+an alternate resolution of a conflict.
+The
+\index{Conflict Trace}\index{Trace}\index{Window}\agwindow{Conflict Trace}
+shows you one possible sequence of input which will lead to the
+conflict state.  Unless a conflict results from a typo or other
+clerical error, it is usually a surprise and not something you
+anticipated.  The \agwindow{Conflict Trace} helps you to understand
+the context of the problem.  It is often helpful to look at the
+\agwindow{Rule Stack} for the \agwindow{Conflict Trace} to see what
+productions are involved in the conflict.  You can inspect the
+\agwindow{State Definition} window for the conflict state to see why
+the conflict token can be shifted.
+\index{Reduction Trace}\index{Trace}\index{Window}
+The \agwindow{Reduction Trace}, for a particular conflict rule,
+carries forward a \agwindow{Conflict Trace} by selecting the reduce
+action.  In other words, for a parser in the conflict state, the
+\agwindow{Reduction Trace} shows the result of using the conflict
+token as a reducing token.  The trace shows the result of taking all
+possible reductions until a state is reached where the lookahead token
+will shift into the input buffer.  If you compare the
+\agwindow{Reduction Trace} to the \agwindow{Conflict Trace} you will
+notice that they start off with identical tokens.  At some point they
+diverge.  All of the tokens in the \agwindow{Conflict Trace} following
+this point have reduced to the token in the \agwindow{Reduction
+Trace}.  Usually there are no further tokens in the
+\agwindow{Reduction Trace}, but if there are, you will notice that
+they are all zero-length tokens.  That is, they all have a null
+production, so that they come for free, as it were.  If you press
+Enter, to display the \agwindow{Select Input} window for the
+\agwindow{Reduction Trace}, you will see that the conflict token is
+acceptable input.
+The \agwindow{Reduction Trace} illustrates the ambiguity in your
+grammar that caused the conflict: One of the acceptable input tokens
+in the final state of this trace cannot be distinguished from its
+predecessor.
+The \index{Expansion Chain}\index{Window}\agwindow{Expansion Chain}
+window is extremely useful for indicating why a particular grammar
+rule is an expansion rule in a particular state.  Sometimes the chain
+of substitutions to produce a given rule is quite long, and changes to
+a grammar can have unexpected effects.  The \agwindow{Expansion Chain}
+window enables you to see how a rule in a conflict state got to be
+there.
+\index{Problem States}\index{Window}The \agwindow{Problem States}
+window for a conflict rule is essentially a trimmed
+\agwindow{Reduction States} window which shows only those reduction
+states in which the conflict token is acceptable input.  Often it is
+sufficient to look at the \agwindow{Problem States} window to see the
+source of the problem.  The last state in the \agwindow{Reduction
+Trace} window, will be, of course, one of the states in the
+\agwindow{Problem States} window.
+\subsection{Rule and Token Derivation Windows}
+If after inspecting the \agwindow{Problem States}, \agwindow{Conflict
+Trace} and the \agwindow{Reduction Trace}, the conflict is still a
+mystery, you may pinpoint the problem by looking at the
+\agwindow{Rule} and \agwindow{Token Derivation} windows for the
+conflict.  Both windows contain a sequence of rules, and both begin
+with the same rule, the rule which is the root cause of the conflict.
+This rule is one of the characteristic rules of the last state in the
+\agwindow{Reduction Trace} and is the rule which produces the ambiguity.
+\index{Rule Derivation}\index{Window}In both windows, each rule,
+except the last line of the \agwindow{Rule Derivation}, contains a
+marked token in a distinctive typeface.  The essence of the ambiguity
+is that it is impossible to tell where in the input stream the marked
+token in the top line of the \agwindow{Rule Derivation} ends and the
+marked token in the top line of the \agwindow{Token Derivation}
+begins.
+In both derivation windows, each line after the first consists of a
+rule produced by the marked token on the previous line.  In the
+\agwindow{Rule Derivation}, except on the first line, the marked token
+is either the last token in the rule, or all subsequent tokens have
+null productions.  The last rule in the derivation window is the
+conflict rule you selected in the \agwindow{Conflicts} window.  Thus
+the \agwindow{Rule Derivation} window shows you how the rule involved
+in the conflict derives from the root.
+\index{Window}\index{Token Derivation}In the \agwindow{Token
+Derivation} window, except on the first line, the marked token is
+either the first token in the rule, or all preceding tokens have null
+productions.  The marked token in the last line of the list is the
+conflict token.
+The \agwindow{Rule Derivation} and \agwindow{Token Derivation} windows
+each have five auxiliary windows.  The \agwindow{Rule Context} window
+is keyed simply to the highlighted rule.  The other four windows, the
+\agwindow{Expansion Rules} window, the \agwindow{Productions} window,
+the \agwindow{Set Elements} window, and the \agwindow{Token Usage}
+window are keyed to the marked token. Remember that there is no marked
+token on the last line of the \agwindow{Rule Derivation} window.
+\subsection{Resolving Ambiguities}
+\index{Ambiguities}
+The shift-reduce conflict discussed earlier stemmed from an oversight
+in writing a grammar: the grammar specified a list of items, in this
+case expressions, which could not be distinguished without a separator
+of some sort.  This conflict can be resolved simply by rewriting the
+grammar to require a comma separating the expressions.
+A similar problem comes from specifying a list of lists without any
+separator.  Sometimes such a specification is made inadvertently.  The
+erroneous Pascal grammar cited above had such a conflict.  Here are
+the productions that caused trouble:
+\begin{indentingcode}{0.4in}
+declaration part
+-> declaration section
+-> declaration part, declaration section
+declaration section
+-> proc and func declaration part
+proc and func declaration part
+-> proc or func declaration
+-> proc and func declaration part, proc or func declaration
+proc or func declaration
+-> proc declaration
+-> func declaration
+\end{indentingcode}
+Of course, there are other productions for \agcode{declaration
+section} so that this grammar looks perfectly reasonable on the
+surface.  A \agcode{declaration part} is a sequence of
+\agcode{declaration sections}.  One type of \agcode{declaration
+section}, the \agcode{proc and func declaration part}, is a sequence
+of procedure or function definitions.  Unfortunately, if you have two
+procedure definitions in a row, it could be interpreted as one
+\agcode{proc and func declaration part} consisting of two declarations
+or two \agcode{proc and func declaration parts} each consisting of one
+declaration.  The problem here is that there is a little too much
+recursion.  You can fix this problem by replacing
+\begin{indentingcode}{0.4in}
+declaration section
+-> proc and func declaration part
+proc and func declaration part
+-> proc or func declaration
+-> proc and func declaration part, proc or func declaration
+\end{indentingcode}
+with
+\begin{indentingcode}{0.4in}
+declaration section
+-> proc or func declaration
+\end{indentingcode}
+After a little reflection you will see that this simplification, which
+eliminates one source of recursion, entails no loss of generality.
+In the above examples, the ambiguities were the results of errors in
+the specification of the grammar, and could be easily fixed by
+correcting the grammar.  Sometimes, however, ambiguities arise which
+are not so easily corrected.  Consider, for instance, a simple token
+scanner:
+\begin{indentingcode}{0.4in}
+tokens
+-> [token, white space?...]...
+token
+-> name
+-> number
+-> string
+-> character constant
+-> punctuation character
+\end{indentingcode}
+where \agcode{name}, \agcode{number}, etc., are defined in the
+conventional manner.
+Such token scanners are normally written using conventional
+programming techniques, or using programs such as \agfile{lex} which
+are based on the analysis of regular expressions.  Such programs
+simply work on a character by character basis and do not see that
+``maryjane'' could be interpreted as one name or as two, or as even
+more, since the separating white space is declared to be optional.
+AnaGram, however, since it takes a broader view, sees that the
+specifications are basically ambiguous and complains.  Note, however,
+that the default resolution of conflicts, in this case choosing the
+shift action, will provide a parser that does precisely what you would
+normally intend.  Thus, the major concern is to make sure that the
+conflicts caused by this situation do not mask other more serious
+conditions.
+While it is possible to rewrite this grammar so that it is not
+ambiguous, such a grammar would lack clarity and conciseness.  AnaGram
+provides two alternatives for dealing with this problem: the
+\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar}
+declaration and the
+\index{Sticky declaration}\index{Declaration}\agparam{sticky}
+declaration, which accomplish the desired end using very different
+techniques.
+In the definition above, a grammar for token, taken all by itself,
+would not normally show any ambiguities.  We could write a
+\index{Lexical scanner}lexical scanner that would identify a token,
+pass it on to the parser, and then go on to the next.  We would never
+worry about any ambiguities.  In this situation, it is clear that what
+we mean by \agcode{token} is defined entirely by the grammar for
+\agcode{token}, not by its usage within the rest of the grammar.  When
+we say that we have a sequence of tokens, \agcode{token...}, we
+usually mean that, of course, two tokens, such as ``mary'' and
+``jane'' have to be separated by punctuation or space to be
+distinguished as separate tokens.  The
+\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar}
+declaration allows you to specify that a particular \index{Nonterminal
+token}\index{Token}nonterminal token in your grammar is to be treated
+as though it were defined by its own complete grammar, without regard
+to its usage in the grander scheme of things.  In the example above,
+add the following:
+\begin{indentingcode}{0.4in}
+[ subgrammar \bra token \ket ]
+\end{indentingcode}
+Now, when AnaGram conducts its search for reducing tokens, it will
+limit its search to those it would find if your grammar consisted only
+of the grammar rules necessary to define token.  When it does this, it
+never even finds the reducing tokens that appear to cause conflicts.
+When you fix a conflict with the \agparam{subgrammar} declaration, the
+conflict vanishes entirely.  It is not recorded in the
+\agwindow{Resolved Conflicts} window.  For this reason, the
+\agparam{subgrammar} declaration should be used only when you are
+absolutely sure of the nature of the problem you are trying to solve.
+A somewhat more precise mechanism for dealing with the conflicts
+described above is to declare name and number as \index{Sticky
+declaration}\index{Declaration}\agparam{sticky}:
+\begin{indentingcode}{0.4in}
+[ sticky \bra name, number \ket ]
+\end{indentingcode}
+This means that whenever the last token in your parser's input buffer
+is \agcode{name} or \agcode{number}, it will shift in any acceptable
+input character, and will disregard conflicting reduce options.
+AnaGram will continue to identify conflicts in your grammar due to
+this cause, but will record them in the \agwindow{Resolved Conflicts}
+window.  Any other conflicts you may have will still show up in the
+\agwindow{Conflicts} window.
+Sometimes it may be desirable to introduce ambiguities deliberately
+into a grammar.  For example, although it is easy enough to write
+normal grammar rules for the conventional expression logic used in
+programming languages, it is possible to produce a somewhat more
+compact, faster running parser by writing an ambiguous grammar and
+using AnaGram's precedence declarations to resolve the ambiguities.
+An example of this technique is given in Chapter 9.
+\subsection{Null Productions}
+\index{Null productions}\index{Production}Null productions can be a
+fertile source of conflicts.  Fortunately, many of them can be easily
+resolved by simply rewriting a few rules.  Consider, for example, the
+following:
+\begin{indentingcode}{0.4in}
+name
+-> capital letter, letter..., white space
+initial
+-> capital letter, period, white space
+full name
+-> name, initial?, name
+\end{indentingcode}
+In this example, \agcode{initial?} is a virtual production which
+AnaGram replaces with
+% not ``with''...
+\begin{indentingcode}{0.4in}
+initial?
+->
+-> initial
+\end{indentingcode}
+so \agcode{initial?} involves a null production.
+Recall that LALR-1 parsers cannot look ahead more than one character.
+The problem with this grammar, then, is that given the input ``John
+Q'', there is no way to tell whether the ``Q'' is an initial or the
+first letter of the last name.  Is this ``John Q. Public'' or ``John
+Quincy''? Adding another production for ``full name'' so that
+``initial'' need not be marked as optional allows the parser to defer
+its decision:
+\begin{indentingcode}{0.4in}
+full name
+-> name, name
+-> name, initial, name
+\end{indentingcode}
+Notice that the technique here is to \emph{avoid a reduction} until
+the parser has enough information to proceed.
+\subsection{Reduce-Reduce Conflicts}
+\index{Reduce-reduce conflicts}\index{Conflicts}Reduce-reduce
+conflicts are usually caused by the limited lookahead of LR-1 parsing.
+Very occasionally, you might find a grammar with a reduce-reduce
+conflict that is caused by the LALR-1 approximation to LR-1 parsing
+that AnaGram supports.  Such conflicts are quite easy to resolve,
+however, by almost trivial rewriting.
+Reduce-reduce conflicts commonly involve productions that are
+intrinsically indistinguishable, for instance:
+\begin{indentingcode}{0.4in}
+tweedledee
+-> tiddly, wink
+tweedledum
+-> tiddly, wink
+\end{indentingcode}
+However, it may be possible to distinguish \agcode{tweedledee} from
+\agcode{tweedledum} by context.  Suppose
+\begin{indentingcode}{0.4in}
+gibber
+-> tweedledee, apples
+-> tweedledum, oranges
+\end{indentingcode}
+Now if you can distinguish \agcode{apples} from \agcode{oranges} on
+the basis of the very first character, your parser will be able to
+tell \agcode{tweedledee} from \agcode{tweedledum}.  If, on the other
+hand, it takes more than one input character to tell whether you have
+\agcode{apples} or \agcode{oranges}, you have a reduce-reduce conflict.
+This example, of course, is rather far-fetched, since the potential
+confusion between \agcode{tweedledee} and \agcode{tweedledum} is
+obvious.  More commonly, the definitions and usage of
+\agcode{tweedledee} and \agcode{tweedledum} are physically far apart
+in the grammar.  At one place in a grammar:
+\begin{indentingcode}{0.4in}
+balderdash
+-> tweedledee, apples
+\end{indentingcode}
+Elsewhere, seemingly unrelated:
+\begin{indentingcode}{0.4in}
+buncombe
+-> tweedledum, oranges
+\end{indentingcode}
+In a dark corner:
+\begin{indentingcode}{0.4in}
+gossip
+-> buncombe, slander
+\end{indentingcode}
+And in yet another dark corner:
+\begin{indentingcode}{0.4in}
+small talk
+-> gossip
+-> rumors
+-> balderdash
+\end{indentingcode}
+Once again, if \agcode{apples} and \agcode{oranges} cannot be
+distinguished by the very first character, you have a reduce-reduce
+conflict that is somewhat harder to find than the one discussed above,
+simply because the trail is somewhat more complicated to follow.
+\subsection{Identifying Reduce-Reduce Conflicts}
+\index{Reduce-reduce conflicts}\index{Conflicts}
+Although the machinery described above for identifying shift-reduce
+conflicts can be used to investigate reduce-reduce conflicts, many
+yield to comparison of the \agwindow{Reduction States} windows for the
+two rules.  For the two rules to be distinguishable, no token which is
+acceptable input for any of the reduction states belonging to one rule
+can be acceptable input for any reduction state belonging to the
+other.
+Therefore, when you have a reduce-reduce conflict, there is an overlap
+among the tokens in the reduction states for one rule and the tokens
+in the reduction states for the other rule.  Often, however, there are
+many reduction states which are irrelevant as far as a particular
+conflict is concerned.  So AnaGram provides a special window, the
+\agwindow{Problem States} window, in the \agmenu{Auxiliary Windows}
+menu for the \agwindow{Conflicts} and \agwindow{Resolved Conflicts}
+windows.  The \agwindow{Problem States} window for a particular rule
+lists those, and only those, reduction states for the rule for which
+the conflict token is acceptable input.  This usually reduces the size
+of the table and often makes the source of the problem immediately
+obvious.
+\subsection{Resolving Reduce-Reduce Conflicts}
+If you build a parser for a grammar that includes a reduce-reduce
+conflict, AnaGram will arbitrarily reduce the first rule it
+encountered while reading your grammar, that is, the rule with the
+smaller rule number.  In some circumstances this default may be
+acceptable.  In other circumstances more may be required.
+Often, it is possible to get rid of reduce-reduce conflicts by
+rewriting your grammar so that tokens that were once reducing tokens
+for a rule become part of the rule itself.  Thus, in the previous
+example, you could write:
+\begin{indentingcode}{0.4in}
+tweedledee
+-> tiddly, wink, apples
+tweedledum
+-> tiddly, wink, oranges
+\end{indentingcode}
+In other words, if \agcode{apples} always follows \agcode{tweedledee},
+you can move \agcode{apples} ``down'' in the grammar by making it part
+of \agcode{tweedledee}.  Moving tokens ``down'' in this manner always
+reduces the likelihood of conflicts.  But what if \agcode{apples}
+doesn't always follow \agcode{tweedledee}? Then it is sometimes
+worthwhile to make two versions of \agcode{tweedledee}:
+\begin{indentingcode}{0.4in}
+simple tweedledee
+-> tiddly, wink
+complex tweedledee
+-> tiddly, wink, apples
+\end{indentingcode}
+and in each circumstance use whichever is appropriate.  As with the
+earlier shift-reduce example, the technique is to avoid any reduction
+until the tokens can be distinguished.
+Sometimes, however, reduce-reduce conflicts reflect a language design
+that really demands greater lookahead than can be accommodated with an
+LR-1 parser.  In these cases you have two options: you can consider a
+multi-stage parser or you can implement an ad hoc lookahead procedure.
+In the latter case, you can use semantically determined productions to
+control your parser depending on the results of your lookahead
+procedure.
+\subsection{Conflicts due to LALR-1 Simplifications}
+\index{LALR simplifications}\index{Conflicts}
+LALR-1 \index{Parser generator}parser generators occasionally report a
+reduce-reduce conflict where an LR-1 parser would successfully resolve
+the conflict.  How significant is this problem? An early prototype
+version of AnaGram actually had full LR-1 capability.  Unfortunately
+LR-1 parsers are generally much bigger than LALR-1 parsers.  Creating
+them takes longer and more resources so that for given computer
+resources an LALR-1 parser generator can handle larger, more complex
+grammars than an LR-1 parser generator.  To provide a compromise, the
+prototype was modified to do an LALR-1 analysis first.  Then, only if
+there were reduce-reduce conflicts did it do an LR-1 analysis.  Over a
+substantial period of testing, the only reduce-reduce conflicts that
+were resolved by the LR-1 analysis were those that were created
+deliberately as tests.  Eventually the LR-1 capabilities were removed
+from AnaGram since they added complexity without concomitant benefits.
+To understand why this is so, it is worthwhile to look at an example.
+Let us recall the definitions of \agcode{tweedledee} and
+\agcode{tweedledum} made above:
+\begin{indentingcode}{0.4in}
+tweedledee
+-> tiddly, wink
+tweedledum
+-> tiddly, wink
+\end{indentingcode}
+Remember that we defined gibber thus:
+\begin{indentingcode}{0.4in}
+gibber
+-> tweedledee, apples
+-> tweedledum, oranges
+\end{indentingcode}
+Now suppose that we also define jabber in a contrary way:
+\begin{indentingcode}{0.4in}
+jabber
+-> tweedledee, oranges
+-> tweedledum, apples
+\end{indentingcode}
+Now suppose we use \agcode{gibber} in one part of our grammar and
+\agcode{jabber} in another, completely unrelated, part.  An LR-1
+parser generator would succeed in keeping \agcode{tweedledee} and
+\agcode{tweedledum} straight in spite of our efforts to confuse it.
+An LALR-1 parser generator, however, succumbs to our efforts and
+reports a conflict.  A trivial rewrite of the grammar, however,
+removes the conflict:
+\begin{indentingcode}{0.4in}
+gibber tweedledee
+-> tiddly, wink
+gibber tweedledum
+-> tiddly, wink
+jabber tweedledee
+-> tiddly, wink
+jabber tweedledum
+-> tiddly, wink
+gibber
+-> gibber tweedledee, apples
+-> gibber tweedledum, oranges
+jabber
+-> jabber tweedledee, oranges
+-> jabber tweedledum, apples
+\end{indentingcode}
+The effect of this rewrite is to accomplish the same separation of
+states that an LR-1 parser generator would make.
+\section{Keyword Anomalies}
+\index{Keyword Anomalies}\index{Anomaly}
+In syntax directed parsing, it is assumed that input tokens can be
+uniquely identified.  In the case of keywords, however, there is the
+possibility that the individual characters making up the keyword, as
+well as the keyword taken as a whole, could constitute legitimate
+input under some circumstances.  Thus keywords, though a powerful and
+useful tool, are not completely consistent with the assumptions that
+underlie syntax directed parsing.  This can occasionally give rise to a
+type of conflict, diagnosed by AnaGram, called a \agterm{keyword
+anomaly}.  AnaGram is quite conservative in its diagnoses, so that many
+keyword anomalies it reports are actually innocuous and can be safely
+ignored.  Nevertheless, anomalies should be investigated carefully.
+AnaGram only recognizes a keyword in those states where the keyword is
+allowable input.  If AnaGram recognizes a keyword, the keyword takes
+precedence over the sequence of characters that constitute the
+keyword.  If there are several allowable keywords, the longest takes
+precedence.
+Under certain circumstances it is possible for a grammar to have a
+state where a particular keyword may or may not be allowable input
+depending on how the parser gets to that state.  That is, for some
+sequences of tokens which lead to the state, the keyword is allowable,
+in others it isn't.  In such a state, the keyword must always cause a
+reduction, since if the keyword were to shift, it would always be
+allowable input.  When AnaGram finds such a state in your parser, it
+checks to see what the parser action would be if the parser ignored
+the keyword and interpreted it in terms of its constituent
+characters.  If the parser action would be anything other than reducing
+the same rule as the keyword or a syntax error, AnaGram will diagnose
+a keyword anomaly in this state.
+To summarize, the nature of a keyword anomaly is this: It is possible
+for your parser to get to the anomalous state by a route for which the
+keyword is not allowable.  In the anomalous state, at least the first
+constituent character of the keyword is allowable in its own right and
+causes a parser action different from that caused by the
+keyword.  Nonetheless, if the keyword should appear in the input, your
+parser will recognize it, perform at least one reduction action, get
+to a state where the keyword is not recognized, and finally, in the
+wrong state, will try to interpret the keyword in terms of its
+constituent characters.
+Now, what do you do about a keyword anomaly? If in the anomalous state
+the characters which constitute the keyword would never be
+syntactically admissible as individual characters, the keyword anomaly
+is innocuous in that the only problem is that the recognition of the
+syntax error is somewhat delayed.  If however, the characters which
+constitute the keyword are syntactically admissible, you have a
+problem that needs to be fixed.
+In many programming languages, many keywords are treated as
+\agterm{reserved words}, that is, as words that may not be otherwise
+used in a conforming program.  Version 1.5 of AnaGram has added a
+\index{Reserve keywords}\agparam{reserve keywords} statement that you
+can use to specify that the text of a keyword is \emph{never}
+admissible as individual characters.  Thus you will never get a
+keyword anomaly diagnostic for a keyword that has been listed in a
+\index{Reserve keywords}\agparam{reserve keywords} statement.
+The \agparam{reserve keywords} statement is described in greater
+detail in \S 8.2.28.
+% XXX this is the only place in the manual (so far?) that uses the
+% \S section-symbol. (Also, this should be a tex crossreference.)
+\subsection{The Keyword Anomalies Window}
+\index{Keyword Anomalies}\index{Window}
+If your grammar has a keyword anomaly, AnaGram will provide a warning
+message and will create a \agwindow{Keyword Anomalies} window to
+provide details about the anomalies in your grammar.
+Each entry in the \agwindow{Keyword Anomalies} window consists of two
+lines.  The first line identifies the state at which the anomaly
+occurs and the offending keyword.  The second line identifies the rule
+which the keyword may erroneously reduce.  The \agmenu{Auxiliary
+Windows} menu provides three auxiliary windows keyed directly to the
+anomaly to help you determine the nature of the problem: the
+\agwindow{Keyword Anomaly Trace} window, the \agwindow{Reduction
+Trace} window, and the \agwindow{Rule Derivation} window.  In
+addition, three other windows provide supporting information: the
+\agwindow{Reduction States} window, the \agwindow{Rule Context} window
+and the \agwindow{State Definition} window.  These windows are
+discussed in Chapter 6.
+The \agwindow{Keyword Anomaly Trace} window, a pre-built
+\agwindow{Grammar Trace}, shows a sequence of tokens that illustrates
+one way to encounter the anomaly.  The \agwindow{Reduction Trace}
+window shows the result of performing the reduction action.  Just as
+for conflicts, the true source of the problem is often at some remove
+from the actual state in which the problem manifests itself.  One of
+the characteristic rules for the last state in the \agwindow{Reduction
+Trace} window is, in fact, the true source of the problem.  The
+\agwindow{Rule Derivation} window can show you which of the
+characteristic rules is the offending party and how the rule in the
+anomalous state derives from it.
+\subsection{Dealing with Keyword Anomalies}
+If your grammar has a keyword anomaly, you should study it carefully.
+It may be that the anomaly is completely innocuous, that is, the
+characters which constitute the keyword are not otherwise legitimate
+input.  In this case you can simply ignore the message.
+The basic approach to dealing with keyword anomalies is to revise your
+grammar to make a better distinction between the contexts in which the
+keyword is acceptable and those in which it may be confused with
+otherwise valid input.  Several other techniques may be useful:
+For an anomaly to occur, there must be, \emph{in the anomaly state},
+an alternative interpretation of some of the initial characters of the
+keyword.  If the anomaly occurs because you have several keywords that
+begin in a similar manner, you may use the
+\index{Distinguish keywords}\index{Keywords}\agparam{distinguish keywords}
+statement (see Chapter 8) to keep them properly separated.  Note that
+having several keywords that begin with the same characters does
+\emph{not} necessarily cause an anomaly.
+If the keyword is an alphabetic keyword that can follow conventional
+variable name tokens, you may wish to use a \agparam{sticky}
+declaration or the
+\index{Distinguish lexemes}\index{Configuration switch}
+\agparam{distinguish lexemes} switch
+to inhibit recognition of the keyword when it is preceded immediately
+by a name.
+If the keyword can be treated as a reserved word, the \index{Reserve
+keywords}\agparam{reserve keywords} statement will eliminate the
+anomaly.

Mercurial > ~dholland > hg > ag > index.cgi

comparison doc/manual/xg-iii.tex @ 0:13d2b8934445