Mercurial > ~dholland > hg > ag > index.cgi
diff doc/manual/xg-iii.tex @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
parents | |
children |
line wrap: on
line diff
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/doc/manual/xg-iii.tex Sat Dec 22 17:52:45 2007 -0500 @@ -0,0 +1,884 @@ +\chapter{Exploring Your Grammar III: Diagnostic Windows} + +If AnaGram finds problems with your grammar, it creates a number of +windows to help you identify and correct the problems. These windows +are the \agwindow{Warnings} window, the \agwindow{Conflicts} window, +the \agwindow{Resolved Conflicts} window, and the \agwindow{Keyword +Anomalies} window. To expand upon these windows, AnaGram also +provides, using the \agmenu{Auxiliary Windows} menu, the following +supporting windows: \agwindow{Conflict Trace}, \agwindow{Keyword +Anomaly Trace}, \agwindow{Reduction Trace}, \agwindow{Rule +Derivation}, \agwindow{Token Derivation} and \agwindow{Problem +States}. This chapter describes these windows and their use. + + +\section{The Warnings Window} +\index{Warnings}\index{Window} + +If while analyzing your syntax file, AnaGram finds something +suspicious, it issues a warning, regardless of the severity of the +problem. The \agwindow{Warnings} window will pop up automatically +when the analysis has been completed. If the warning is for a syntax +error in your input file, you will have to fix it, because AnaGram +cannot successfully interpret your syntax file. Otherwise, no matter +how serious the warnings may be, AnaGram will create a parser if it is +instructed to do so. Thus, if you decide the warnings are of no +consequence, you may choose to ignore them. Note that AnaGram will +resolve shift-reduce conflicts by choosing the shift action. It will +resolve reduce-reduce conflicts by choosing the rule that occurred +first in the grammar. + +If you have \index{Syntax error}\index{Errors}syntax errors, AnaGram +will synchronize the cursor in the syntax file window with the cursor +in the \agwindow{Warnings} window so that as you move through the +warning messages, the cursor in the syntax file window will move to +the point of the error. + +AnaGram's warning messages are listed, in alphabetical order, in +Appendix B, together with explanations of the messages and suggestions +as to what to do about them. + + +\section{The Conflicts Window} +\index{Conflicts}\index{Window} + +The \agwindow{Conflicts} window lists the conflicts which AnaGram +found in your grammar. The table identifies the parser states in +which it found conflicts. For each conflict state it identifies the +reducing token for which it had more than one option, and the marked +rules which characterize each such option. If AnaGram was able to +resolve a conflict by using \agparam{precedence} or \agparam{sticky} +declarations, the conflict will be listed in the \index{Resolved +Conflicts}\index{Window}\agwindow{Resolved Conflicts} window instead. + +The \agwindow{Conflicts} window and the \agwindow{Resolved Conflicts} +window each require several lines to display a conflict. For each +conflict, the first, or header, line identifies the conflict state and +the conflict token. The conflict token is a terminal token for which +there is more than one parser action consistent with the rules of the +grammar. If both shift and reduce actions are consistent with the +grammar, the conflict is a \agterm{shift-reduce} conflict. If more +than one reduce action is consistent with the grammar, the conflict is +said to be a \agterm{reduce-reduce} conflict. + +Following the header line, AnaGram lists the grammar rules which the +conflict token could satisfy. The rules which correspond to shift +actions are listed first, followed by those which correspond to reduce +actions. Rules which correspond to shift actions have not yet been +completed and have a marked token, shown in a distinctive type face, +which represents the next token expected. For such rules, the +conflict token is an instance of the marked token and can be shifted +into the input buffer. If the rule has been completely matched there +is no marked token and the conflict token can reduce the rule. In the +following discussion, such a rule is referred to as a \agterm{conflict +rule}. If there is a more than one conflict rule, the conflict is +called a \agterm{reduce-reduce} conflict; otherwise it is a +\agterm{shift-reduce} conflict. Although it is theoretically possible +for a conflict to be both a shift-reduce conflict and a reduce-reduce +conflict, it is not a common event. + +The \agmenu{Auxiliary Windows} menu for the \agwindow{Conflicts} window +provides eleven auxiliary windows to help you understand the +circumstances of a conflict. Five of these windows address the +conflict directly. These windows are the \agwindow{Conflict Trace}, the +\agwindow{Reduction Trace}, the \agwindow{Rule Derivation}, the +\agwindow{Token Derivation}, and the \agwindow{Problem States} +windows. These windows are keyed to the conflict state, the conflict +token, and the conflict rule. If the cursor bar is not highlighting a +conflict rule, AnaGram scans down the \agwindow{Conflicts} window for +the first conflict rule for the selected state and token. The use of +these windows in analyzing conflicts is described below. + +The other windows in the \agmenu{Auxiliary Windows} menu have been +described in Chapter 6. These are the \agwindow{Expansion Chain} +window, the \agwindow{Rule Context} window, the \agwindow{Reduction +States} window, the \agwindow{State Definition} window, the +\agwindow{State Expansion} window and the \agwindow{Token Usage} +window. The \agwindow{Rule Context} window is keyed to the particular +rule highlighted by the cursor. The +\index{Expansion Chain}\index{Window}\agwindow{Expansion Chain} window +is keyed to the highlighted rule. If the highlighted rule is a +characteristic rule no expansion chain is possible, so the option will +be greyed out. The \agwindow{Reduction States} window is keyed to the +highlighted rule if it is a conflict rule, otherwise to the first +conflict rule following the highlighted line. The \agwindow{State +Definition} and \agwindow{State Expansion} windows are keyed to the +conflict state. The \agwindow{Token Usage} window is keyed to the +conflict token. + + +\section{The Resolved Conflicts Window} +\index{Resolved Conflicts}\index{Window} + +AnaGram creates the \agwindow{Resolved Conflicts} window only when the +grammar it is analyzing has conflicts and when some of those conflicts +have been resolved by precedence rules (see Chapter 9) or by use of +the \agparam{sticky} directive. In this case, it will list, in the +\agwindow{Resolved Conflicts} window, the conflicts that were resolved +and the way in which they were resolved. The layout of the +\agwindow{Resolved Conflicts} window is identical to that of the +\agwindow{Conflicts} window with one exception. The rules which your +parser will observe, selected in accordance with your \agparam{sticky} +declarations or precedence rules, are marked with asterisks. Those +which your parser will ignore will not be so marked. The +\agwindow{Auxiliary Windows} available from the \agwindow{Resolved +Conflicts} window are the same as those available from the +\agwindow{Conflicts} window. + + +\section{Conflicts} +\index{Conflicts} + +Chapter 4 presented a brief introduction to the problem of conflicts +in a grammar. There, conflicts were simply classified as shift-reduce +conflicts or as reduce-reduce conflicts. This section examines +conflicts in somewhat greater detail. It looks at the causes of +conflicts, how to identify the cause of a conflict and how to deal +with it. + +Conflicts occur when, in a particular state, a given token, the +\index{Conflict token}\index{Token}\agterm{conflict token}, reduces +more than one rule, or reduces a rule when the token could also be +shifted into the input buffer in accordance with another rule. + +Note that there are no shift-shift conflicts, that is, a particular +state could have several +\index{Characteristic rules}\index{Rules}characteristic rules +accepting the same token as a shift token. A conflict is always +associated with reduction. This is because LALR parsers keep track of +multiple possibilities in the input stream as long as the input tokens +only cause shifts. At the point where a reduction occurs the parser +must be able to identify exactly one of the possibilities as +consistent with the input. + +\subsection{Shift-Reduce Conflicts} +\index{Shift-reduce conflict}\index{Conflicts} + +Unless a conflict is deliberate, the fact that the conflict token +reduces a particular rule is usually a surprise. Although some +conflicts arise because an LALR-1 parser generator cannot look far +enough ahead, most shift-reduce conflicts result from true +\index{Ambiguities}ambiguities in your grammar. For example, an +erroneous Pascal grammar contained the following productions, which +illustrate one of the more common classes of shift-reduce conflicts: + +\begin{indentingcode}{0.4in} +parameter list + -> expression + -> parameter list, expression +\end{indentingcode} + +Pascal actually requires that the expressions in a parameter list be +separated by commas. In other words, the second rule should read: + +\begin{indentingcode}{0.4in} + -> parameter list, comma, expression +\end{indentingcode} + +Lacking a requirement for a comma, the erroneous rules specify that a +parameter list is simply a sequence of expressions with nothing +separating them. For example, using this definition, + +\begin{indentingcode}{0.4in} +(a + b) - (c * d) +\end{indentingcode} + +can be interpreted either as one expression, if the minus sign is +interpreted as a subtraction operator, or as a sequence of two +expressions if it is interpreted as a unary minus sign. + +When AnaGram analyzed the grammar with the erroneous rule it found a +conflict in the state where the minus sign was expected. If the minus +sign is the subtraction operator, it should be shifted onto the input +stack. On the other hand, if the minus sign is a unary minus sign, +the sum \agcode{(a+b)} should be reduced to \agcode{expression}, and +then to \agcode{parameter list}. Thus, this ambiguity gives rise to a +shift-reduce conflict, where it is impossible to determine a unique +parser action. + +\subsection{Identifying Ambiguities} +\index{Ambiguities} + +Unfortunately, shift-reduce conflicts usually show up in parser states +which are far removed from the true source of the problem. When your +grammar has a shift-reduce conflict of the sort illustrated above, +there may be no readily apparent link to the erroneous +production. AnaGram provides several options to help you establish the +link, all available using the \agmenu{Auxiliary Windows} menu from the +\agwindow{Conflicts} window. These options comprise two pre-built +grammar traces, the \agwindow{Conflict Trace} and the +\agwindow{Reduction Trace}; two special windows called \agwindow{Rule +Derivation} and \agwindow{Token Derivation}; the \agwindow{Expansion +Chain} window and a special version of the \agwindow{Reduction States} +window called the \agwindow{Problem States} window. The four trace +and derivation windows are coordinated, so that AnaGram actually +computes a mutually consistent set of four whenever you ask for one of +them. + +The \agwindow{Conflict} and \agwindow{Reduction Traces} are simply +pre-initialized \agwindow{Grammar Traces} which you may use to become +more familiar with the circumstances of the conflict. Bear in mind, +however, that unless you have set the +\index{Traditional engine}\index{Configuration switches} +\agparam{traditional engine} configuration switch, +the traces will use compound actions, which in some cases will use the +default resolution of conflicts. Therefore, unless you set +\agparam{traditional engine} it is not always possible to work through +an alternate resolution of a conflict. + +The +\index{Conflict Trace}\index{Trace}\index{Window}\agwindow{Conflict Trace} +shows you one possible sequence of input which will lead to the +conflict state. Unless a conflict results from a typo or other +clerical error, it is usually a surprise and not something you +anticipated. The \agwindow{Conflict Trace} helps you to understand +the context of the problem. It is often helpful to look at the +\agwindow{Rule Stack} for the \agwindow{Conflict Trace} to see what +productions are involved in the conflict. You can inspect the +\agwindow{State Definition} window for the conflict state to see why +the conflict token can be shifted. + +\index{Reduction Trace}\index{Trace}\index{Window} +The \agwindow{Reduction Trace}, for a particular conflict rule, +carries forward a \agwindow{Conflict Trace} by selecting the reduce +action. In other words, for a parser in the conflict state, the +\agwindow{Reduction Trace} shows the result of using the conflict +token as a reducing token. The trace shows the result of taking all +possible reductions until a state is reached where the lookahead token +will shift into the input buffer. If you compare the +\agwindow{Reduction Trace} to the \agwindow{Conflict Trace} you will +notice that they start off with identical tokens. At some point they +diverge. All of the tokens in the \agwindow{Conflict Trace} following +this point have reduced to the token in the \agwindow{Reduction +Trace}. Usually there are no further tokens in the +\agwindow{Reduction Trace}, but if there are, you will notice that +they are all zero-length tokens. That is, they all have a null +production, so that they come for free, as it were. If you press +Enter, to display the \agwindow{Select Input} window for the +\agwindow{Reduction Trace}, you will see that the conflict token is +acceptable input. + +The \agwindow{Reduction Trace} illustrates the ambiguity in your +grammar that caused the conflict: One of the acceptable input tokens +in the final state of this trace cannot be distinguished from its +predecessor. + +The \index{Expansion Chain}\index{Window}\agwindow{Expansion Chain} +window is extremely useful for indicating why a particular grammar +rule is an expansion rule in a particular state. Sometimes the chain +of substitutions to produce a given rule is quite long, and changes to +a grammar can have unexpected effects. The \agwindow{Expansion Chain} +window enables you to see how a rule in a conflict state got to be +there. + +\index{Problem States}\index{Window}The \agwindow{Problem States} +window for a conflict rule is essentially a trimmed +\agwindow{Reduction States} window which shows only those reduction +states in which the conflict token is acceptable input. Often it is +sufficient to look at the \agwindow{Problem States} window to see the +source of the problem. The last state in the \agwindow{Reduction +Trace} window, will be, of course, one of the states in the +\agwindow{Problem States} window. + +\subsection{Rule and Token Derivation Windows} + +If after inspecting the \agwindow{Problem States}, \agwindow{Conflict +Trace} and the \agwindow{Reduction Trace}, the conflict is still a +mystery, you may pinpoint the problem by looking at the +\agwindow{Rule} and \agwindow{Token Derivation} windows for the +conflict. Both windows contain a sequence of rules, and both begin +with the same rule, the rule which is the root cause of the conflict. +This rule is one of the characteristic rules of the last state in the +\agwindow{Reduction Trace} and is the rule which produces the ambiguity. + +\index{Rule Derivation}\index{Window}In both windows, each rule, +except the last line of the \agwindow{Rule Derivation}, contains a +marked token in a distinctive typeface. The essence of the ambiguity +is that it is impossible to tell where in the input stream the marked +token in the top line of the \agwindow{Rule Derivation} ends and the +marked token in the top line of the \agwindow{Token Derivation} +begins. + +In both derivation windows, each line after the first consists of a +rule produced by the marked token on the previous line. In the +\agwindow{Rule Derivation}, except on the first line, the marked token +is either the last token in the rule, or all subsequent tokens have +null productions. The last rule in the derivation window is the +conflict rule you selected in the \agwindow{Conflicts} window. Thus +the \agwindow{Rule Derivation} window shows you how the rule involved +in the conflict derives from the root. + +\index{Window}\index{Token Derivation}In the \agwindow{Token +Derivation} window, except on the first line, the marked token is +either the first token in the rule, or all preceding tokens have null +productions. The marked token in the last line of the list is the +conflict token. + +The \agwindow{Rule Derivation} and \agwindow{Token Derivation} windows +each have five auxiliary windows. The \agwindow{Rule Context} window +is keyed simply to the highlighted rule. The other four windows, the +\agwindow{Expansion Rules} window, the \agwindow{Productions} window, +the \agwindow{Set Elements} window, and the \agwindow{Token Usage} +window are keyed to the marked token. Remember that there is no marked +token on the last line of the \agwindow{Rule Derivation} window. + +\subsection{Resolving Ambiguities} +\index{Ambiguities} + +The shift-reduce conflict discussed earlier stemmed from an oversight +in writing a grammar: the grammar specified a list of items, in this +case expressions, which could not be distinguished without a separator +of some sort. This conflict can be resolved simply by rewriting the +grammar to require a comma separating the expressions. + +A similar problem comes from specifying a list of lists without any +separator. Sometimes such a specification is made inadvertently. The +erroneous Pascal grammar cited above had such a conflict. Here are +the productions that caused trouble: + +\begin{indentingcode}{0.4in} +declaration part + -> declaration section + -> declaration part, declaration section + +declaration section + -> proc and func declaration part + +proc and func declaration part + -> proc or func declaration + -> proc and func declaration part, proc or func declaration + +proc or func declaration + -> proc declaration + -> func declaration +\end{indentingcode} + +Of course, there are other productions for \agcode{declaration +section} so that this grammar looks perfectly reasonable on the +surface. A \agcode{declaration part} is a sequence of +\agcode{declaration sections}. One type of \agcode{declaration +section}, the \agcode{proc and func declaration part}, is a sequence +of procedure or function definitions. Unfortunately, if you have two +procedure definitions in a row, it could be interpreted as one +\agcode{proc and func declaration part} consisting of two declarations +or two \agcode{proc and func declaration parts} each consisting of one +declaration. The problem here is that there is a little too much +recursion. You can fix this problem by replacing + +\begin{indentingcode}{0.4in} +declaration section + -> proc and func declaration part + +proc and func declaration part + -> proc or func declaration + -> proc and func declaration part, proc or func declaration +\end{indentingcode} + +with + +\begin{indentingcode}{0.4in} +declaration section + -> proc or func declaration +\end{indentingcode} + +After a little reflection you will see that this simplification, which +eliminates one source of recursion, entails no loss of generality. + +In the above examples, the ambiguities were the results of errors in +the specification of the grammar, and could be easily fixed by +correcting the grammar. Sometimes, however, ambiguities arise which +are not so easily corrected. Consider, for instance, a simple token +scanner: + +\begin{indentingcode}{0.4in} +tokens + -> [token, white space?...]... + +token + -> name + -> number + -> string + -> character constant + -> punctuation character +\end{indentingcode} + +where \agcode{name}, \agcode{number}, etc., are defined in the +conventional manner. + +Such token scanners are normally written using conventional +programming techniques, or using programs such as \agfile{lex} which +are based on the analysis of regular expressions. Such programs +simply work on a character by character basis and do not see that +``maryjane'' could be interpreted as one name or as two, or as even +more, since the separating white space is declared to be optional. +AnaGram, however, since it takes a broader view, sees that the +specifications are basically ambiguous and complains. Note, however, +that the default resolution of conflicts, in this case choosing the +shift action, will provide a parser that does precisely what you would +normally intend. Thus, the major concern is to make sure that the +conflicts caused by this situation do not mask other more serious +conditions. + +While it is possible to rewrite this grammar so that it is not +ambiguous, such a grammar would lack clarity and conciseness. AnaGram +provides two alternatives for dealing with this problem: the +\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar} +declaration and the +\index{Sticky declaration}\index{Declaration}\agparam{sticky} +declaration, which accomplish the desired end using very different +techniques. + +In the definition above, a grammar for token, taken all by itself, +would not normally show any ambiguities. We could write a +\index{Lexical scanner}lexical scanner that would identify a token, +pass it on to the parser, and then go on to the next. We would never +worry about any ambiguities. In this situation, it is clear that what +we mean by \agcode{token} is defined entirely by the grammar for +\agcode{token}, not by its usage within the rest of the grammar. When +we say that we have a sequence of tokens, \agcode{token...}, we +usually mean that, of course, two tokens, such as ``mary'' and +``jane'' have to be separated by punctuation or space to be +distinguished as separate tokens. The +\index{Declaration}\index{Subgrammar declaration}\agparam{subgrammar} +declaration allows you to specify that a particular \index{Nonterminal +token}\index{Token}nonterminal token in your grammar is to be treated +as though it were defined by its own complete grammar, without regard +to its usage in the grander scheme of things. In the example above, +add the following: + +\begin{indentingcode}{0.4in} +[ subgrammar \bra token \ket ] +\end{indentingcode} + +Now, when AnaGram conducts its search for reducing tokens, it will +limit its search to those it would find if your grammar consisted only +of the grammar rules necessary to define token. When it does this, it +never even finds the reducing tokens that appear to cause conflicts. +When you fix a conflict with the \agparam{subgrammar} declaration, the +conflict vanishes entirely. It is not recorded in the +\agwindow{Resolved Conflicts} window. For this reason, the +\agparam{subgrammar} declaration should be used only when you are +absolutely sure of the nature of the problem you are trying to solve. + +A somewhat more precise mechanism for dealing with the conflicts +described above is to declare name and number as \index{Sticky +declaration}\index{Declaration}\agparam{sticky}: + +\begin{indentingcode}{0.4in} +[ sticky \bra name, number \ket ] +\end{indentingcode} + +This means that whenever the last token in your parser's input buffer +is \agcode{name} or \agcode{number}, it will shift in any acceptable +input character, and will disregard conflicting reduce options. +AnaGram will continue to identify conflicts in your grammar due to +this cause, but will record them in the \agwindow{Resolved Conflicts} +window. Any other conflicts you may have will still show up in the +\agwindow{Conflicts} window. + +Sometimes it may be desirable to introduce ambiguities deliberately +into a grammar. For example, although it is easy enough to write +normal grammar rules for the conventional expression logic used in +programming languages, it is possible to produce a somewhat more +compact, faster running parser by writing an ambiguous grammar and +using AnaGram's precedence declarations to resolve the ambiguities. +An example of this technique is given in Chapter 9. + +\subsection{Null Productions} + +\index{Null productions}\index{Production}Null productions can be a +fertile source of conflicts. Fortunately, many of them can be easily +resolved by simply rewriting a few rules. Consider, for example, the +following: + +\begin{indentingcode}{0.4in} +name + -> capital letter, letter..., white space + +initial + -> capital letter, period, white space + +full name + -> name, initial?, name +\end{indentingcode} + +In this example, \agcode{initial?} is a virtual production which +AnaGram replaces with +% not ``with''... + +\begin{indentingcode}{0.4in} +initial? + -> + -> initial +\end{indentingcode} + +so \agcode{initial?} involves a null production. + +Recall that LALR-1 parsers cannot look ahead more than one character. +The problem with this grammar, then, is that given the input ``John +Q'', there is no way to tell whether the ``Q'' is an initial or the +first letter of the last name. Is this ``John Q. Public'' or ``John +Quincy''? Adding another production for ``full name'' so that +``initial'' need not be marked as optional allows the parser to defer +its decision: + +\begin{indentingcode}{0.4in} +full name + -> name, name + -> name, initial, name +\end{indentingcode} + +Notice that the technique here is to \emph{avoid a reduction} until +the parser has enough information to proceed. + +\subsection{Reduce-Reduce Conflicts} + +\index{Reduce-reduce conflicts}\index{Conflicts}Reduce-reduce +conflicts are usually caused by the limited lookahead of LR-1 parsing. +Very occasionally, you might find a grammar with a reduce-reduce +conflict that is caused by the LALR-1 approximation to LR-1 parsing +that AnaGram supports. Such conflicts are quite easy to resolve, +however, by almost trivial rewriting. + +Reduce-reduce conflicts commonly involve productions that are +intrinsically indistinguishable, for instance: + +\begin{indentingcode}{0.4in} +tweedledee + -> tiddly, wink + +tweedledum + -> tiddly, wink +\end{indentingcode} + +However, it may be possible to distinguish \agcode{tweedledee} from +\agcode{tweedledum} by context. Suppose + +\begin{indentingcode}{0.4in} +gibber + -> tweedledee, apples + -> tweedledum, oranges +\end{indentingcode} + +Now if you can distinguish \agcode{apples} from \agcode{oranges} on +the basis of the very first character, your parser will be able to +tell \agcode{tweedledee} from \agcode{tweedledum}. If, on the other +hand, it takes more than one input character to tell whether you have +\agcode{apples} or \agcode{oranges}, you have a reduce-reduce conflict. + +This example, of course, is rather far-fetched, since the potential +confusion between \agcode{tweedledee} and \agcode{tweedledum} is +obvious. More commonly, the definitions and usage of +\agcode{tweedledee} and \agcode{tweedledum} are physically far apart +in the grammar. At one place in a grammar: + +\begin{indentingcode}{0.4in} +balderdash + -> tweedledee, apples +\end{indentingcode} + +Elsewhere, seemingly unrelated: + +\begin{indentingcode}{0.4in} +buncombe + -> tweedledum, oranges +\end{indentingcode} + +In a dark corner: + +\begin{indentingcode}{0.4in} +gossip + -> buncombe, slander +\end{indentingcode} + +And in yet another dark corner: + +\begin{indentingcode}{0.4in} +small talk + -> gossip + -> rumors + -> balderdash +\end{indentingcode} + +Once again, if \agcode{apples} and \agcode{oranges} cannot be +distinguished by the very first character, you have a reduce-reduce +conflict that is somewhat harder to find than the one discussed above, +simply because the trail is somewhat more complicated to follow. + +\subsection{Identifying Reduce-Reduce Conflicts} +\index{Reduce-reduce conflicts}\index{Conflicts} + +Although the machinery described above for identifying shift-reduce +conflicts can be used to investigate reduce-reduce conflicts, many +yield to comparison of the \agwindow{Reduction States} windows for the +two rules. For the two rules to be distinguishable, no token which is +acceptable input for any of the reduction states belonging to one rule +can be acceptable input for any reduction state belonging to the +other. + +Therefore, when you have a reduce-reduce conflict, there is an overlap +among the tokens in the reduction states for one rule and the tokens +in the reduction states for the other rule. Often, however, there are +many reduction states which are irrelevant as far as a particular +conflict is concerned. So AnaGram provides a special window, the +\agwindow{Problem States} window, in the \agmenu{Auxiliary Windows} +menu for the \agwindow{Conflicts} and \agwindow{Resolved Conflicts} +windows. The \agwindow{Problem States} window for a particular rule +lists those, and only those, reduction states for the rule for which +the conflict token is acceptable input. This usually reduces the size +of the table and often makes the source of the problem immediately +obvious. + +\subsection{Resolving Reduce-Reduce Conflicts} + +If you build a parser for a grammar that includes a reduce-reduce +conflict, AnaGram will arbitrarily reduce the first rule it +encountered while reading your grammar, that is, the rule with the +smaller rule number. In some circumstances this default may be +acceptable. In other circumstances more may be required. + +Often, it is possible to get rid of reduce-reduce conflicts by +rewriting your grammar so that tokens that were once reducing tokens +for a rule become part of the rule itself. Thus, in the previous +example, you could write: + +\begin{indentingcode}{0.4in} +tweedledee + -> tiddly, wink, apples + +tweedledum + -> tiddly, wink, oranges +\end{indentingcode} + +In other words, if \agcode{apples} always follows \agcode{tweedledee}, +you can move \agcode{apples} ``down'' in the grammar by making it part +of \agcode{tweedledee}. Moving tokens ``down'' in this manner always +reduces the likelihood of conflicts. But what if \agcode{apples} +doesn't always follow \agcode{tweedledee}? Then it is sometimes +worthwhile to make two versions of \agcode{tweedledee}: + +\begin{indentingcode}{0.4in} +simple tweedledee + -> tiddly, wink + +complex tweedledee + -> tiddly, wink, apples +\end{indentingcode} + +and in each circumstance use whichever is appropriate. As with the +earlier shift-reduce example, the technique is to avoid any reduction +until the tokens can be distinguished. + +Sometimes, however, reduce-reduce conflicts reflect a language design +that really demands greater lookahead than can be accommodated with an +LR-1 parser. In these cases you have two options: you can consider a +multi-stage parser or you can implement an ad hoc lookahead procedure. +In the latter case, you can use semantically determined productions to +control your parser depending on the results of your lookahead +procedure. + +\subsection{Conflicts due to LALR-1 Simplifications} +\index{LALR simplifications}\index{Conflicts} + +LALR-1 \index{Parser generator}parser generators occasionally report a +reduce-reduce conflict where an LR-1 parser would successfully resolve +the conflict. How significant is this problem? An early prototype +version of AnaGram actually had full LR-1 capability. Unfortunately +LR-1 parsers are generally much bigger than LALR-1 parsers. Creating +them takes longer and more resources so that for given computer +resources an LALR-1 parser generator can handle larger, more complex +grammars than an LR-1 parser generator. To provide a compromise, the +prototype was modified to do an LALR-1 analysis first. Then, only if +there were reduce-reduce conflicts did it do an LR-1 analysis. Over a +substantial period of testing, the only reduce-reduce conflicts that +were resolved by the LR-1 analysis were those that were created +deliberately as tests. Eventually the LR-1 capabilities were removed +from AnaGram since they added complexity without concomitant benefits. +To understand why this is so, it is worthwhile to look at an example. +Let us recall the definitions of \agcode{tweedledee} and +\agcode{tweedledum} made above: + +\begin{indentingcode}{0.4in} +tweedledee + -> tiddly, wink +tweedledum + -> tiddly, wink +\end{indentingcode} + +Remember that we defined gibber thus: + +\begin{indentingcode}{0.4in} +gibber + -> tweedledee, apples + -> tweedledum, oranges +\end{indentingcode} + +Now suppose that we also define jabber in a contrary way: + +\begin{indentingcode}{0.4in} +jabber + -> tweedledee, oranges + -> tweedledum, apples +\end{indentingcode} + +Now suppose we use \agcode{gibber} in one part of our grammar and +\agcode{jabber} in another, completely unrelated, part. An LR-1 +parser generator would succeed in keeping \agcode{tweedledee} and +\agcode{tweedledum} straight in spite of our efforts to confuse it. +An LALR-1 parser generator, however, succumbs to our efforts and +reports a conflict. A trivial rewrite of the grammar, however, +removes the conflict: + +\begin{indentingcode}{0.4in} +gibber tweedledee + -> tiddly, wink +gibber tweedledum + -> tiddly, wink + +jabber tweedledee + -> tiddly, wink +jabber tweedledum + -> tiddly, wink + +gibber + -> gibber tweedledee, apples + -> gibber tweedledum, oranges + +jabber + -> jabber tweedledee, oranges + -> jabber tweedledum, apples +\end{indentingcode} + +The effect of this rewrite is to accomplish the same separation of +states that an LR-1 parser generator would make. + +\section{Keyword Anomalies} +\index{Keyword Anomalies}\index{Anomaly} + +In syntax directed parsing, it is assumed that input tokens can be +uniquely identified. In the case of keywords, however, there is the +possibility that the individual characters making up the keyword, as +well as the keyword taken as a whole, could constitute legitimate +input under some circumstances. Thus keywords, though a powerful and +useful tool, are not completely consistent with the assumptions that +underlie syntax directed parsing. This can occasionally give rise to a +type of conflict, diagnosed by AnaGram, called a \agterm{keyword +anomaly}. AnaGram is quite conservative in its diagnoses, so that many +keyword anomalies it reports are actually innocuous and can be safely +ignored. Nevertheless, anomalies should be investigated carefully. + +AnaGram only recognizes a keyword in those states where the keyword is +allowable input. If AnaGram recognizes a keyword, the keyword takes +precedence over the sequence of characters that constitute the +keyword. If there are several allowable keywords, the longest takes +precedence. + +Under certain circumstances it is possible for a grammar to have a +state where a particular keyword may or may not be allowable input +depending on how the parser gets to that state. That is, for some +sequences of tokens which lead to the state, the keyword is allowable, +in others it isn't. In such a state, the keyword must always cause a +reduction, since if the keyword were to shift, it would always be +allowable input. When AnaGram finds such a state in your parser, it +checks to see what the parser action would be if the parser ignored +the keyword and interpreted it in terms of its constituent +characters. If the parser action would be anything other than reducing +the same rule as the keyword or a syntax error, AnaGram will diagnose +a keyword anomaly in this state. + +To summarize, the nature of a keyword anomaly is this: It is possible +for your parser to get to the anomalous state by a route for which the +keyword is not allowable. In the anomalous state, at least the first +constituent character of the keyword is allowable in its own right and +causes a parser action different from that caused by the +keyword. Nonetheless, if the keyword should appear in the input, your +parser will recognize it, perform at least one reduction action, get +to a state where the keyword is not recognized, and finally, in the +wrong state, will try to interpret the keyword in terms of its +constituent characters. + +Now, what do you do about a keyword anomaly? If in the anomalous state +the characters which constitute the keyword would never be +syntactically admissible as individual characters, the keyword anomaly +is innocuous in that the only problem is that the recognition of the +syntax error is somewhat delayed. If however, the characters which +constitute the keyword are syntactically admissible, you have a +problem that needs to be fixed. + +In many programming languages, many keywords are treated as +\agterm{reserved words}, that is, as words that may not be otherwise +used in a conforming program. Version 1.5 of AnaGram has added a +\index{Reserve keywords}\agparam{reserve keywords} statement that you +can use to specify that the text of a keyword is \emph{never} +admissible as individual characters. Thus you will never get a +keyword anomaly diagnostic for a keyword that has been listed in a +\index{Reserve keywords}\agparam{reserve keywords} statement. +The \agparam{reserve keywords} statement is described in greater +detail in \S 8.2.28. +% XXX this is the only place in the manual (so far?) that uses the +% \S section-symbol. (Also, this should be a tex crossreference.) + +\subsection{The Keyword Anomalies Window} +\index{Keyword Anomalies}\index{Window} + +If your grammar has a keyword anomaly, AnaGram will provide a warning +message and will create a \agwindow{Keyword Anomalies} window to +provide details about the anomalies in your grammar. + +Each entry in the \agwindow{Keyword Anomalies} window consists of two +lines. The first line identifies the state at which the anomaly +occurs and the offending keyword. The second line identifies the rule +which the keyword may erroneously reduce. The \agmenu{Auxiliary +Windows} menu provides three auxiliary windows keyed directly to the +anomaly to help you determine the nature of the problem: the +\agwindow{Keyword Anomaly Trace} window, the \agwindow{Reduction +Trace} window, and the \agwindow{Rule Derivation} window. In +addition, three other windows provide supporting information: the +\agwindow{Reduction States} window, the \agwindow{Rule Context} window +and the \agwindow{State Definition} window. These windows are +discussed in Chapter 6. + +The \agwindow{Keyword Anomaly Trace} window, a pre-built +\agwindow{Grammar Trace}, shows a sequence of tokens that illustrates +one way to encounter the anomaly. The \agwindow{Reduction Trace} +window shows the result of performing the reduction action. Just as +for conflicts, the true source of the problem is often at some remove +from the actual state in which the problem manifests itself. One of +the characteristic rules for the last state in the \agwindow{Reduction +Trace} window is, in fact, the true source of the problem. The +\agwindow{Rule Derivation} window can show you which of the +characteristic rules is the offending party and how the rule in the +anomalous state derives from it. + +\subsection{Dealing with Keyword Anomalies} + +If your grammar has a keyword anomaly, you should study it carefully. +It may be that the anomaly is completely innocuous, that is, the +characters which constitute the keyword are not otherwise legitimate +input. In this case you can simply ignore the message. + +The basic approach to dealing with keyword anomalies is to revise your +grammar to make a better distinction between the contexts in which the +keyword is acceptable and those in which it may be confused with +otherwise valid input. Several other techniques may be useful: + +For an anomaly to occur, there must be, \emph{in the anomaly state}, +an alternative interpretation of some of the initial characters of the +keyword. If the anomaly occurs because you have several keywords that +begin in a similar manner, you may use the +\index{Distinguish keywords}\index{Keywords}\agparam{distinguish keywords} +statement (see Chapter 8) to keep them properly separated. Note that +having several keywords that begin with the same characters does +\emph{not} necessarily cause an anomaly. + +If the keyword is an alphabetic keyword that can follow conventional +variable name tokens, you may wish to use a \agparam{sticky} +declaration or the +\index{Distinguish lexemes}\index{Configuration switch} +\agparam{distinguish lexemes} switch +to inhibit recognition of the keyword when it is preceded immediately +by a name. + +If the keyword can be treated as a reserved word, the \index{Reserve +keywords}\agparam{reserve keywords} statement will eliminate the +anomaly.