view doc/manual/xg-ii.tex @ 0:13d2b8934445
Import AnaGram (near-)release tree into Mercurial.
author | David A. Holland |
---|---|
date | Sat, 22 Dec 2007 17:52:45 -0500 |
\chapter{Exploring Your Grammar II: Grammar Tables}

\section{Purpose of This Chapter}

AnaGram creates a number of tables which are useful for understanding your grammar and verifying its operation. This chapter discusses these tables and what you can learn from them. The discussions are organized around related groups of tables which deal with particular aspects of your grammar. Generally speaking, when you make a new grammar or make extensive revisions of an old one, you should look at some of these tables just to verify that they make sense.

Many of the tables AnaGram creates are listed in the \index{Browse menu}\index{Menu}\agmenu{Browse} menu of the \agwindow{Control Panel} after you have analyzed your grammar. Other tables, which expand upon the data in a particular window, are available by clicking the right mouse button to pop up the \agmenu{Auxiliary Windows} menu. The tables in this menu expand upon the data under the cursor bar in the window you are examining. If a particular menu item is not available for the selected data, it will be greyed out.

Most of AnaGram's windows simply format data which summarize your grammar. The Trace windows, however, are interactive and allow you to explore your grammar dynamically. Note that certain windows which show grammar rules or reduction procedures are synched with your syntax file window for your convenience.

AnaGram's windows are summarized in Appendix C.

\section{Formats and Display Conventions}
\index{Display conventions}

AnaGram's data tables display relationships between character sets, partition sets, tokens, grammar rules, and parser states. Generally, each line of a table displays one such relationship. \index{C000}\index{S000}\index{R000}\index{P000}\index{T000} Each entity is uniquely identified by an appropriate number, with the initial letter specifying what kind of entity is meant:
% ``T'' for token, ``S'' for state, ``R'' for rule, ``C'' for
% character set, and ``P'' for partition set. Thus C013 is character
% set 13, T049 is token 49, and so on.
\textit{T} for token, \textit{S} for state, \textit{R} for rule, \textit{C} for character set, and \textit{P} for partition set. Thus \textit{C013} is character set 13, \textit{T049} is token 49, and so on.

Generally, when \index{Token}\index{Token number} \index{Number}token numbers are displayed, the token name is also given. When grammar rules are displayed, the rule itself is also displayed. Furthermore, when rules are displayed, the table display is synched with the display of the syntax file, so that you can see the rule in context.

Rules often are displayed with a ``marked token'' in a distinctive font (which you may select) to indicate progress in matching the rule. This signifies that the rule has been matched up to the point just before the marked token. To continue matching the rule, the next following input must either be the marked token, if it is a terminal token, or must eventually reduce to it, if it is nonterminal.

% Note: The % won't set in italic, so can't use \textit{} with it
If a token name is followed by a ``\%'' character it means that AnaGram created a \index{Shell production}\index{Production}\agterm{shell production} for this token. The token with the ``\%'' is the basic input token and the token without the ``\%'' is the shell production. AnaGram creates shell productions when you use the \index{Disregard statement}\index{Statement}\index{\_prc}\agparam{disregard} statement to pass over certain characters or constructs in the input. The \agparam{disregard} statement is discussed in Chapter 8.

\section{Character Sets}
\index{Character Sets}

AnaGram does an extensive analysis of the character sets you use in your grammar. In particular, it checks to see if there are any overlaps among your character sets. If so, it creates a \index{Character universe}\index{Universe}\index{Partition} \agterm{partition} of the character universe.
The \index{Character universe}\index{Universe}character universe consists of the set of eight bit unsigned characters unless you have defined characters outside this range. In such an event the character universe will be extended down to negative values and above 255 only so far as is necessary to include all the characters you have defined in your grammar. The \index{Partition}partition consists of a collection of mutually disjoint sets, called \agterm{partition sets}, such that every character in the \index{Character universe}\index{Universe}character universe belongs to exactly one partition set and any one of your character sets can be written uniquely as a union of partition sets. AnaGram then adds a number of productions to your grammar which describe your character sets in terms of the partition sets. There are three primary tables you may use to see how the character sets you have used in your grammar are analyzed by AnaGram. These are the \index{Window}\agwindow{Character Sets} table, the \index{Partition Sets}\index{Window}\agwindow{Partition Sets} table, and the \index{Character Map}\index{Window}\agwindow{Character Map} table. As described below, each of these tables provides access to additional tables using the \agmenu{Auxiliary Windows} popup menu. One auxiliary window, \agwindow{Set Elements}, can be used in any window that identifies a character set, a partition set, or a terminal token to see the characters that comprise the set, or, in the case of a terminal token, that comprise the set of characters that corresponds to the terminal token. When you inspect these tables, you should verify that they correspond to your understanding of your grammar. You should particularly check to make sure that characters that show up as unused are really supposed to be unused. On the other hand, do all the characters that are shown to be used make sense? 
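
The partition described above can be illustrated with a short sketch. This is not AnaGram's actual algorithm, which this chapter does not document; it assumes only the stated property that every character belongs to exactly one partition set and that each character set is a union of partition sets. The character sets and all names in the code are hypothetical.

```python
from collections import defaultdict


def partition(universe, char_sets):
    """Split `universe` into mutually disjoint partition sets.

    Assumption for this sketch: two characters share a partition set
    exactly when they belong to the same collection of character sets.
    """
    groups = defaultdict(set)
    for ch in universe:
        # The "signature" is the set of names of all character sets containing ch.
        signature = frozenset(
            name for name, members in char_sets.items() if ch in members
        )
        groups[signature].add(ch)
    return dict(groups)


# Hypothetical character sets with overlaps: "hex" shares 0-9 with
# "digit" and a-f with "letter".
char_sets = {
    "digit": set("0123456789"),
    "hex": set("0123456789abcdef"),
    "letter": set("abcdefghijklmnopqrstuvwxyz"),
}
universe = {chr(code) for code in range(256)}
parts = partition(universe, char_sets)


def covering(name):
    """Return the partition sets whose union is the named character set."""
    return [members for signature, members in parts.items() if name in signature]


unused = parts[frozenset()]  # characters that appear in no character set
```

In this sketch the three overlapping sets yield four partition sets: the digits, the letters a through f, the letters g through z, and the characters that no set uses. The last group is the one worth checking against your expectations, as the text above recommends.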
\paragraph{Character Sets.}\index{Character Sets}\index{Window}
The \agwindow{Character Sets} window lists all of the distinct character sets which you have defined, implicitly or explicitly, in your grammar. Each line in the table describes one such set. The description has the following fields:
\begin{itemize}
\item internal set number
\item token number, if any
\item name, if any, followed by ``=''
\item the expression defining the set
\end{itemize}
The \agmenu{Auxiliary Windows} menu for the \agwindow{Character Sets} window provides three options. The \agmenu{Partition Sets} option displays the partition sets that cover the character set you have selected. The \agmenu{Set Elements} window shows the composition of the selected character set. If the character set corresponds to a token in your grammar, the \agmenu{Token Usage} window will show all rules in your grammar where the token is used.

% XXX provides -> generates?
\paragraph{Partition Sets.}\index{Partition Sets}\index{Window}
There are two Partition Sets windows available. From the \agmenu{Window} menu, the \agmenu{Partition Sets} option provides a list of all the sets that cover the character universe. From the \agmenu{Auxiliary Windows} menu for the \agwindow{Character Sets} table, the \agmenu{Partition Sets} option provides a list of the sets that cover the selected character set. In this case, the character set number appears on the title bar of the \agwindow{Partition Sets} window. Each line of a \agwindow{Partition Sets} window describes a particular set in the covering. The description has the following fields:
\begin{itemize}
\item the partition set number
\item the token number assigned to this set
\item the token name, if any, that corresponds to this set
\end{itemize}
Partition set zero is the set of all characters in the character universe that your parser does not accept. If one of the characters in this set appears in the input to your parser, your parser will signal a syntax error.
You should check this set to make sure it conforms to your expectations.

The \agmenu{Auxiliary Windows} menu for the \agwindow{Partition Sets} window provides two options: \agmenu{Set Elements} and \agmenu{Token Usage}. \agmenu{Set Elements} will display the characters which comprise the partition set. \agmenu{Token Usage} will display all the rules in your grammar that use the token assigned to this set. If this particular partition set was generated by AnaGram because of an overlap, it may not correspond precisely to any token in your grammar. Under these circumstances there will be no explicit usage in your grammar and \agmenu{Token Usage} will be greyed out in the \agmenu{Auxiliary Windows} menu.
% ...but in that case shouldn't it show which *used* character sets it
% appears in?

\paragraph{Character Map.}\index{Character Map}\index{Window}
The \agwindow{Character Map} table shows you the mapping of input characters to token numbers. The \agcode{ag{\us}tcv} table in your parser is based on the information in this table. The fields in this table are, in order:
\begin{itemize}
\item \index{Character codes}character code
\item display character (if any)
\item partition set number
\item token number
\item token representation
\end{itemize}
The display character will be whatever Windows displays for this code in the \agoption{Data Tables} font you have chosen. If a character is not used in your grammar the token number and token representation are both \index{T000}\textit{T000}.

The \agmenu{Auxiliary Windows} popup menu provides two options: \agmenu{Set Elements} and \agmenu{Token Usage}. \agmenu{Set Elements} shows the elements of the partition set to which the selected character belongs. The \agmenu{Token Usage} window shows the rules in your grammar in which the token corresponding to this partition set number is used.
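
The mapping that the \agwindow{Character Map} table displays can be pictured as a lookup array indexed by character code. The sketch below is only an illustration: this chapter does not describe the actual layout of the table in the generated parser, and the partition set numbers, token numbers, and function name here are hypothetical.

```python
def build_char_map(partition_sets, token_of_set, size=256):
    """Build a character-code-to-token-number table.

    partition_sets: partition set number -> set of character codes
    token_of_set:   partition set number -> token number
    Codes not covered by any listed set keep token number 0,
    mirroring the T000 entry shown for unused characters.
    """
    table = [0] * size
    for set_number, codes in partition_sets.items():
        token = token_of_set.get(set_number, 0)
        for code in codes:
            table[code] = token
    return table


# Hypothetical data: partition set 1 holds the digits (token 5),
# partition set 2 holds the letters a-f (token 7).
partition_sets = {
    1: {ord(c) for c in "0123456789"},
    2: {ord(c) for c in "abcdef"},
}
token_of_set = {1: 5, 2: 7}
char_map = build_char_map(partition_sets, token_of_set)

digit_token = char_map[ord("3")]   # token for a used character
unused_token = char_map[ord("z")]  # 0, since "z" is in no listed set
```

Because every character in a partition set maps to the same token number, a single array lookup per input character is enough to classify it.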
\paragraph{Set Elements.}\index{Set Elements}\index{Window} The Set Elements window can be accessed only through the \agmenu{Auxiliary Windows} menu in a window that identifies a character set, a partition set, or a terminal token. The \agwindow{Set Elements} window shows the numeric code and screen representation for each element of the set. The character set or partition set number is displayed in the title bar of the window. In the case of a terminal token, the character set displayed is the character set corresponding to the terminal token. There is no \agmenu{Auxiliary Windows} menu defined for the \agwindow{Set Elements} window. \section{The Elements of Your Grammar} In analyzing your syntax, AnaGram takes it completely apart and creates an internal representation of it. A number of the internal tables are available for your inspection. Two tables, the \index{Symbol Table}\index{Window}\index{Table}\agwindow{Symbol Table} and the \index{Token Table}\index{Window}\index{Table}\agwindow{Token Table}, identify the elementary constituents of your grammar. A third, the \index{Rule Table}\index{Window}\index{Table}\agwindow{Rule Table}, summarizes the grammar AnaGram has abstracted from your syntax file. The \agwindow{Symbol Table} and \agwindow{Token Table} are not equivalent since you may have named character sets which are not tokens. You may also have tokens which you have defined directly as character sets or character ranges and therefore have no names. A number of \agwindow{Auxiliary Windows} also provide useful information about the tokens in your grammar. These are the \agwindow{Expansion Chain}, \agwindow{Expansion Rules}, \agwindow{Productions}, \agwindow{Rule Context}, \agwindow{Set Elements}, and \agwindow{Token Usage} windows. \paragraph{Symbol Table.}\index{Symbol Table}\index{Window}\index{Table} The \agwindow{Symbol Table} lists all the symbols you used in your grammar. 
Symbols may be used, of course, to identify tokens, definitions, or virtual productions or to provide alternative names for tokens. Each line in this table identifies a single symbol. The first field is the token number, if any. This is followed by the name. If the name was defined by a definition statement, it is followed by an equal sign and the right side of the definition. The \agmenu{Auxiliary Windows} menu for the \agwindow{Symbol Table} has four options: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Set Elements}, and \agmenu{Token Usage}. The \agmenu{Expansion Rules} and \agmenu{Productions} windows exist only for symbols which name nonterminal tokens. The \agmenu{Set Elements} window exists only for symbols which name character sets or terminal tokens. The \agmenu{Token Usage} table exists for any symbol which names a token. \paragraph{Token Table.}\index{Token Table}\index{Window}\index{Table} The \agwindow{Token Table} lists all the tokens of your grammar. The first field is the token number. It is followed by a flag field which is \textit{zl} if the token is a nonterminal token and is \index{Zero length token}\index{Token}zero length. If the token is nonterminal and not zero length, the flag field contains \textit{nt}. If the token is a terminal token, the field is blank. The next field is blank unless the token has been declared \index{Sticky declaration}\agparam{sticky} or has had a precedence level assigned. If the token is sticky, this field will contain \textit{s}. If a precedence level has been assigned, this field will contain the letter \textit{l}, \textit{r}, or \textit{n} to indicate associativity followed by the precedence level. Finally there is the \index{Data type}\index{Token}data type of the semantic value of this token and the token representation. The \agmenu{Auxiliary Windows} menu for the \agwindow{Token Table} has four options: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Set Elements} and \agmenu{Token Usage}. 
The \agmenu{Expansion Rules} and \agmenu{Productions} windows exist only for nonterminal tokens. \agmenu{Set Elements} exists only for terminal tokens. If you have used \index{Disregard statement}\index{Statement}\agparam{disregard} statements to cause white space or other uninteresting text to be skipped in the input to your parser, many of your tokens will appear in the \agwindow{Token Table} twice: once in the normal form and once with the ``\%'' character appended. For instance, if you have specified that \agcode{space} be disregarded after \agcode{name}, there will be entries for both \agcode{name} and \agcode{name\%}\index{ \_prc}. In this case, \agcode{name\%} represents the simple token and \agcode{name} represents \agcode{name\%} followed by \agcode{space?...} It is a good idea to check the \agwindow{Token} and \agwindow{Symbol} tables from time to time to make sure that all the names are the ones you intended and not the result of typographical errors. \paragraph{Rule Table.}\index{Rule Table}\index{Window}\index{Table} The \agwindow{Rule Table} lists, in numerical order, all the grammar rules defined in your grammar. Each rule is preceded by the nonterminal tokens which produce it. If you are not using semantically determined productions, then there will be precisely one token line per rule. The \agwindow{Rule Table} is synched to your syntax file to show the rule in context. The \agmenu{Auxiliary Windows} popup menu for the \agwindow{Rule Table} has four options: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Rule Context} and \agmenu{Token Usage}. The \agmenu{Expansion Rules}, \agmenu{Productions} and \agmenu{Token Usage} windows are keyed to the lines in the \agwindow{Rule Table} which identify tokens. AnaGram will beep if you select one of these options while a rule is highlighted. The \agmenu{Rule Context} window is keyed to the highlighted rule, or, if a token is highlighted, the next following rule. 
\paragraph{Expansion Rules.}\index{Expansion Rules}\index{Window}
The \agwindow{Expansion Rules} window is available in the \agmenu{Auxiliary Windows} menu from any window that identifies tokens. It displays a complete left expansion of the selected token if the token is nonterminal. That is, it is a list of rules that begins with all the rules produced by the token, plus all the rules produced by the first token of any rule in the list. The token number and the name or other representation of the token being expanded is displayed on the title bar of the window. The \agwindow{Expansion Rules} window is synched with the syntax file window so you can see each rule in context.

The \agwindow{Auxiliary Windows} available from the \agwindow{Expansion Rules} window are the \agwindow{Expansion Chain}, \agwindow{Expansion Rules}, \agwindow{Productions}, \agwindow{Rule Context}, \agwindow{Set Elements}, and \agwindow{Token Usage} windows, all keyed to the marked token in the highlighted rule.

\paragraph{Expansion Chain.}\index{Expansion Chain}\index{Window}
The \agwindow{Expansion Chain} window is available in the \agmenu{Auxiliary Windows} menu from any window that contains expansion rules, in particular, from the \agwindow{Expansion Rules} window, from a \agwindow{Conflicts} or \agwindow{Rule Stack} window (see Chapter 7), from a \agwindow{State Expansion} window (see below), or even from another \agwindow{Expansion Chain} window.

The purpose of an \agwindow{Expansion Chain} window is to show how a particular expansion rule in a particular state derives from a characteristic rule for that state. To see a chain of productions that produces a desired expansion rule, select the expansion rule with the cursor bar, click the right mouse button for the \agwindow{Auxiliary Windows} menu, and select \agmenu{Expansion Chain}.
The \agwindow{Expansion Chain} window will then present a sequence of expansion rules, using the same format as the \agwindow{Expansion Rules} window, but subject to the constraint that each rule is produced by the marked token in the previous line. The first rule in the window is a characteristic rule for the given state. The last rule in the window is the rule selected by the cursor bar in the window from which you chose the \agwindow{Expansion Chain}. Note that this expansion is not unique; there may be other derivations.

\paragraph{Productions.}\index{Window}\index{Productions}
The \agwindow{Productions} window is available in the \agmenu{Auxiliary Windows} popup menu from any window which identifies tokens. If the token selected by the cursor bar is a terminal token, the \agmenu{Productions} option will be greyed out. Otherwise, it will show all the rules the given token produces. The \agwindow{Productions} window is synched with the syntax file window so you can see each rule in context. The \agwindow{Productions} window does not have an \agmenu{Auxiliary Windows} menu.

\paragraph{Reduction Procedures.}
The \agwindow{Reduction Procedures} window lists the C function prototypes for the reduction procedures in your grammar. When this window is active, the syntax file window, if visible, is synchronized with it so you can see the body of the reduction procedure as well as its usage.

\paragraph{Rule Context.}\index{Rule Context}\index{Window}
The \agwindow{Rule Context} window is available in the \agmenu{Auxiliary Windows} popup menu from any window which identifies rules. When you select the \agwindow{Rule Context} window, AnaGram finds all the tokens which produce the selected rule and then finds all rules in your grammar which use any of these tokens.
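
The left expansion shown in the \agwindow{Expansion Rules} window and the lookup performed for the \agwindow{Rule Context} window can both be sketched over a small rule table. The grammar, the rule numbering, and the function names below are hypothetical, and the sketch assumes each rule is produced by exactly one token, which holds only when semantically determined productions are not in use.

```python
def left_expansion(token, rules):
    """Return the rule numbers in the left expansion of `token`.

    Starts with the rules produced by `token`, then repeatedly adds
    the rules produced by the first token of any rule already listed.
    """
    result = []
    seen_tokens = set()
    pending = [token]
    while pending:
        current = pending.pop(0)
        if current in seen_tokens:
            continue
        seen_tokens.add(current)
        for number, (lhs, rhs) in rules.items():
            if lhs == current:
                result.append(number)
                if rhs:  # a zero length rule has no first token
                    pending.append(rhs[0])
    return result


def rule_context(rule_number, rules):
    """Return the rules that use the token producing `rule_number`."""
    producer = rules[rule_number][0]
    return [n for n, (_, rhs) in rules.items() if producer in rhs]


# Hypothetical grammar: rule number -> (producing token, rule elements).
rules = {
    1: ("expr", ["expr", "+", "term"]),
    2: ("expr", ["term"]),
    3: ("term", ["term", "*", "factor"]),
    4: ("term", ["factor"]),
    5: ("factor", ["number"]),
    6: ("factor", ["(", "expr", ")"]),
}

expr_rules = left_expansion("expr", rules)  # [1, 2, 3, 4, 5, 6]
term_rules = left_expansion("term", rules)  # [3, 4, 5, 6]
context_of_5 = rule_context(5, rules)       # [3, 4]
```

A terminal token such as the hypothetical \agcode{number} produces no rules, so its left expansion is empty, which matches the restriction of the \agwindow{Expansion Rules} window to nonterminal tokens.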
The \agmenu{Auxiliary Windows} menu for the \agwindow{Rule Context} window offers five options: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Rule Context}, \agmenu{Set Elements} and \agmenu{Token Usage}. The \agmenu{Rule Context} option is keyed to the highlighted rule in the original \agwindow{Rule Context} window. The remaining windows are keyed to the marked token in the highlighted rule in the original \agwindow{Rule Context} window.

\paragraph{Token Usage.}\index{Token Usage}\index{Window}
The \agwindow{Token Usage} window is available in the \agmenu{Auxiliary Windows} popup menu from any window which identifies tokens. It displays all rules in your grammar which use the specified token. Each rule is displayed with a marked token, which is the token following the specified token. \agwindow{Auxiliary Windows} opened from the \agwindow{Token Usage} window are keyed to this marked token.

The \agmenu{Auxiliary Windows} menu for the \agwindow{Token Usage} window offers five options: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Rule Context}, \agmenu{Set Elements} and \agmenu{Token Usage}. The \agmenu{Rule Context} option is keyed to the highlighted rule in the \agwindow{Token Usage} window. The remaining windows are keyed to the marked token in the highlighted rule in the original \agwindow{Token Usage} window.

\section{State Tables}
\index{State}

When AnaGram analyzes your grammar, the principal result is the definition of parser states. AnaGram provides one table, the \agwindow{State Definition} table, listed in the \agmenu{Window} menu, which describes all the states in the parser. It also provides a number of \agwindow{Auxiliary Windows} that show the relationships among the states and the elements of your grammar.
\paragraph{State Definition Table.} \index{State Definition Table}\index{Window}\index{Table}\index{Table} The \agwindow{State Definition Table} lists the rules which define the states of your parser. Each line contains the state number, which is blank if it is the same as the state number of the previous line, the rule number and finally the rule itself. The cursor in the syntax file window is synched with the cursor bar to show the grammar rule in context. Each state is defined by one or more rules, displayed with a marked token\index{Rule}\index{Token}\index{Marked rule} in a distinctive font. The meaning of the marked token is this: If your parser is in this state, it has accumulated, in the input buffer, all of the tokens in the rule that are to the left of the marked token. Further input must be consistent with the marked token in one or the other of the defining rules. If there is no marked token, the rule is a completed rule, and an appropriate lookahead token will cause the rule to be reduced. The marked rules that define a particular state of a parser are sometimes called the \index{Characteristic rules}\index{Rules}\agterm{characteristic rules} of the state. The \agmenu{Auxiliary Windows} menu for the \agwindow{State Definition Table} offers ten choices. Four are keyed to the marked token in the highlighted rule: \agwindow{Expansion Rules}, \agwindow{Productions}, \agwindow{Set Elements}, and \agwindow{Token Usage}. One, \agwindow{Rule Context}, is keyed simply to the highlighted rule. Four are keyed only to the highlighted state: \agwindow{Auxiliary Trace}, \agwindow{Keywords}, \agwindow{Previous States}, and \agwindow{State Expansion}. One, \agwindow{Reduction States}, is keyed to the combination of rule number and state number. It is available only for completed rules. 
\paragraph{State Definition.} \index{State Definition}\index{Window}
For some windows which identify a state but do not show its definition, the \agmenu{Auxiliary Windows} menu contains an entry to display the \index{Characteristic rules}\index{Rules}characteristic rules which identify the state. The state number is displayed on the title bar of the window. The \agwindow{Auxiliary Windows} choices for a \agwindow{State Definition} window are the same as for the general \agwindow{State Definition Table}.

\paragraph{Auxiliary Trace.} \index{Auxiliary Trace}\index{Window}\index{Trace}
\agwindow{Auxiliary Trace} windows may be accessed through the \agmenu{Auxiliary Windows} menu. The \agwindow{Auxiliary Trace} is a prebuilt \agwindow{Grammar Trace} showing one of perhaps many ways to get to the state identified by the cursor bar in the parent window. See Chapter 5 for a discussion of the \agwindow{Grammar Trace}.

\paragraph{Keywords.}\index{Keywords}\index{Window}
When you select \agmenu{Keywords} in the \agmenu{Window} menu, AnaGram displays a list of all the keywords defined in your grammar together with the token numbers assigned to them. When you select \agmenu{Keywords} in an \agmenu{Auxiliary Windows} menu, AnaGram displays a list of keywords which your parser will identify in the state determined by the cursor in the parent window. It displays all the keywords the parser will recognize in that state, regardless of whether they are used as shift or as reducing tokens. The state number is displayed on the title bar of the window. The \agmenu{Auxiliary Windows} menu for a \agwindow{Keywords Window} has only one option: \agmenu{Token Usage}, so you can see all uses of a given keyword in your grammar.

\paragraph{Previous States.}\index{Previous States}\index{Window}
A \agwindow{Previous States} window can be accessed via the \agmenu{Auxiliary Windows} menu from any window which identifies parser states.
It shows the defining rules for all the states which jump to the specified state. The \agmenu{Auxiliary Windows} options for a \agwindow{Previous States} window are the same as for the \agwindow{State Definition Table}. \paragraph{Reduction States.}\index{Window}\index{Reduction States} A \agwindow{Reduction States} window can be accessed via the \agmenu{Auxiliary Windows} menu from most windows which display marked rules with a specified state. In this case, if the highlighted rule is complete, that is, there is no marked token, the \agmenu{Reduction States} option will show you all the possible states the parser could go to on reducing the rule. The actual state your parser will go to depends on the actual sequence of tokens which brought your parser to the state you are investigating. The \agwindow{Reduction States} window is very useful in understanding conflicts and keyword anomalies. A special version of this window, called \agwindow{Problem States}, is available from the \agmenu{Auxiliary Windows} menu of the \agwindow{Conflicts} window. The \agmenu{Auxiliary Windows} options for a \agwindow{Reduction States} window are the same as for the \agwindow{State Definition Table}. \paragraph{State Expansion.}\index{State Expansion}\index{Window} The \agwindow{State Expansion} window may be accessed using the \agmenu{Auxiliary Windows} menu from any window that identifies a state. It shows the complete set of expansion rules for the state, consisting of the union of the set of characteristic rules and the sets of expansion rules for the token to the right of the mark in each characteristic rule. The \agwindow{State Expansion} window shows all possible legal input to your parser in the given state. The state itself is identified on the title bar of the window. 
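
The construction behind the \agwindow{State Expansion} window can be sketched as a closure over marked rules. The grammar and names below are hypothetical, and a marked rule is represented as a pair of rule number and mark position. This is an assumption made for the illustration, not a description of AnaGram's internal tables.

```python
def state_expansion(characteristic, rules):
    """Return the marked rules in the expansion of a state.

    characteristic: list of (rule number, mark position) pairs.
    The result is the characteristic rules followed by the expansion
    rules of each marked token, every one marked at its first token.
    """
    items = list(characteristic)
    seen = set(items)
    pending = list(items)
    while pending:
        number, mark = pending.pop(0)
        rhs = rules[number][1]
        if mark >= len(rhs):
            continue  # completed rule: there is no marked token
        marked = rhs[mark]
        for candidate, (lhs, _) in rules.items():
            item = (candidate, 0)
            if lhs == marked and item not in seen:
                seen.add(item)
                items.append(item)
                pending.append(item)
    return items


def marked_terminals(items, rules):
    """Return the marked tokens that no rule produces (terminal tokens)."""
    nonterminals = {lhs for lhs, _ in rules.values()}
    marked = {
        rules[n][1][m] for n, m in items if m < len(rules[n][1])
    }
    return sorted(marked - nonterminals)


# Hypothetical grammar: rule number -> (producing token, rule elements).
rules = {
    1: ("expr", ["expr", "+", "term"]),
    2: ("expr", ["term"]),
    3: ("term", ["term", "*", "factor"]),
    4: ("term", ["factor"]),
    5: ("factor", ["number"]),
    6: ("factor", ["(", "expr", ")"]),
}

# A state whose single characteristic rule is rule 6 with the mark
# on "expr", that is, just after the opening parenthesis.
items = state_expansion([(6, 1)], rules)
legal_input = marked_terminals(items, rules)  # ['(', 'number']
```

The terminal tokens that appear as marked tokens in the expansion are the ones the parser can shift in that state, which is the sense in which the window shows the legal input for the state.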
The \agmenu{Auxiliary Windows} menu for the \agwindow{State Expansion} window has two options keyed to the state number: \agmenu{Auxiliary Trace} and \agmenu{Previous States}; two options keyed to the state number and rule number: \agmenu{Expansion Chain} and \agmenu{Reduction States}; four options keyed to the marked token in the highlighted rule: \agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Set Elements}, and \agmenu{Token Usage}; and one option keyed simply to the highlighted rule: \agmenu{Rule Context}. \section{Coverage Analysis Tables} AnaGram provides two tables on the \agmenu{Browse} menu, \agwindow{Rule Coverage} and \agwindow{Trace Coverage}, to let you see which grammar rules have been identified by your parser in the course of your testing. \agwindow{Rule Coverage} is described in Chapter 9, Programming With AnaGram. \agwindow{Trace Coverage} is described in Chapter 5, Exploring Your Grammar I: Traces.