% doc/manual/xg-ii.tex @ 0:13d2b8934445
% Import AnaGram (near-)release tree into Mercurial.
% author David A. Holland
% date Sat, 22 Dec 2007 17:52:45 -0500

\chapter{Exploring Your Grammar II: Grammar Tables}

\section{Purpose of This Chapter}

AnaGram creates a number of tables which are useful for understanding
your grammar and verifying its operation.  This chapter discusses
these tables and what you can learn from them.  The discussions are
organized around related groups of tables which deal with particular
aspects of your grammar.  Generally speaking, when you make a new
grammar or make extensive revisions of an old one, you should look at
some of these tables just to verify that they make sense.

Many of the tables AnaGram creates are listed in the
\index{Browse menu}\index{Menu}\agmenu{Browse} menu of the
\agwindow{Control Panel} after you have analyzed your grammar.  Other
tables, which expand upon the data under the cursor bar in the window
you are examining, are available from the \agmenu{Auxiliary Windows}
menu, which pops up when you click the right mouse button.  If a
particular menu item is not available for the selected data, it is
greyed out.

Most of AnaGram's windows simply format data which summarize your grammar.
The Trace windows, however, are interactive and allow you to explore your
grammar dynamically.  Note that certain windows which show grammar rules or
reduction procedures are synched with your syntax file window for your
convenience.  AnaGram's windows are summarized in Appendix C.


\section{Formats and Display Conventions}
\index{Display conventions}

AnaGram's data tables display relationships between character sets,
partition sets, tokens, grammar rules, and parser states.  Generally,
each line of a table displays one such relationship.

\index{C000}\index{S000}\index{R000}\index{P000}\index{T000}
Each entity is uniquely identified by an appropriate number, with the
initial letter specifying what kind of entity is meant: 
% ``T'' for token, ``S'' for state, ``R'' for rule, ``C'' for
% character set, and ``P'' for partition set.  Thus C013 is character
% set 13, T049 is token 49, and so on.  
\textit{T} for token, \textit{S} for state, \textit{R} for rule,
\textit{C} for character set, and \textit{P} for partition set.  Thus
\textit{C013} is character set 13, \textit{T049} is token 49, and so on.  
Generally, when \index{Token}\index{Token number}
\index{Number}token numbers are displayed, the token name is also
given.  When rule numbers are displayed, the rule itself is also
shown.  Furthermore, when rules are displayed, the table display
is synched with the display of the syntax file, so that you can see
the rule in context.  Rules are often displayed with a ``marked
token'' in a distinctive font (which you may select) to indicate
progress in matching the rule.  This signifies that the rule has been
matched up to the point just before the marked token.  To continue
matching the rule, the next input must either be the marked token, if
it is a terminal token, or eventually reduce to it, if it is a
nonterminal token.

% Note: The % won't set in italic, so can't use \textit{} with it
If a token name is followed by a ``\%'' character, it means that
AnaGram created a
\index{Shell production}\index{Production}\agterm{shell production}
for this token. The token with the ``\%'' is the basic input token and
the token without the ``\%'' is the shell production.  AnaGram creates
shell productions when you use the
\index{Disregard statement}\index{Statement}\index{\_prc}\agparam{disregard}
statement to pass over certain characters or constructs in the input.
The \agparam{disregard} statement is discussed in Chapter 8.


\section{Character Sets}
\index{Character Sets}

AnaGram does an extensive analysis of the character sets you use in
your grammar.  In particular, it checks to see if there are any
overlaps among your character sets.  If so, it creates a
\index{Character universe}\index{Universe}\index{Partition}
\agterm{partition} of the character universe.

The \index{Character universe}\index{Universe}character universe
consists of the set of eight-bit unsigned characters unless you have
defined characters outside this range.  In that case, the character
universe is extended below 0 or above 255 only as far as necessary to
include all the characters you have defined in your grammar.

The \index{Partition}partition consists of a collection of mutually
disjoint sets, called \agterm{partition sets}, such that every
character in the \index{Character universe}\index{Universe}character
universe belongs to exactly one partition set and any one of your
character sets can be written uniquely as a union of partition sets.
AnaGram then adds a number of productions to your grammar which
describe your character sets in terms of the partition sets.
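As an illustration only, the following C sketch computes such a
partition for the eight-bit universe.  It assumes each character set
is stored as a membership table and groups together the characters
that belong to exactly the same combination of sets; characters that
belong to none of them form partition set 0.  The type
\agcode{CharSet} and the functions \agcode{clear{\us}set},
\agcode{add{\us}range}, and \agcode{partition} are hypothetical, not
AnaGram's internal code, and AnaGram may number its partition sets
differently.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define UNIVERSE 256   /* eight-bit unsigned characters */
#define MAXSETS  32    /* one signature bit per character set */

/* Hypothetical representation: has[c] != 0 if character c is in the set. */
typedef struct { unsigned char has[UNIVERSE]; } CharSet;

static void clear_set(CharSet *cs) { memset(cs, 0, sizeof *cs); }

static void add_range(CharSet *cs, int lo, int hi)
{
    assert(0 <= lo && hi < UNIVERSE);
    for (int c = lo; c <= hi; c++) cs->has[c] = 1;
}

/* Stores the partition set number of each character in part[] and
   returns the number of partition sets.  Characters in none of the
   given sets form partition set 0; the other partition sets are
   numbered in order of their lowest character code. */
static int partition(const CharSet *sets, int nsets, int part[UNIVERSE])
{
    uint32_t sig_of[UNIVERSE + 1];   /* signature of each partition set */
    int nparts = 1;

    assert(nsets <= MAXSETS);
    sig_of[0] = 0;                   /* partition set 0: unused characters */
    for (int c = 0; c < UNIVERSE; c++) {
        uint32_t sig = 0;            /* bit s set: c belongs to sets[s] */
        for (int s = 0; s < nsets; s++)
            if (sets[s].has[c]) sig |= (uint32_t)1 << s;

        int p = 0;                   /* find a partition set with this signature */
        while (p < nparts && sig_of[p] != sig) p++;
        if (p == nparts) sig_of[nparts++] = sig;
        part[c] = p;
    }
    return nparts;
}
```

With three overlapping sets, say letters, hexadecimal digits, and
decimal digits, the sketch finds four partition sets: the unused
characters (0), the decimal digits (1), the letters A through F in
both cases (2), and the remaining letters (3).  Each original set is
then a union of partition sets: the letters are sets 2 and 3, the
hexadecimal digits are sets 1 and 2, and the decimal digits are set 1.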

There are three primary tables you may use to see how the character
sets you have used in your grammar are analyzed by AnaGram.  These are
the \index{Window}\agwindow{Character Sets} table, the
\index{Partition Sets}\index{Window}\agwindow{Partition Sets} table,
and the \index{Character Map}\index{Window}\agwindow{Character Map}
table.  As described below, each of these tables provides access to
additional tables using the \agmenu{Auxiliary Windows} popup menu.
One auxiliary window, \agwindow{Set Elements}, is available from any
window that identifies a character set, a partition set, or a terminal
token.  It shows the characters that make up the set or, for a
terminal token, the characters in the character set that corresponds
to the token.

When you inspect these tables, you should verify that they correspond
to your understanding of your grammar.  You should particularly check
to make sure that characters that show up as unused are really
supposed to be unused.  On the other hand, do all the characters that
are shown to be used make sense?

\paragraph{Character Sets.}\index{Character Sets}\index{Window}
The \agwindow{Character Sets} window lists all of the distinct
character sets which you have defined, implicitly or explicitly, in
your grammar.  Each line in the table describes one such set.  The
description has the following fields:

\begin{itemize}
\item internal set number
\item token number if any
\item name, if any, followed by ``=''
\item the expression defining the set
\end{itemize}

The \agmenu{Auxiliary Windows} menu for the \agwindow{Character Sets}
window provides three options.  The \agmenu{Partition Sets} option
displays the partition sets that cover the character set you have
selected.  The \agmenu{Set Elements} window shows the composition of
the selected character set.  If the character set corresponds to a
token in your grammar, the \agmenu{Token Usage} window will show all
rules in your grammar where the token is used.

% XXX provides -> generates?
\paragraph{Partition Sets.}\index{Partition Sets}\index{Window} 
There are two Partition Sets windows available.  From the
\agmenu{Browse} menu, the \agmenu{Partition Sets} option provides a
list of all the sets that cover the character universe.  From the
\agmenu{Auxiliary Windows} menu for the \agwindow{Character Sets}
table, the \agmenu{Partition Sets} option provides a list of the sets
that cover the selected character set.  In this case, the character
set number appears on the title bar of the \agwindow{Partition Sets}
window.

Each line of a \agwindow{Partition Sets} window describes a particular
set in the covering.  The description has the following fields:

\begin{itemize}
\item the partition set number
\item the token number assigned to this set
\item the token name, if any, that corresponds to this set
\end{itemize}

Partition set zero is the set of all characters in the character
universe that your parser does not accept.  If one of the characters
in this set appears in the input to your parser, your parser will
signal a syntax error.  You should check this set to make sure it
conforms to your expectations.

The \agmenu{Auxiliary Windows} menu for the \agwindow{Partition Sets}
window provides two options: \agmenu{Set Elements} and \agmenu{Token
Usage}.  \agmenu{Set Elements} will display the characters which
comprise the partition set.  \agmenu{Token Usage} will display all the
rules in your grammar that use the token assigned to this set.  If
this particular partition set was generated by AnaGram because of an
overlap, it may not correspond precisely to any token in your grammar.
In that case there will be no explicit usage in your grammar, and
\agmenu{Token Usage} will be greyed out in the
\agmenu{Auxiliary Windows} menu.
% ...but in that case shouldn't it show which *used* character sets it
% appears in?

\paragraph{Character Map.}\index{Character Map}\index{Window}
The \agwindow{Character Map} table shows you the mapping of input
characters to token numbers.  The \agcode{ag{\us}tcv} table in your
parser is based on the information in this table.

The fields in this table are, in order:

\begin{itemize}
\item \index{Character codes}character code
\item display character (if any)
\item partition set number
\item token number
\item token representation
\end{itemize}

The display character is whatever Windows displays for this code
in the \agoption{Data Tables} font you have chosen.  If a character is
not used in your grammar, the token number and token representation
are both \index{T000}\textit{T000}.  The \agmenu{Auxiliary Windows}
popup menu provides two options: \agmenu{Set Elements} and
\agmenu{Token Usage}.  \agmenu{Set Elements} shows the elements of the
partition set to which the selected character belongs.  The
\agmenu{Token Usage} window shows the rules in your grammar in which
the token corresponding to this partition set is used.
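As a rough illustration of how such a mapping is used, the C sketch
below builds a character-to-token lookup array from each character's
partition set number and then converts input characters to token
numbers, with 0 standing for \textit{T000}.  The names
\agcode{char{\us}map}, \agcode{build{\us}char{\us}map}, and
\agcode{char{\us}token} are hypothetical; the actual layout of
\agcode{ag{\us}tcv} in a generated parser may differ.  Treating codes
outside the universe as unused characters is an assumption of this
sketch.

```c
#include <assert.h>

#define UNIVERSE 256   /* eight-bit unsigned characters */

/* Hypothetical lookup table: char_map[c] is the token number for input
   character c, with 0 (T000) for characters the grammar does not use. */
static int char_map[UNIVERSE];

/* Fills char_map from each character's partition set number and the
   token number assigned to each partition set.  Partition set 0 (the
   unused characters) always maps to token 0. */
static void build_char_map(const int part[UNIVERSE], const int part_token[])
{
    for (int c = 0; c < UNIVERSE; c++) {
        assert(part[c] >= 0);
        char_map[c] = part[c] == 0 ? 0 : part_token[part[c]];
    }
}

/* Token number for one input character.  In this sketch, codes outside
   the universe are treated like unused characters. */
static int char_token(int c)
{
    return (c >= 0 && c < UNIVERSE) ? char_map[c] : 0;
}
```

A lookup of this kind costs one array access per input character,
which is why a parser typically consults a precomputed table rather
than testing each character against every character set.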

\paragraph{Set Elements.}\index{Set Elements}\index{Window}

The \agwindow{Set Elements} window can be accessed only through the
\agmenu{Auxiliary Windows} menu in a window that identifies a
character set, a partition set, or a terminal token.  The
\agwindow{Set Elements} window shows the numeric code and screen
representation for each element of the set.  The character set or
partition set number is displayed in the title bar of the window.  For
a terminal token, the window shows the character set that corresponds
to the token.

There is no \agmenu{Auxiliary Windows} menu defined for the
\agwindow{Set Elements} window.


\section{The Elements of Your Grammar}

In analyzing your syntax, AnaGram takes it completely apart and
creates an internal representation of it.  A number of the internal
tables are available for your inspection.

Two tables, the 
\index{Symbol Table}\index{Window}\index{Table}\agwindow{Symbol Table}
and the
\index{Token Table}\index{Window}\index{Table}\agwindow{Token Table},
identify the elementary constituents of your grammar.  A third, the
\index{Rule Table}\index{Window}\index{Table}\agwindow{Rule Table},
summarizes the grammar AnaGram has abstracted from your syntax file.
The \agwindow{Symbol Table} and \agwindow{Token Table} are not
equivalent, since you may have named character sets which are not
tokens.  You may also have tokens which you have defined directly as
character sets or character ranges and which therefore have no names.

A number of \agwindow{Auxiliary Windows} also provide useful
information about the tokens in your grammar.  These are the
\agwindow{Expansion Chain}, \agwindow{Expansion Rules},
\agwindow{Productions}, \agwindow{Rule Context}, \agwindow{Set
Elements}, and \agwindow{Token Usage} windows.

\paragraph{Symbol Table.}\index{Symbol Table}\index{Window}\index{Table} 
The \agwindow{Symbol Table} lists all the symbols you used in your
grammar.  Symbols may be used, of course, to identify tokens,
definitions, or virtual productions or to provide alternative names
for tokens.

Each line in this table identifies a single symbol.  The first field
is the token number, if any.  This is followed by the name.  If the
name was defined by a definition statement, it is followed by an equal
sign and the right side of the definition.  The \agmenu{Auxiliary
Windows} menu for the \agwindow{Symbol Table} has four options:
\agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Set Elements},
and \agmenu{Token Usage}.  The \agmenu{Expansion Rules} and
\agmenu{Productions} windows exist only for symbols which name
nonterminal tokens.  The \agmenu{Set Elements} window exists only for
symbols which name character sets or terminal tokens.  The
\agmenu{Token Usage} window exists for any symbol which names a token.

\paragraph{Token Table.}\index{Token Table}\index{Window}\index{Table}
The \agwindow{Token Table} lists all the tokens of your grammar.  The
first field is the token number.  It is followed by a flag field,
which contains \textit{zl} if the token is a nonterminal token of
\index{Zero length token}\index{Token}zero length and \textit{nt} if
it is a nonterminal token that is not of zero length.  If the token is
a terminal token, the field is blank.  The next field is
blank unless the token has been declared \index{Sticky
declaration}\agparam{sticky} or has had a precedence level assigned.
If the token is sticky, this field will contain \textit{s}.  If a
precedence level has been assigned, this field will contain the letter
\textit{l}, \textit{r}, or \textit{n} to indicate associativity,
followed by the precedence level.  Finally, there are the
\index{Data type}\index{Token}data type
of the semantic value of this token and the token representation.
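To make the two flag fields concrete, the C sketch below derives them
from a hypothetical \agcode{TokenInfo} record.  The record, the
functions \agcode{kind{\us}field} and \agcode{attr{\us}field}, and the
exact spacing of the output are assumptions, not AnaGram's code; for
simplicity the sketch also assumes that a token is not both sticky and
assigned a precedence level.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary of the attributes the Token Table reports. */
typedef struct {
    int  nonterminal;   /* 0 for a terminal token */
    int  zero_length;   /* meaningful only for nonterminal tokens */
    int  sticky;        /* declared sticky */
    char assoc;         /* 'l', 'r', 'n', or 0 if no precedence level */
    int  prec_level;    /* valid only when assoc != 0 */
} TokenInfo;

/* First flag field: "zl", "nt", or blank for a terminal token. */
static const char *kind_field(const TokenInfo *t)
{
    if (!t->nonterminal) return "";
    return t->zero_length ? "zl" : "nt";
}

/* Second field: "s" if sticky, the associativity letter followed by
   the precedence level if one was assigned, otherwise blank.  Assumes
   a token is not both sticky and given a precedence level. */
static void attr_field(const TokenInfo *t, char *buf, size_t size)
{
    assert(size > 0 && !(t->sticky && t->assoc));
    if (t->sticky)
        snprintf(buf, size, "s");
    else if (t->assoc)
        snprintf(buf, size, "%c%d", t->assoc, t->prec_level);
    else
        buf[0] = '\0';
}
```

For example, a left-associative operator at precedence level 3 would
show a blank first field and \textit{l3} in the second.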

The \agmenu{Auxiliary Windows} menu for the \agwindow{Token Table} has
four options: \agmenu{Expansion Rules}, \agmenu{Productions},
\agmenu{Set Elements} and \agmenu{Token Usage}.  The \agmenu{Expansion
Rules} and \agmenu{Productions} windows exist only for nonterminal
tokens.  \agmenu{Set Elements} exists only for terminal tokens.

If you have used
\index{Disregard statement}\index{Statement}\agparam{disregard}
statements to cause white space or other uninteresting text to be
skipped in the input to your parser, many of your tokens will appear
in the \agwindow{Token Table} twice: once in the normal form and once
with the ``\%'' character appended.  For instance, if you have
specified that \agcode{space} be disregarded after \agcode{name},
there will be entries for both \agcode{name} and
\agcode{name\%}\index{\_prc}.  In this case, \agcode{name\%}
represents the simple token and \agcode{name} represents
\agcode{name\%} followed by \agcode{space?...}.

It is a good idea to check the \agwindow{Token Table} and
\agwindow{Symbol Table} from time to time to make sure that all the
names are the ones you intended and not the result of typographical
errors.

\paragraph{Rule Table.}\index{Rule Table}\index{Window}\index{Table}
The \agwindow{Rule Table} lists, in numerical order, all the grammar
rules defined in your grammar.  Each rule is preceded by the
nonterminal tokens which produce it.  If you are not using
semantically determined productions, then there will be precisely one
token line per rule.  The \agwindow{Rule Table} is synched to your 
syntax file to show the rule in context.

The \agmenu{Auxiliary Windows} popup menu for the \agwindow{Rule
Table} has four options: \agmenu{Expansion Rules},
\agmenu{Productions}, \agmenu{Rule Context} and \agmenu{Token Usage}.
The \agmenu{Expansion Rules}, \agmenu{Productions} and \agmenu{Token
Usage} windows are keyed to the lines in the \agwindow{Rule Table}
which identify tokens.  AnaGram will beep if you select one of these
options while a rule is highlighted.  The \agmenu{Rule Context} window
is keyed to the highlighted rule or, if a token is highlighted, to the
rule that follows it.

\paragraph{Expansion Rules.}\index{Expansion Rules}\index{Window}
The \agwindow{Expansion Rules} window is available in the
\agmenu{Auxiliary Windows} menu from any window that identifies
tokens.  It displays a complete left expansion of the selected token
if the token is nonterminal.  That is, it lists all the rules produced
by the token, together with all the rules produced by the first token
of any rule already in the list, repeating until no new rules appear.
The token number and the name or other representation of the token
being expanded are
displayed on the title bar of the window.  The \agwindow{Expansion
Rules} window is synched with the syntax file window so you can see
each rule in context.
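The left expansion can be computed with a simple work list.  The C
sketch below assumes a small, hypothetical expression grammar stored
as an array of rules, each with a left-side token and a right side of
at most three tokens; \agcode{left{\us}expansion} and the token and
rule numbering are illustrative, not AnaGram's.  Each token's rules
are added at most once, so the loop terminates even for left-recursive
rules such as rule 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy grammar, not AnaGram's internal representation. */
enum { T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN,   /* terminals */
       N_EXPR, N_TERM, N_FACTOR, NTOKENS };         /* nonterminals */

typedef struct { int lhs, len, rhs[3]; } Rule;      /* lhs produces rhs */

static const Rule rules[] = {
    { N_EXPR,   3, { N_EXPR, T_PLUS, N_TERM } },      /* R0 */
    { N_EXPR,   1, { N_TERM } },                      /* R1 */
    { N_TERM,   3, { N_TERM, T_STAR, N_FACTOR } },    /* R2 */
    { N_TERM,   1, { N_FACTOR } },                    /* R3 */
    { N_FACTOR, 1, { T_NUM } },                       /* R4 */
    { N_FACTOR, 3, { T_LPAREN, N_EXPR, T_RPAREN } },  /* R5 */
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* Writes the left expansion of `tok` to out[] as rule numbers, in the
   order found, and returns the count.  A terminal token has an empty
   expansion. */
static int left_expansion(int tok, int out[NRULES])
{
    bool expanded[NTOKENS] = { false };   /* tokens whose rules are listed */
    int n = 0;

    assert(tok >= 0 && tok < NTOKENS);
    expanded[tok] = true;
    for (int r = 0; r < NRULES; r++)      /* rules produced by tok */
        if (rules[r].lhs == tok) out[n++] = r;

    for (int i = 0; i < n; i++) {         /* out[] is also the work list */
        int first = rules[out[i]].rhs[0];
        if (rules[out[i]].len == 0 || expanded[first]) continue;
        expanded[first] = true;
        for (int r = 0; r < NRULES; r++)  /* rules produced by first */
            if (rules[r].lhs == first) out[n++] = r;
    }
    return n;
}
```

Applied to \agcode{expr}, the sketch lists all six rules; applied to
\agcode{factor}, it lists only rules 4 and 5, since both begin with
terminal tokens; applied to a terminal token, it returns an empty
list.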

The \agwindow{Auxiliary Windows} available from the \agwindow{Expansion
Rules} window are the \agwindow{Expansion Chain}, \agwindow{Expansion
Rules}, \agwindow{Productions}, \agwindow{Rule Context}, \agwindow{Set
Elements}, and \agwindow{Token Usage} windows, all keyed to the marked
token in the highlighted rule.

\paragraph{Expansion Chain.}\index{Expansion Chain}\index{Window}
The \agwindow{Expansion Chain} window is available in the
\agmenu{Auxiliary Windows} menu from any window that contains
expansion rules, including the \agwindow{Expansion Rules} window,
the \agwindow{Conflicts} and \agwindow{Rule Stack} windows
(see Chapter 7), the \agwindow{State Expansion} window (see below),
and even another \agwindow{Expansion Chain} window.

The purpose of an \agwindow{Expansion Chain} window is to show how a
particular expansion rule in a particular state derives from a
characteristic rule for that state.  To see a chain of productions
that produces a desired expansion rule, select the expansion rule with
the cursor bar, click the right mouse button for the
\agmenu{Auxiliary Windows} menu, and select \agmenu{Expansion
Chain}.  The \agwindow{Expansion Chain} window will then present a
sequence of expansion rules, using the same format as the
\agwindow{Expansion Rules} window, but subject to the constraint that
each rule is produced by the marked token in the previous line.  The
first rule in the window is a characteristic rule for the given state.
The last rule in the window is the rule selected by the cursor bar in
the window from which you chose the \agwindow{Expansion Chain}.  Note
that this chain is not unique; there may be other derivations.
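One way to find such a chain is a breadth-first search from the
characteristic rules, recording for each expansion rule the line that
first produced it.  The C sketch below does this for a small,
hypothetical expression grammar; the \agcode{Item} type (a rule
number plus the position of the mark) and the function
\agcode{expansion{\us}chain} are illustrative, not AnaGram's code.
Because the search keeps only the first derivation it finds, it
returns a shortest chain, which is exactly why the chain shown is only
one of possibly several.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy grammar, not AnaGram's internal representation. */
enum { T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN,
       N_EXPR, N_TERM, N_FACTOR, NTOKENS };

typedef struct { int lhs, len, rhs[3]; } Rule;   /* lhs produces rhs */

static const Rule rules[] = {
    { N_EXPR,   3, { N_EXPR, T_PLUS, N_TERM } },      /* R0 */
    { N_EXPR,   1, { N_TERM } },                      /* R1 */
    { N_TERM,   3, { N_TERM, T_STAR, N_FACTOR } },    /* R2 */
    { N_TERM,   1, { N_FACTOR } },                    /* R3 */
    { N_FACTOR, 1, { T_NUM } },                       /* R4 */
    { N_FACTOR, 3, { T_LPAREN, N_EXPR, T_RPAREN } },  /* R5 */
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* A marked rule: the first `pos` tokens of the rule have been matched. */
typedef struct { int rule, pos; } Item;

/* The marked token of an item, or -1 if the rule is complete. */
static int marked_token(Item it)
{
    const Rule *r = &rules[it.rule];
    return it.pos < r->len ? r->rhs[it.pos] : -1;
}

#define MAXCHAR 8   /* characteristic rules per state, for this sketch */

/* Searches from the characteristic rules chars[] to expansion rule
   `target` (an expansion rule has the mark at its start).  Writes the
   chain's rule numbers to chain[], characteristic rule first and target
   last, and returns its length, or 0 if `target` is not reachable.
   chain[] needs room for NRULES + 1 entries. */
static int expansion_chain(const Item *chars, int nchars, int target,
                           int chain[])
{
    /* Node i < nchars is chars[i]; node nchars + r is expansion rule r. */
    int parent[MAXCHAR + NRULES], queue[MAXCHAR + NRULES];
    bool seen[NRULES] = { false };
    int head = 0, tail = 0;

    assert(nchars <= MAXCHAR);
    for (int i = 0; i < nchars; i++) {
        parent[i] = -1;
        queue[tail++] = i;
    }
    while (head < tail) {
        int node = queue[head++];
        Item it = node < nchars ? chars[node] : (Item){ node - nchars, 0 };
        int tok = marked_token(it);
        for (int r = 0; r < NRULES; r++) {
            if (rules[r].lhs != tok || seen[r]) continue;
            seen[r] = true;                 /* first derivation wins */
            parent[nchars + r] = node;
            queue[tail++] = nchars + r;
            if (r == target) {              /* follow parents back */
                int tmp[MAXCHAR + NRULES], len = 0;
                for (int v = nchars + r; v != -1; v = parent[v])
                    tmp[len++] = v < nchars ? chars[v].rule : v - nchars;
                for (int k = 0; k < len; k++)
                    chain[k] = tmp[len - 1 - k];
                return len;
            }
        }
    }
    return 0;
}
```

For a state whose characteristic rule is rule 5 with the mark on
\agcode{expr}, the chain to rule 4, which rewrites \agcode{factor} as
\agcode{NUM}, is rules 5, 1, 3, and 4.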

\paragraph{Productions.}\index{Window}\index{Productions}
The \agwindow{Productions} window is available in the
\agmenu{Auxiliary Windows} popup menu from any window which identifies
tokens.  If the token selected by the cursor bar is a terminal token,
the \agmenu{Productions} option will be greyed out.  Otherwise, it
will show all the rules the given token produces.  The
\agwindow{Productions} window is synched with the syntax file window
so you can see each rule in context.  The \agwindow{Productions}
window does not have an \agmenu{Auxiliary Windows} menu.

\paragraph{Reduction Procedures.}\index{Reduction Procedures}\index{Window}
The \agwindow{Reduction Procedures} window lists the C function
prototypes for the reduction procedures in your grammar.  When this
window is active, the syntax file window, if visible, is synchronized
with it so you can see the body of the reduction procedure as well as
its usage.

\paragraph{Rule Context.}\index{Rule Context}\index{Window}
The \agwindow{Rule Context} window is available in the
\agmenu{Auxiliary Windows} popup menu from any window which identifies
rules.  When you select the \agwindow{Rule Context} window, AnaGram
finds all the tokens which produce the selected rule and then finds
all rules in your grammar which use any of these tokens.
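In terms of a simple rule table, this is a scan over the right sides
of all rules.  The C sketch below assumes each rule is produced by a
single token (that is, no semantically determined productions) and
reports each use as a rule with the mark on the producing token; the
grammar, the \agcode{Item} type, and \agcode{rule{\us}context} are
hypothetical.

```c
#include <assert.h>

/* Hypothetical toy grammar, not AnaGram's internal representation. */
enum { T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN,
       N_EXPR, N_TERM, N_FACTOR, NTOKENS };

typedef struct { int lhs, len, rhs[3]; } Rule;   /* lhs produces rhs */

static const Rule rules[] = {
    { N_EXPR,   3, { N_EXPR, T_PLUS, N_TERM } },      /* R0 */
    { N_EXPR,   1, { N_TERM } },                      /* R1 */
    { N_TERM,   3, { N_TERM, T_STAR, N_FACTOR } },    /* R2 */
    { N_TERM,   1, { N_FACTOR } },                    /* R3 */
    { N_FACTOR, 1, { T_NUM } },                       /* R4 */
    { N_FACTOR, 3, { T_LPAREN, N_EXPR, T_RPAREN } },  /* R5 */
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* A marked rule: `pos` is the index of the marked token. */
typedef struct { int rule, pos; } Item;

/* Lists every occurrence, on any right side, of the token that
   produces rule r, as a rule marked at that occurrence.  Assumes a
   single producing token per rule.  Returns the number of items. */
static int rule_context(int r, Item out[], int max)
{
    assert(r >= 0 && r < NRULES);
    int tok = rules[r].lhs, n = 0;
    for (int i = 0; i < NRULES; i++)
        for (int p = 0; p < rules[i].len; p++)
            if (rules[i].rhs[p] == tok) {
                assert(n < max);
                out[n++] = (Item){ i, p };
            }
    return n;
}
```

For rule 3, which is produced by \agcode{term}, the sketch finds
three uses: rule 0 with the mark on its last token, and rules 1 and 2
with the mark on their first token.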

The \agmenu{Auxiliary Windows} menu for the \agwindow{Rule Context}
window offers five options: \agmenu{Expansion Rules},
\agmenu{Productions}, \agmenu{Rule Context}, \agmenu{Set Elements} and
\agmenu{Token Usage}.  The \agmenu{Rule Context} option is keyed to
the highlighted rule in the original \agwindow{Rule Context} window.
The remaining windows are keyed to the marked token in the highlighted
rule in the original \agwindow{Rule Context} window.

\paragraph{Token Usage.}\index{Token Usage}\index{Window}
The \agwindow{Token Usage} window is available in the
\agmenu{Auxiliary Windows} popup menu from any window which identifies
tokens.  It displays all rules in your grammar which use the specified
token.  In each rule, the mark is placed on the token immediately
following the specified token, and \agwindow{Auxiliary Windows}
accessed from the \agwindow{Token Usage} window are keyed to that
marked token.

The \agmenu{Auxiliary Windows} menu for the \agwindow{Token Usage}
window offers five options: \agmenu{Expansion Rules},
\agmenu{Productions}, \agmenu{Rule Context}, \agmenu{Set Elements} and
\agmenu{Token Usage}.  The \agmenu{Rule Context} option is keyed to the
highlighted rule in the \agwindow{Token Usage} window.  The remaining
windows are keyed to the marked token in the highlighted rule in the
original \agwindow{Token Usage} window.


\section{State Tables}
\index{State}

When AnaGram analyzes your grammar, the principal result is the
definition of parser states.  AnaGram provides one table, the
\agwindow{State Definition Table}, listed in the \agmenu{Browse} menu,
which describes all the states in the parser.  It also provides a
number of \agwindow{Auxiliary Windows} that show the relationships
among the states and the elements of your grammar.

\paragraph{State Definition Table.}
\index{State Definition Table}\index{Window}\index{Table}
The \agwindow{State Definition Table} lists the rules which define the
states of your parser.  Each line contains the state number (left
blank if it is the same as the state number on the previous line), the
rule number, and finally the rule itself.  The cursor in the syntax
file window is synched with the cursor bar to show the grammar rule in
context.

Each state is defined by one or more rules, displayed with a marked
token\index{Rule}\index{Token}\index{Marked rule} in a distinctive
font.  The meaning of the marked token is this: If your parser is in
this state, it has accumulated, in the input buffer, all of the tokens
in the rule that are to the left of the marked token.  Further input
must be consistent with the marked token in at least one of the
defining rules.  If there is no marked token, the rule is a completed
rule, and an appropriate lookahead token will cause the rule to be
reduced.
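Textbooks on LR parsing usually write a marked rule with a dot in
front of the marked token.  The C sketch below represents a marked
rule as a hypothetical pair (rule number, number of tokens matched)
and renders it in that dot notation, which here stands in for
AnaGram's distinctive font; a dot at the end indicates a completed
rule.  The grammar, \agcode{marked{\us}token}, and
\agcode{format{\us}item} are illustrative only.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical toy grammar, not AnaGram's internal representation. */
enum { T_NUM, T_PLUS, N_EXPR, NTOKENS };
static const char *const names[NTOKENS] = { "NUM", "'+'", "expr" };

typedef struct { int lhs, len, rhs[3]; } Rule;
static const Rule rules[] = {
    { N_EXPR, 3, { N_EXPR, T_PLUS, T_NUM } },   /* R0: expr '+' NUM */
    { N_EXPR, 1, { T_NUM } },                   /* R1: NUM */
};

/* A marked rule: the first `pos` tokens of the rule have been matched. */
typedef struct { int rule, pos; } Item;

/* The token further input must match or reduce to, or -1 when the
   rule is complete and waits for a lookahead token to reduce it. */
static int marked_token(Item it)
{
    const Rule *r = &rules[it.rule];
    assert(it.pos >= 0 && it.pos <= r->len);
    return it.pos < r->len ? r->rhs[it.pos] : -1;
}

/* Renders the item with " ." just before the marked token, or at the
   end for a completed rule; output is truncated to fit buf. */
static void format_item(Item it, char *buf, size_t size)
{
    const Rule *r = &rules[it.rule];
    size_t n;

    assert(size > 0);
    n = (size_t)snprintf(buf, size, "%s ->", names[r->lhs]);
    for (int i = 0; i <= r->len; i++) {
        if (i == it.pos && n < size)
            n += (size_t)snprintf(buf + n, size - n, " .");
        if (i < r->len && n < size)
            n += (size_t)snprintf(buf + n, size - n, " %s", names[r->rhs[i]]);
    }
}
```

For rule 0 with two of its three tokens matched, the dot appears just
before \agcode{NUM}, which is the marked token; once all three tokens
are matched, the dot moves to the end and \agcode{marked{\us}token}
returns $-1$.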

The marked rules that define a particular state of a parser are
sometimes called the
\index{Characteristic rules}\index{Rules}\agterm{characteristic rules}
of the state.

The \agmenu{Auxiliary Windows} menu for the \agwindow{State Definition
Table} offers ten choices. Four are keyed to the marked token in the
highlighted rule: \agwindow{Expansion Rules}, \agwindow{Productions},
\agwindow{Set Elements}, and \agwindow{Token Usage}.  One,
\agwindow{Rule Context}, is keyed simply to the highlighted rule.
Four are keyed only to the highlighted state: \agwindow{Auxiliary
Trace}, \agwindow{Keywords}, \agwindow{Previous States}, and
\agwindow{State Expansion}.  One, \agwindow{Reduction States}, is
keyed to the combination of rule number and state number.  It is
available only for completed rules.

\paragraph{State Definition.}
\index{State Definition}\index{Window}
For some windows which identify a state but do not show its
definition, the \agmenu{Auxiliary Windows} menu contains an entry to display
the \index{Characteristic rules}\index{Rules}characteristic rules
which identify the state.  The state number is displayed on the title
bar of the window.

The \agmenu{Auxiliary Windows} choices for a \agwindow{State
Definition} window are the same as for the general \agwindow{State
Definition Table}.

\paragraph{Auxiliary Trace.}
\index{Auxiliary Trace}\index{Window}\index{Trace}
\agwindow{Auxiliary Trace} windows may be accessed through the
\agmenu{Auxiliary Windows} menu.  The \agwindow{Auxiliary Trace} is a
prebuilt \agwindow{Grammar Trace} showing one of perhaps many ways to
get to the state identified by the cursor bar in the parent window.
See Chapter 5 for a discussion of the \agwindow{Grammar Trace}.

\paragraph{Keywords.}\index{Keywords}\index{Window}
When you select \agmenu{Keywords} in the \agmenu{Browse} menu, AnaGram
displays a list of all the keywords defined in your grammar together
with the token numbers assigned to them.  When you select
\agmenu{Keywords} in an \agmenu{Auxiliary Windows} menu, AnaGram
displays a list of keywords which your parser will identify in the
state determined by the cursor in the parent window.  It displays all
the keywords the parser will recognize in that state, regardless of
whether they are used as shift or as reducing tokens.  The state
number is displayed on the title bar of the window.

The \agmenu{Auxiliary Windows} menu for a \agwindow{Keywords} window
has only one option: \agmenu{Token Usage}, so you can see all uses of
a given keyword in your grammar.

\paragraph{Previous States.}\index{Previous States}\index{Window}
A \agwindow{Previous States} window can be accessed via the
\agmenu{Auxiliary Windows} menu from any window which identifies
parser states.  It shows the defining rules for all the states which
jump to the specified state.  The \agmenu{Auxiliary Windows} options for a
\agwindow{Previous States} window are the same as for the
\agwindow{State Definition Table}.
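Conceptually, the \agwindow{Previous States} list inverts the
parser's state transitions.  The C sketch below assumes the
transitions are available as a hypothetical list of (from state,
token, to state) triples and collects every state that has a
transition into the target state; \agcode{Transition},
\agcode{MAXSTATES}, and \agcode{previous{\us}states} are illustrative
names, not AnaGram's.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXSTATES 64   /* upper bound on state numbers in this sketch */

/* Hypothetical transition: in state `from`, `token` leads to `to`. */
typedef struct { int from, token, to; } Transition;

/* Writes the distinct states with a transition into `target` to out[],
   in increasing order, and returns how many there are. */
static int previous_states(const Transition *t, int nt, int target, int out[])
{
    bool found[MAXSTATES] = { false };
    int n = 0;

    for (int i = 0; i < nt; i++) {
        assert(t[i].from >= 0 && t[i].from < MAXSTATES);
        if (t[i].to == target) found[t[i].from] = true;
    }
    for (int s = 0; s < MAXSTATES; s++)
        if (found[s]) out[n++] = s;
    return n;
}
```

A real parser would more likely precompute this relation once rather
than rescan the transitions for every query; the scan is used here
only because it is the shortest correct form.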

\paragraph{Reduction States.}\index{Window}\index{Reduction States}
A \agwindow{Reduction States} window can be accessed via the
\agmenu{Auxiliary Windows} menu from most windows which display marked
rules with a specified state.  If the highlighted rule is complete,
that is, if it has no marked token, the \agmenu{Reduction
States} option will show you all the possible states the parser could
go to on reducing the rule.  The actual state your parser will go to
depends on the actual sequence of tokens which brought your parser to
the state you are investigating.  The \agwindow{Reduction States}
window is very useful in understanding conflicts and keyword
anomalies.  A special version of this window, called \agwindow{Problem
States}, is available from the \agmenu{Auxiliary Windows} menu of the
\agwindow{Conflicts} window.
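Under the usual LR construction, the possible destination states can
be found by backing up from the current state along as many
transitions as the rule has tokens, over every possible path, and then
following the transition on the rule's left-side token from each
state reached.  The C sketch below implements that textbook idea on a
hypothetical transition list; it is offered for illustration and is
not a description of AnaGram's own code.  \agcode{go{\us}to} and
\agcode{reduction{\us}states} are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>

#define MAXSTATES 64   /* upper bound on state numbers in this sketch */

/* Hypothetical transition: in state `from`, `token` (shifted, or the
   left side of a reduced rule) leads to state `to`. */
typedef struct { int from, token, to; } Transition;

/* The state reached from s on token, or -1 if there is none. */
static int go_to(const Transition *t, int nt, int s, int token)
{
    for (int i = 0; i < nt; i++)
        if (t[i].from == s && t[i].token == token) return t[i].to;
    return -1;
}

/* Sets result[q] for every state q the parser could enter after
   reducing, in state s, a rule with `len` right-side tokens and left
   side `lhs`: back up `len` transitions along every possible path,
   then follow the transition on `lhs` from each state reached. */
static void reduction_states(const Transition *t, int nt, int s, int len,
                             int lhs, bool result[MAXSTATES])
{
    bool cur[MAXSTATES] = { false }, prev[MAXSTATES];

    assert(s >= 0 && s < MAXSTATES);
    cur[s] = true;
    for (int step = 0; step < len; step++) {
        for (int q = 0; q < MAXSTATES; q++) prev[q] = false;
        for (int i = 0; i < nt; i++) {
            assert(t[i].from >= 0 && t[i].from < MAXSTATES);
            assert(t[i].to >= 0 && t[i].to < MAXSTATES);
            if (cur[t[i].to]) prev[t[i].from] = true;
        }
        for (int q = 0; q < MAXSTATES; q++) cur[q] = prev[q];
    }
    for (int q = 0; q < MAXSTATES; q++) result[q] = false;
    for (int q = 0; q < MAXSTATES; q++) {
        if (!cur[q]) continue;
        int g = go_to(t, nt, q, lhs);
        assert(g < MAXSTATES);
        if (g >= 0) result[g] = true;
    }
}
```

For instance, in an automaton for a grammar in which \agcode{E} is
either \agcode{n} or a parenthesized \agcode{E}, the state reached
after shifting \agcode{n} can be entered both from the start state and
from the state after a left parenthesis, so reducing \agcode{E} there
can lead to two different states, depending on how the parser
arrived.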

The \agmenu{Auxiliary Windows} options for a \agwindow{Reduction
States} window are the same as for the \agwindow{State Definition
Table}.

\paragraph{State Expansion.}\index{State Expansion}\index{Window}
The \agwindow{State Expansion} window may be accessed using the
\agmenu{Auxiliary Windows} menu from any window that identifies a
state.  It shows the complete set of expansion rules for the state,
consisting of the union of the set of characteristic rules and the
sets of expansion rules for the token to the right of the mark in each
characteristic rule.  The \agwindow{State Expansion} window shows all
possible legal input to your parser in the given state.  The state
itself is identified on the title bar of the window.
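Computing the state expansion amounts to the closure step of LR
parser construction.  The C sketch below starts from a list of
characteristic rules and repeatedly adds, with the mark at the start,
every rule produced by a marked token; the toy grammar, the
\agcode{Item} type, and \agcode{state{\us}expansion} are hypothetical,
and zero-length tokens receive no special treatment here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical toy grammar, not AnaGram's internal representation. */
enum { T_NUM, T_PLUS, T_STAR, T_LPAREN, T_RPAREN,
       N_EXPR, N_TERM, N_FACTOR, NTOKENS };

typedef struct { int lhs, len, rhs[3]; } Rule;   /* lhs produces rhs */

static const Rule rules[] = {
    { N_EXPR,   3, { N_EXPR, T_PLUS, N_TERM } },      /* R0 */
    { N_EXPR,   1, { N_TERM } },                      /* R1 */
    { N_TERM,   3, { N_TERM, T_STAR, N_FACTOR } },    /* R2 */
    { N_TERM,   1, { N_FACTOR } },                    /* R3 */
    { N_FACTOR, 1, { T_NUM } },                       /* R4 */
    { N_FACTOR, 3, { T_LPAREN, N_EXPR, T_RPAREN } },  /* R5 */
};
enum { NRULES = sizeof rules / sizeof rules[0] };

/* A marked rule: the first `pos` tokens of the rule have been matched. */
typedef struct { int rule, pos; } Item;

/* The marked token of an item, or -1 if the rule is complete. */
static int marked_token(Item it)
{
    const Rule *r = &rules[it.rule];
    return it.pos < r->len ? r->rhs[it.pos] : -1;
}

/* Writes the state expansion to out[]: the characteristic rules first,
   then each expansion rule (mark at the start) produced, directly or
   indirectly, by a marked token.  Returns the number of items. */
static int state_expansion(const Item *chars, int nchars, Item out[], int max)
{
    bool added[NRULES] = { false };   /* expansion rules already listed */
    int n = 0;

    assert(nchars <= max);
    for (int i = 0; i < nchars; i++) out[n++] = chars[i];
    for (int i = 0; i < n; i++) {     /* out[] doubles as the work list */
        int tok = marked_token(out[i]);
        for (int r = 0; r < NRULES; r++) {
            if (rules[r].lhs != tok || added[r]) continue;
            assert(n < max);
            added[r] = true;
            out[n++] = (Item){ r, 0 };
        }
    }
    return n;
}
```

A state characterized by rule 0 with the mark on \agcode{term}
expands to five rules: the characteristic rule plus rules 2 through 5.
A state characterized by rule 5 with the mark on \agcode{expr} expands
to seven, since every rule of the grammar becomes reachable.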

The \agmenu{Auxiliary Windows} menu for the \agwindow{State Expansion}
window has two options keyed to the state number: \agmenu{Auxiliary Trace} and
\agmenu{Previous States}; two options keyed to the state number and rule
number: \agmenu{Expansion Chain} and \agmenu{Reduction States}; four
options keyed to the marked token in the highlighted rule:
\agmenu{Expansion Rules}, \agmenu{Productions}, \agmenu{Set Elements},
and \agmenu{Token Usage}; and one option keyed simply to the
highlighted rule: \agmenu{Rule Context}.


\section{Coverage Analysis Tables}

AnaGram provides two tables on the \agmenu{Browse} menu,
\agwindow{Rule Coverage} and \agwindow{Trace Coverage}, to let you see
which grammar rules have been identified by your parser in the course
of your testing.  \agwindow{Rule Coverage} is described in Chapter 9,
Programming With AnaGram.  \agwindow{Trace Coverage} is described in
Chapter 5, Exploring Your Grammar I: Traces.