AnaGram interim repo (temporary): doc/misc/html/gloss.html comparison

comparison doc/misc/html/gloss.html @ 0:13d2b8934445

Import AnaGram (near-)release tree into Mercurial.

author	David A. Holland
date	Sat, 22 Dec 2007 17:52:45 -0500
parents
children

comparison

equal deleted inserted replaced

--1:000000000000
+:13d2b8934445
+<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
+<HTML>
+<HEAD>
+<TITLE>Glossary of Parsing Terms</TITLE>
+</HEAD>
+<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
+TEXT="#000000" LINK="#0033CC"
+VLINK="#CC0033" ALINK="#CC0099">
+<P>
+<IMG ALIGN="right" SRC="images/agrsl6c.gif" ALT="AnaGram"
+WIDTH=124 HEIGHT=30>
+<BR CLEAR="RIGHT">
+</P>
+Back to <A HREF="index.html">Index</A>
+<P>
+<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2  >
+<P>
+<H2>Glossary of Parsing Terms</H2>
+<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2 >
+</P>
+<P>
+<TABLE WIDTH="100%" ALIGN="center" CELLPADDING=15 CELLSPACING=15>
+<TR ALIGN="left">
+<TD VALIGN=TOP NOWRAP WIDTH="50%">
+<A HREF="#ParserAction">Action</A><BR>
+<A HREF="#AcceptAction">Accept Action</A><BR>
+<A HREF="#Associativity">Associativity</A><BR>
+<A HREF="#Backtracking">Backtracking</A><BR>
+<A HREF="#BNF">Backus-Naur Form</A><BR>
+<A HREF="#BinaryOperator">Binary operator</A><BR>
+<A HREF="#Binding">Binding</A><BR>
+<A HREF="#CharacterSets">Character sets</A><BR>
+<A HREF="#Universe">Character Universe</A><BR>
+<A HREF="#CharacteristicRule">Characteristic Rule</A><BR>
+<A HREF="#CharacteristicToken">Characteristic Token</A><BR>
+<A HREF="#SetComplement">Complement of a set</A><BR>
+<A HREF="#CompletedRule">Completed Rule</A><BR>
+<A HREF="#Conflict">Conflict</A><BR>
+<A HREF="#ContextFreeGrammar">Context Free Grammars</A><BR>
+<A HREF="#DefaultReduction">Default reductions</A><BR>
+<A HREF="#SetDifference">Difference of two sets</A><BR>
+<A HREF="#ErrorAction">Error Action</A><BR>
+<A HREF="#EventDriven">Event Driven Parser</A><BR>
+<A HREF="#ExpansionRule">Expansion Rule</A><BR>
+<A HREF="#Grammar">Grammar</A><BR>
+<A HREF="#GrammarAnalysis">Grammar Analysis</A><BR>
+<A HREF="#GrammarRule">Grammar Rule</A><BR>
+<A HREF="#GrammarToken">Grammar Token</A><BR>
+<A HREF="#SetIntersection">Intersection</A><BR>
+<A HREF="#LALRParsing">LALR parsing</A><BR>
+<A HREF="#LexicalScanner">Lexical Scanner</A><BR>
+<A HREF="#Lookahead">Lookahead Token</A><BR>
+<A HREF="#LRParsing">LR Parsing</A><BR>
+<A HREF="#MarkedRule">Marked Rule</A><BR>
+<A HREF="#NonAssociative">Non-associative</A><BR>
+</TD>
+<TD VALIGN=TOP NOWRAP WIDTH="50%">
+<A HREF="#Nonterminal">Nonterminal Token</A><BR>
+<A HREF="#NullProduction">Null Production</A><BR>
+<A HREF="#ParameterAssignment">Parameter Assignment</A><BR>
+<A HREF="#Parser">Parser</A><BR>
+<A HREF="#ParserGenerator">Parser Generator</A><BR>
+<A HREF="#ParserStateStack">Parser State Stack</A><BR>
+<A HREF="#ParserValueStack">Parser Value Stack</A><BR>
+<A HREF="#ParsingEngine">Parsing Engine</A><BR>
+<A HREF="#SetPartition">Partition</A><BR>
+<A HREF="#Precedence">Precedence</A><BR>
+<A HREF="#Production">Production</A><BR>
+<A HREF="#ReduceAction">Reduce Action</A><BR>
+<A HREF="#ReduceReduce">Reduce-Reduce Conflict</A><BR>
+<A HREF="#ReducingToken">Reducing Token</A><BR>
+<A HREF="#ReductionProcedure">Reduction Procedure</A><BR>
+<A HREF="#ReductionToken">Reduction Token</A><BR>
+<A HREF="#Resynchronization">Resynchronization</A><BR>
+<A HREF="#RuleElement">Rule Element</A><BR>
+<A HREF="#SemanticAction">Semantic Action</A><BR>
+<A HREF="#SemanticValue">Semantic Value</A><BR>
+<A HREF="#SemanticallyDetermined">Semantically Determined Production</A><BR>
+<A HREF="#SetExpression">Set Expression</A><BR>
+<A HREF="#ShiftAction">Shift Action</A><BR>
+<A HREF="#ShiftReduce">Shift-Reduce Conflict</A><BR>
+<A HREF="#SyntaxDirectedParsing">Syntax Directed Parsing</A><BR>
+<A HREF="#Terminal">Terminal Token</A><BR>
+<A HREF="#Token">Token</A><BR>
+<A HREF="#UnaryOperator">Unary Operator</A><BR>
+<A HREF="#SetUnion">Union</A><BR>
+<A HREF="#VirtualProduction">Virtual Production</A><BR>
+</TD>
+</TABLE>
+<HR>
+<DL>
+<DT><B><A NAME="ParserAction">Action</A></B></DT>
+<DD>A "parser action" is one of the basic executable elements of a
+<A HREF="#ParsingEngine">parsing engine</A>. In a traditional parsing engine
+there are four actions: the
+<A HREF="#ShiftAction">shift action</A>, the <A HREF="#ReduceAction">reduce
+action</A>, the <A HREF="#AcceptAction">accept action</A>, and the
+<A HREF="#ErrorAction">error action</A>. The parsing engines which AnaGram
+generates use a substantially greater number of actions, in order to provide
+certain short cuts in the interest of greater speed and greater compactness of
+the tables which control the action of the parsing engine.
+<P></P>
+</DD>
+<DT><B><A NAME="AcceptAction">Accept Action</A></B></DT>
+<DD> The accept <A HREF="#ParserAction">action</A> is one of the four actions
+of a traditional <A HREF="#ParsingEngine">parsing engine</A>. The accept action
+is performed when the <A HREF="#Parser">parser</A> has succeeded in identifying
+the <A HREF="#GrammarToken">goal token</A> for the <A HREF="#Grammar">grammar</A>.
+When the parser executes the accept action, it returns to the calling program.
+The accept action is thus the last action of
+the parsing engine and occurs only once for each successful execution
+of the parser.
+<P></P>
+</DD>
+<DT><B><A NAME="Associativity">Associativity</A></B></DT>
+<DD> This is a term used to describe <A HREF="#BinaryOperator">binary
+operators </A>. It describes how you interpret a sequence of operations which
+all involve the same operator. Consider a - b - c, for instance. Here the
+convention is that we subtract b from a, and then c from the difference. We say
+that subtraction is <B>left associative</B>, since if there is a sequence of
+subtraction operations, the leftmost one is to be performed first. As a second
+example, consider exponentiation. FORTRAN uses ** as an operator to indicate
+exponentiation. In order to conform as closely as possible to ordinary
+mathematical notation, the expression a**b**c is interpreted to mean that first
+b is raised to the power c, and then the resultant value is used as the power to
+which a is raised. We say that exponentiation is <B>right associative</B> since the
+rightmost of a sequence of exponentiation operations is to be performed first.
+If a programming language does not allow for unparenthesized sequences of a
+particular operator, the operator is said to be non-associative.<P>
+Another way to view left and right associativity is as implied parenthesization.
+Left associative operators are parenthesized from the left. Right associative
+operators are parenthesized from the right. Thus a - b - c = ((a-b) - c) and
+a**b**c = (a**(b**c))</P><P>
+AnaGram offers two ways to specify associativity of
+<A HREF="#BinaryOperator">binary operators</A>. The first
+way is to use conventional <A HREF="#BNF">Backus-Naur Form</A> syntax
+descriptions. You would userecursion to describe both subtraction and
+exponentiation. Subtraction, since it is left associative, uses left recursion,
+and exponentiation, being right associative, uses right recursion. Thus</P><P>
+</P><PRE>
+difference  -&gt; value | difference, '-', value
+exponential -&gt; value | value, "**", exponential
+</PRE>  could be used to describe differences and exponents.<P>
+The second way to specify associativity is to use an ambiguous <A HREF="#Grammar">grammar</A> and
+precedence declarations. (See Chapter 9, AnaGram User's Guide.)</P><P>
+</P></DD>
+<DT><B><A NAME="Backtracking">Backtracking</A></B></DT>
+<DD> In order to make parse tables more compact and <A HREF="#Parser">parsers</A> faster, it is
+common to use <A HREF="#DefaultReduction">default reductions</A>. In case of
+error, it is necessary to undo default reductions before diagnostics can be
+properly determined. In AnaGram, this undo operation is called backtracking.<P>
+</P></DD>
+<DT><B><A NAME="BNF">Backus-Naur Form</A></B></DT>
+<DD> Backus-Naur Form, or BNF, is a conventional notation for describing
+<A HREF="#ContextFreeGrammar">context free grammars</A>. AnaGram uses an
+extended notation for <A HREF="#Grammar">grammars</A>, which, except for
+<A HREF="#SemanticallyDetermined">semantically determined
+productions</A>, can be shown to be equivalent to BNF. The term "BNF" is
+often used colloquially to refer to a grammar specification.<P>
+AnaGram's syntax specification language differs from BNF in the following
+respects:
+</P><OL>
+<LI>In conventional BNF, a symbol not enclosed in angle brackets (&lt; &gt;) is
+taken to represent itself literally. In AnaGram, literal character
+representations must be enclosed in single quotes and literal strings within
+double quotes.
+</LI>
+<LI>In BNF, the right side of a <A HREF="#Production">production</A> is simply a list of symbols without
+any delimiters or punctuation. AnaGram requires that the <A HREF="#RuleElement">rule elements</A> which
+make up a <A HREF="#GrammarRule">grammar rule</A>, or right side, be joined by commas.
+</LI>
+<LI>BNF makes no provision for identifying <A HREF="#ReductionProcedure">reduction procedures</A> or their
+arguments. AnaGram provides both reduction procedures, introduced by an "="
+at the end of the production, and named arguments, introduced by a ":"
+at the end of any <A HREF="#Token">token</A> on the right side of the production.
+</LI>
+<LI>AnaGram allows <A HREF="#VirtualProduction">virtual productions</A> to be used freely.
+</LI>
+<LI>BNF is "pure" in that if you wish to define a <A HREF="#Nonterminal">nonterminal token</A>
+called "digit" to represent the digits from zero to nine, you must
+provide ten explicit productions to define it. AnaGram treats the concept of <A HREF="#Terminal">"terminal
+token"</A> as used in language theory as an abstraction, and interposes
+<A HREF="#CharacterSets">character set</A> functionality between actual character input and the terminal
+tokens of BNF. You can define digit to be the character range '0-9', for
+instance, and AnaGram will determine and define precisely the minimum number of
+productions necessary, taking into account any other usage of the characters
+'0-9' in your grammar. This makes your grammar more compact, more manageable,
+and easier to modify.
+</LI>
+<LI>AnaGram allows for <A HREF="#SemanticallyDetermined">
+semantically determined productions</A>, which provide a
+significant mechanism for melding semantic analysis with syntactic
+analysis.
+</LI>
+</OL>
+</DD>
+<DT><B><A NAME="BinaryOperator">Binary operator</A></B></DT>
+<DD> A binary operator is an operator that works on two operands to create a
+result. It is often written between its operands and is sometimes called an
+infix operator. In ordinary programming languages "+" and "-"
+are binary operators.
+<P></P>
+</DD>
+<DT><B><A NAME="Binding">Binding</A></B></DT>
+<DD> Most programming languages, rather than executing arithmetic operators in
+simple left to right order, conform to the traditional conventions of ordinary
+algebra, which dictate that, except where parenthesis indicate otherwise,
+exponentiation is done before multiplication, multiplication before addition,
+and addition before comparison. One says that exponentiation is "more
+tightly binding" than multiplication, multiplication is more tightly
+binding than addition, and so on. The sense of the word here is that the
+operator binds together its operands into a single entity. An operand which
+falls between two different operators is "bound" by the more tightly
+binding operator. An operator that is more tightly binding than another is also
+said to have higher <A HREF="#Precedence">precedence</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="CharacterSets">Character sets</A></B></DT>
+<DD> One of the traditional problems with syntax directed programming is that
+caused by the fact that most formal language specifications have productions of
+the form:
+<PRE>
+letter -&gt; a | b | c | d ... | z
+</PRE>  Since the letters are not often distinguished in the <A HREF = "#Grammar">grammar</A>, this large
+number of essentially identical <A HREF="#Production">productions</A> causes a correspondingly large
+number of states in the <A HREF="#Parser">parser</A>. This problem is often attacked by using a<A
+HREF="#LexicalScanner"> "lexical scanner"</A> which simply specifies
+a "letter token" whose value indicates precisely which letter is
+intended. This works fine as long as there is nothing in the grammar which
+distinguishes any letter at all. If there is, and there is usually some special
+use of h, x, q, o, e, or even a-f, the situation gets more complicated. Often the
+result is to abandon the use of syntax directed programming for elemental units
+of the language and use a more complex lexical scanner which identifies names,
+numbers, key words and operators. Such lexical scanners are often built using
+conventional programming languages or simple pattern recognizing languages such
+as LEX.<P>
+AnaGram avoids this problem by incorporation the notion of character set into
+its input specifications. Briefly, the way it works is the following: You
+specify the set of characters which make up any ostensibly <A HREF="#Terminal">terminal token</A>.
+AnaGram then analyzes the overlaps among these definitions and creates a minimal
+set of <A HREF="#Token">tokens</A> which match your specifications exactly. For instance, suppose you
+have the following definitions:
+</P><PRE>
+letter = 'a-z' + 'A-Z'
+hex digit = '0-9' + 'a-f' + 'A-F'
+</PRE>  AnaGram will define a token for letter which has two productions:
+<PRE>
+letter -&gt; 'a-f' + 'A-F' | 'g-z' + 'G-Z'
+</PRE>  With this technique, you have the freedom to specify your grammar in the
+easiest possible manner, without the penalty of an absurdly large, slow parser.<P>
+Character sets may be specified in terms of ranges of characters, as in the
+above example, as <A HREF="#SetUnion">unions</A>, denoted by "+", as
+<A HREF="#SetIntersection">intersections</A>, denoted by "&amp;", as
+<A HREF="#SetDifference">differences,</A> denoted by "-", and as
+<A HREF="#SetComplement">complements</A>, using "~". You may also
+specify a set consisting of a single character either with a literal character
+or with a numeric specification. If you specify a character numerically, you may
+use either decimal, octal or hexadecimal notation, as in C.
+</P><P></P>
+</DD>
+<DT><B><A NAME="Universe">Character Universe</A></B></DT>
+<DD> In ordinary set theory, the "universe" is the set of all
+entities which may be elements of a set. It must be defined so that the
+complement of a set can mean something unambiguous. In AnaGram, the character
+universe normally comprises the ascii characters and the IBM extended character
+set, that is, all characters in the range from 0 through 255. If, however, you
+have defined input characters outside this range, the character universe will be
+extended to the least character you have used and to the greatest character you
+have used in your <A HREF = "#Grammar">grammar</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="CharacteristicRule">Characteristic Rule</A></B></DT>
+<DD> Each parser state is characterized by a particular set of <A HREF="#GrammarRule">grammar rules</A>,
+and for each such rule, a marked token which is the next <A HREF="#Token">token</A> expected. These
+rules are called the characteristic rules for the state. In the course of doing
+<A HREF="#GrammarAnalysis">grammar analysis</A>, AnaGram determines the characteristic rules for each parser
+state. After analyzing your grammar, you may inspect the State Definition Table
+to see the characteristic rules for any state in your <A HREF="#Parser">parser</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="CharacteristicToken">Characteristic Token</A></B></DT>
+<DD> Every state in a <A HREF="#Parser">parser</A>, except state zero, can be characterized by the
+one, unique <A HREF="#Token">token</A> which causes a jump to that state. That token is called the
+characteristic token of the state, because to get to that parser state you must
+have just seen precisely that token in the input. Note that several states could
+have the same characteristic token.<P>
+When you have a list of states, such as is given by the parser state stack, it
+is equivalent to a list of characteristic tokens. This list of tokens is the
+list of tokens that have been recognized so far by the parser. Some of these
+tokens, of course, may be <A HREF="#Nonterminal">nonterminal tokens</A> and may thus represent the result
+of reducing a sequence of previously recognized tokens.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="SetComplement">Complement of a set</A></B></DT>
+<DD> In set theory, the complement of a set, S, is the collection of all
+elements of the <A HREF="#Universe">character universe</A> which are not elements of S. Using AnaGram's
+notation, the complement of S is written ~S. The complement operator has higher
+<A HREF="#Precedence">precedence</A> than the
+<A HREF="#SetDifference">difference</A>, <A HREF="#SetIntersection">intersection</A>, or <A HREF="#SetUnion">union</A> operators.
+<P></P>
+</DD>
+<DT><B><A NAME="CompletedRule">Completed Rule</A></B></DT>
+<DD> A "completed rule" is a <A HREF="#CharacteristicRule">characteristic rule</A> whose index is
+pointing beyond the last <A HREF="#RuleElement">rule element</A>. In other words, it has been completely
+matched and will be <A HREF="#ReduceAction">reduced</A> by the next input.
+<P>
+If a state has more than one completed rule, the decision as to which to reduce
+is made based on the next input <A HREF="#Token">token</A>. If there is only one completed rule in a
+state, it will be reduced by default unless the default reductions switch has
+been reset, i.e., turned off.
+</P></DD>
+<DT><B><A NAME="Conflict">Conflict</A></B></DT>
+<DD> Conflicts arise during a <A HREF="#GrammarAnalysis">grammar analysis</A> when AnaGram cannot determine
+how to treat a given input <A HREF="#Token">token</A>. There are two sorts of conflicts: <A HREF="#ShiftReduce">shift-reduce
+conflicts</A> and <A HREF="#ReduceReduce">reduce-reduce conflicts</A>. Conflicts may arise either because the
+<A HREF = "#Grammar">grammar</A> is inherently ambiguous, or simply because the grammar analyzer cannot
+look far enough ahead to resolve the conflict. In the latter case, it is often
+possible to rewrite the grammar in such a way as to eliminate the conflict. <A HREF="#NullProduction">Null
+productions</A> are a common source of conflict.
+<P>
+There are a number of ways to deal with conflicts. If you understand the
+conflict well, you may simply choose to ignore it. When AnaGram encounters a
+shift-reduce conflict while building parse tables it resolves it by choosing the
+<A HREF="#ShiftAction">shift action</A>. When AnaGram encounters a reduce-reduce conflict while building
+parse tables, it resolves it by selecting the <A HREF="#GrammarRule">grammar rule</A> which occurred first
+in the grammar.
+</P><P>
+A second way to deal with conflicts is to set operator <A HREF="#Precedence">precedence</A> parameters. If
+you set these parameters, AnaGram will use them preferentially to resolve
+conflicts. Any conflicts so resolved will be listed in the Resolved Conflicts
+window.
+</P><P>
+A third way to resolve a conflict is to declare some tokens as sticky or as
+subgrammars. This is particularly useful for <A HREF="#Production">productions</A> whose sole purpose is
+to skip over uninteresting input.</P><P>
+</P><P>
+The fourth way to deal with conflicts is to rewrite the grammar to eliminate
+them. Many people prefer this approach since it yields the highest level of
+confidence in the resulting program.
+</P></DD>
+<DT><B><A NAME="ContextFreeGrammar">Context Free Grammars</A></B></DT>
+<DD> Context free grammars are <A HREF = "#Grammar">grammars</A> wherein the definition of a grammar
+unit, or <A HREF="#Nonterminal">nonterminal token</A>, does not depend on the context in which the
+nonterminal is used. That means that if you define "widget" in a
+certain way, that definition is valid in all contexts in which "widget"
+might occur. Context free grammars can be represented in <A HREF="#BNF">Backus-Naur Form</A>.
+AnaGram's syntax specification language has no facility for representing
+grammars which are not context free.
+<P></P>
+</DD>
+<DT><B><A NAME="DefaultReduction">Default reductions</A></B></DT>
+<DD> A "default reduction" is a parser action which may be used in
+your <A HREF="#Parser">parser</A> in any state which has precisely one <A HREF="#CompletedRule">completed rule</A>.
+<P>
+If a given parser state has, among its <A HREF="#CharacteristicRule">characteristic rules</A>, exactly one
+completed rule, it is usually faster to reduce it on any input than to check
+specifically for correct input before reducing it. The only time this default
+reduction causes trouble is in the event of a syntax error. In this situation
+you may get an erroneous reduction. Normally when you are parsing a file, this
+is inconsequential because you are not going to continue performing
+<A HREF="#SemanticAction">semantic actions</A> in the presence of
+errors. But, if you are using your parser to handle real-time
+interactive input, you have to be able to continue semantic
+processing after notifying your user that he has entered erroneous
+input. In this case you would want default reductions to have been
+turned off so that <A HREF="#Production">production</A>s are reduced
+only when there is correct input.
+</P></DD>
+<DT><B><A NAME="SetDifference">Difference of two sets</A></B></DT>
+<DD> In set theory, the difference of two sets, A and B, is defined to be the
+set of all elements of A that are not elements of B. In an AnaGram syntax file,
+you represent the difference of two <A HREF="#CharacterSets">character sets</A> by using the "-"
+operator. Thus the difference of A and B is A - B. The difference operator is
+<A HREF="#Associativity">left associative</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="ErrorAction">Error Action</A></B></DT>
+<DD> The error action is one of the four <A HREF="#ParserAction">actions</A> of a traditional <A HREF="#ParsingEngine">parsing
+engine</A>. The error action is performed when the <A HREF="#Parser">parser</A> has encountered an input
+<A HREF="#Token">token</A> which is not admissible in the current state. The further behavior of a
+traditional parser is undefined.
+<P></P>
+</DD>
+<DT><B><A NAME="EventDriven">Event Driven Parser</A></B></DT>
+<DD>
+An <I>event driven</I> <A HREF="#Parser">parser</A> is one in which
+the relationship between the host program and the parser is turned
+inside out. In a conventional parser, the host program calls the
+parser, the parser analyzes the complete input text and returns to
+the host program only when it has finished with the entire input.
+<P>In an event driven parser, the parser does not read its input
+directly from a file or from memory. Instead, the host program, after
+initializing the parser, calls it once for each input token. Each
+time the parser is called, it updates its state
+appropriately, calls any <A HREF="#ReductionProcedure">reduction
+procedures</A> that need to be called and finally when it needs more
+input, returns to the host program. The effect is that parsing can
+occur in parallel with other processing performed by the host
+program. This technique is especially useful in situations where the
+token stream to be parsed is developed on the fly, as when using <A
+HREF="#LexicalScanner">lexical scanners</A>, for instance.
+<P></P>
+</DD>
+<DT><B><A NAME="ExpansionRule">Expansion Rule</A></B></DT>
+<DD> In analyzing a <A HREF = "#Grammar">grammar</A>, we are often interested in the full range of input
+that can be expected at a certain point. The expansion of a <A HREF="#Token">token</A> or state shows
+us all the expected input. An expansion yields a set of
+<A HREF="#MarkedRule">marked rules</A>. Looking
+at the marked token shows us what input to expect.
+<P>
+The set of expansion rules of a <A HREF="#Nonterminal">nonterminal token</A> shows all the expected input
+that can occur whenever the token appears in the grammar. The set consists of
+all the <A HREF="#GrammarRule">grammar rules</A> produced by the token, plus all the rules produced by the
+first token of any rule in the set. The marked tokens for the expansion rules of a
+token are always the first element in the rule.
+</P><P>
+The expansion of a state consists of its
+<A HREF="#CharacteristicRule">characteristic rules</A> plus the expansion
+rules of the marked token in each characteristic rule.
+</P>
+</DD>
+<DT><B><A NAME="Grammar">Grammar</A></B></DT>
+<DD> Traditionally, a grammar is a set of <A HREF="#Production">productions</A> which taken together
+specify precisely a set of acceptable input sequences in terms of an abstract
+set of <A HREF="#Terminal">terminal tokens</A>. The set of acceptable input sequences is often called
+the "language" defined by the grammar.
+<P>
+In AnaGram, the term grammar also includes configuration segments, definitions
+of <A HREF="#CharacterSets">character sets</A>, <A HREF="#VirtualProduction">virtual productions</A>, etc. that augment the collection of
+productions. A grammar is often called a "syntax", and the <A HREF="#GrammarRule">rules of
+the grammar</A> are often called syntactic rules.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="GrammarAnalysis">Grammar Analysis</A></B></DT>
+<DD> The major function of AnaGram is the analysis of <A HREF = "#ContextFreeGrammar">context free grammars</A>
+written in a particular variant of <A HREF="#BNF">Backus-Naur Form</A>.
+<P>
+The analysis of a <A HREF="#Grammar">grammar</A> proceeds in four stages. In the first, the input
+grammar is analyzed and a number of tables are built which describe all of the
+<A HREF="#Production">productions</A> and components of the grammar.
+</P><P>
+In the second stage, AnaGram analyzes all of the <A HREF="#CharacterSets">character sets</A> defined in the
+grammar, and where necessary, defines auxiliary <A HREF="#Token">tokens</A> and productions.
+</P><P>
+In the third stage, AnaGram identifies all of the states of the <A HREF="#Parser">parser</A> and
+builds the go-to table for the parser.
+</P><P>
+In the fourth stage, Anagram identifies <A HREF="#ReductionToken">reduction tokens</A> for each completed
+<A HREF="#GrammarRule">grammar rule</A> in each state and checks for <A HREF="#Conflict">conflicts</A>.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="GrammarRule">Grammar Rule</A></B></DT>
+<DD> A "grammar rule" is the right side of a <A HREF="#Production">production</A>. It consists
+of a sequence of <A HREF="#RuleElement">rule elements</A>. Each rule element identifies some <A HREF="#Token">token</A>, which
+can be either a <A HREF="#Terminal">terminal token</A> or <A HREF="#Nonterminal">nonterminal token</A>. It is "matched"
+by a corresponding sequence of tokens in the input stream to the <A HREF="#Parser">parser</A>. The
+left side of the production identifies one or more nonterminal tokens, or
+<A HREF="#ReductionToken">reduction tokens</A>, to which the rule reduces when matched. If there is more than
+one reduction token, there should be a <A HREF="#ReductionProcedure">reduction procedure</A> to make the choice.
+<P></P>
+</DD>
+<DT><B><A NAME="GrammarToken">Grammar Token</A></B></DT>
+<DD> The grammar <A HREF="#Token">token</A> is the token which represents the "top level"
+in your grammar. Some people refer to it as the "goal" or "goal
+token" and others as the "start token". Whatever it is called, it
+is the single token which describes the complete input to your <A HREF="#Parser">parser</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="SetIntersection">Intersection</A></B></DT>
+<DD> In set theory, the intersection of two sets, A and B, is defined to be the
+set of all elements of A which are also elements of B. In an AnaGram syntax
+file, the intersection of two <A HREF="#CharacterSets">character sets</A> is represented with the "&amp;"
+operator. The intersection operator has lower <A HREF="#Precedence">precedence</A> than the complement
+operator, but higher precedence than the <A HREF="#SetUnion">union</A> and <A HREF="#SetDifference">difference</A> operators. The
+intersection operator is <A HREF="#Associativity">left associative</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="LALRParsing">LALR parsing</A></B></DT>
+<DD> LALR, or "lookahead LR" parsing, a slightly less powerful
+variant of <A HREF="#LRParsing">LR parsing</A>, produces much smaller parsing tables. Thus an LALR
+<A HREF="#Parser">parser</A> has very significant advantages in size over its LR counterpart. It is
+possible to construct <A HREF = "#Grammar">grammar</A>s which can be parsed by an LR parser but cannot be
+parsed by an LALR parser. That is, the LALR parser will find these grammars to
+be ambiguous and the LR parser will not. In practice, however, such grammars
+seem to be rare and the ambiguities can usually be eliminated by minor revisions.
+<P>
+AnaGram generates LALR parsers and, in addition, uses LALR parsing to interpret
+syntax files.
+</P><P></P>
+</DD>
+<DT><B><A NAME="LexicalScanner">Lexical Scanner</A></B></DT>
+<DD> A lexical scanner is a program or procedure used as a preprocessor for a
+<A HREF="#Parser">parser</A>. It scans input characters and lumps them together into <A HREF="#Token">tokens</A>. Many
+systems for generating parsers, most notably YACC, can scarcely be used without
+a lexical scanner.
+<P>
+AnaGram, although it can support the use of a lexical scanner, does not need a
+lexical scanner for most practical problems. AnaGram avoids the need for a
+lexical scanner by using <A HREF="#CharacterSets">character sets</A> and disregard and lexeme statements.
+</P><P>
+If your problem does in fact need a lexical scanner, you can use AnaGram itself
+to write it, so you don't need to know different languages for the scanner and
+for the parser.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="Lookahead">Lookahead Token</A></B></DT>
+<DD> The current input <A HREF="#Token">token</A> to a <A HREF="#Parser">parser</A> is often called the "lookahead"
+token. In many situations, it is not shifted into the input buffer immediately,
+but acts like a catalyst to cause a number of rules to be reduced before it is
+eventually shifted in.
+<P></P>
+</DD>
+<DT><B><A NAME="LRParsing">LR Parsing</A></B></DT>
+<DD> An LR(k) <A HREF="#Parser">parser</A> is a type of deterministic bottom-up shift-reduce parser
+using at most k lookahead symbols to determine its next <A HREF="#ParserAction">action</A> at any point in
+the parse. If (k) is omitted, then k is assumed to be 1. Discussion of the
+technical details of LR(k) parsing is beyond the scope of this manual. The
+interested reader may consult any of a variety of works covering the subject.
+<P>
+AnaGram produces parsers which employ a variant of LR parsing known as LALR
+parsing. It is not necessary to understand the details of LR and LALR parsing
+theory in order to use AnaGram.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="MarkedRule">Marked Rule</A></B></DT>
+<DD> A "marked rule" is a <A HREF="#GrammarRule">grammar rule</A>
+together with a "marked token"
+that indicates how much of the rule has already been matched and what input
+should be expected, starting with the marked token, if the remainder of the rule is to be matched. Each <A HREF="#Parser">parser</A>
+state is defined by a small number of <A HREF="#CharacteristicRule">characteristic rules</A> which indicate what
+matches have already been made and what input can be expected. Note that when a
+state has more than one characteristic rule, any two characteristic rules with
+the same number of <A HREF="#Token">tokens</A> to the left of the marked token
+match identically up to the
+marked token. If one rule has fewer tokens to the left than the other, all of its tokens
+preceding the marked token match the corresponding tokens of the longer rule
+exactly.
+<P>
+When marked rules are displayed in AnaGram windows, the marked token is displayed in
+a distinctive font which the developer can select. The default is italic.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="NonAssociative">Non-associative</A></B></DT>
+<DD> A <A HREF="#BinaryOperator">binary operator</A>, say #, is said to be non-associative if the sequence x
+# y # z is not permissible. If this sequence is to be interpreted as (x#y)#z,
+the operator is said to be <B>left associative</B>. If the sequence is to be
+interpreted as x#(y#z), the operator is said to be <B>right associative</B>. (See
+<A HREF="#Associativity">associativity</A>.)
+<P></P>
+</DD>
+<DT><B><A NAME="Nonterminal">Nonterminal Token</A></B></DT>
+<DD> A "nonterminal token" is a <A HREF="#Token">token</A> which results from reducing the
+sequence of tokens which match a <A HREF="#GrammarRule">grammar rule</A> and replacing them with the
+appropriate <A HREF="#ReductionToken">reduction token</A>. Nonterminal tokens are to be distinguished from
+<A HREF="#Terminal">terminal tokens</A> or input tokens.
+<P></P>
+</DD>
+<DT><B><A NAME="NullProduction">Null Production</A></B></DT>
+<DD> A "null production" is one that has no <A HREF="#Token">tokens</A> on the right side
+whatsoever. Null <A HREF="#Production">production</A>s essentially are identified by the first following
+input token. Null productions are extremely convenient syntactic elements when
+you wish to make some input optional. For example, suppose that you wish to
+allow an optional semicolon at some point in your <A HREF = "#Grammar">grammar</A>. You could write the
+following pair of productions:
+<PRE>
+optional semicolon -&gt; | ';'
+</PRE> Note that a null production can never follow a '|'.  The above could also
+be written on multiple lines thus:
+<PRE>
+optional semicolon
+-&gt;
+-&gt; ';'
+</PRE>  You can always rewrite your grammar to eliminate null productions if you
+wish, but you usually pay a price in conciseness and clarity. Sometimes,
+however, it is necessary to do such a rewrite in order to avoid <A HREF="#Conflict">conflicts</A>, to
+which null productions are especially prone.
+<P>
+If you have a null production with no <A HREF="#ReductionProcedure">reduction procedure</A> specified, your <A HREF="#Parser">parser</A>
+will automatically assign the value zero to the <A HREF="#ReductionToken">reduction token</A>.
+</P><P>
+Null productions can be generated by <A HREF="#VirtualProduction">virtual productions</A>.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="ParameterAssignment">Parameter Assignment</A></B></DT>
+<DD>In any <A HREF="#GrammarRule"> grammar rule</A>, the
+<A HREF="#SemanticValue">semantic value</A> of any
+<A HREF="#RuleElement">rule element</A> may be passed to a
+<A HREF="#ReductionProcedure">reduction procedure</A>
+by means of a parameter assignment. In AnaGram this is particularly
+convenient since you can use an ordinary variable name for the
+parameter. Simply follow the
+rule element with a colon and a C variable name. The C
+variable name can then be used in the reduction
+procedure to reference the semantic value of the token
+it is attached to. AnaGram will automatically provide
+necessary declarations.
+Here are some examples of rule elements with parameter
+assignments:
+<PRE>
+'0-9':d
+integer:n
+expression:x
+declaration : declaration_descriptor
+</PRE>
+</DD>
+<DT><B><A NAME="Parser">Parser</A></B></DT>
+<DD> A parser is a program, or more likely a procedure within a program, which
+scans a sequence of input characters or input <A HREF="#Token">tokens</A> and accumulates them in an
+input buffer or stack as determined by a set of <A HREF="#Production">productions</A> which constitute a
+<A HREF = "#Grammar">grammar</A>. When the parser discovers a sequence of tokens as defined by a <A HREF="#GrammarRule">grammar
+rule</A>, or right side of a production, it "reduces" the sequence to a
+single <A HREF="#ReductionToken">reduction token</A> as defined by the left side of the <A HREF="#GrammarRule">grammar rule</A>. This
+<A HREF="#Nonterminal">nonterminal token</A> now replaces the tokens which matched the grammar rule and the
+search for matches continues. If an input token is encountered which will not
+yield a match for any rule, it is considered a syntax error and some kind of
+error recovery may be required to continue. If a match, or reduction action,
+yields the grammar token, sometimes called the goal token or start token, the
+parser deems its work complete and returns to whatever procedure may have called
+it.
+<P>
+Tokens may have <A HREF="#SemanticValue">semantic values</A>. If the
+input values configuration switch is on, your parser will expect
+semantic values to be provided by the input process along with the
+token identification code. If the input values switch is off, your
+parser will take the ascii value of the input character, that is, the
+actual input code, as the value of the character. When the parser <A
+HREF="#ReduceAction">reduces</A> a <A
+HREF="#Production">production</A>, it can call a <A
+HREF="#ReductionProcedure">reduction procedure</A> or <A
+HREF="#SemanticAction">semantic action</A> to analyze the values of
+the constituent tokens. This reduction procedure can then return a
+value which characterizes the reduced token.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="ParserGenerator">Parser Generator</A></B></DT>
+<DD>
+A parser generator, such as AnaGram, is a program that
+converts a <A HREF="#Grammar">grammar</A>, a rule-based description of the
+input to a program, into a conventional, procedural
+module called a <A HREF="#Parser">parser</A>. The parsers AnaGram generates
+are simple C modules which can be compiled on almost
+any platform. AnaGram parsers are also compatible with
+C++.
+<P></P>
+</DD>
+<DT><B><A NAME="ParserStateStack">Parser State Stack</A></B></DT>
+<DD>
+The parser state stack is a stack maintained by an AnaGram
+parser and which is an integral part of the parsing
+process. At any point in the parse of your input
+stream, the parser state stack provides a list of the
+states which were traversed in order to arrive at the
+current state.
+<P></P>
+</DD>
+<DT><B><A NAME="ParserValueStack">Parser Value Stack</A></B></DT>
+<DD>In parallel with the <A HREF="#ParserStateStack">parser state stack</A>,
+an AnaGram <A HREF="#Parser"> parser</A>
+maintains a <I>value stack</I> each entry of
+which corresponds to the <A HREF="#SemanticValue">semantic value</A> of the token
+identified at that state. Since the semantic values of
+different tokens might well have different data types,
+AnaGram gives you the opportunity, in your syntax
+file, to define the data type for any token. AnaGram
+then builds a typedef statement creating a data type
+which is a union of the all the types you have defined.
+AnaGram creates the name for this data type by
+appending "_vs_type" to the name of the parser. AnaGram uses
+this data type to define the value stack.
+<P></P>
+</DD>
+<DT><B><A NAME="ParsingEngine">Parsing Engine</A></B></DT>
+<DD> A <A HREF="#Parser">parser</A> consists of three basic components: A set of syntax tables, a set
+of <A HREF="#ReductionProcedure">reduction procedures</A> and a parsing engine. The parsing engine is the body of
+code that interprets the parsing table, invokes input functions, and calls the
+reduction procedures. The Build Parser command configures a parsing engine
+according to the implicit requirements of the syntax specification and according
+to the explicit values of the configuration parameters.
+<P>
+The parsing engine itself is a simple automaton, characterized by a set of
+states and a set of inputs. The inputs are the <A HREF="#Token">tokens</A> of your <A HREF = "#Grammar">grammar</A>. Each
+state is represented by a list of tokens which are admissible in that state and
+for each token an <A HREF="#ParserAction">action</A> to perform and a parameter which further defines the
+action. In a traditional <A HREF="#LALRParsing">LALR parser</A>, there are only four <A HREF="#ParserAction">actions</A>: the <A HREF="#ShiftAction">shift
+action</A>, the <A HREF="#ReduceAction">reduce action</A>, the <A HREF="#AcceptAction">accept action</A> and the <A HREF="#ErrorAction">error action</A>. AnaGram, in
+doing its <A HREF="#GrammarAnalysis">grammar analysis</A>, identifies a number of special cases, and creates a
+number of extra actions which make for faster processing, but which can be
+represented as combinations of these primitive actions.
+</P><P>
+When a shift action is performed, the current state number is pushed onto the
+parser state stack and the new state number is determined by the current state
+number and the current lookahead token. Different tokens cause different new
+states.
+</P><P>
+When a reduce action is performed, the length of the rule being reduced is
+subtracted from the parser stack index and the new state number is read from the
+top of the parser state stack. The <A HREF="#ReductionToken">reduction token</A> for the rule being reduced is
+then used as an input token.
+</P><P>
+Each state in the grammar, with the exception of state zero, has a
+<A HREF="#CharacteristicToken">characteristic token</A> which must have been recognized in order to jump to that
+state. Therefore, the parser state stack, which is essentially a list of state
+numbers, can also be thought of as a list of token numbers. This is the list of
+tokens that have been seen so far in the parse of your input stream. Some of
+these tokens, of course, may be <A HREF="#Nonterminal">nonterminal tokens</A> which have resulted from
+reducing other sequences of tokens.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="SetPartition">Partition</A></B></DT>
+<DD> If you use <A HREF="#CharacterSets">character sets</A> in your <A HREF = "#Grammar">grammar</A>, AnaGram will compute a "partition"
+of the <A HREF="#Universe">character universe</A>. This partition is a collection of non-overlapping
+character sets such that every one of the sets you have defined can be written
+as a <A HREF="#SetUnion">union</A> of partition sets. Each partition set is identified by a unique
+reference number called the partition set number.
+<P>
+Each partition set is assigned a unique <A HREF="#Token">token</A>. If one of your character sets
+requires more than one partition set to represent it, AnaGram will create
+appropriate <A HREF="#Production">productions</A> and add them to your grammar so your <A HREF="#Parser">parser</A> can make the
+necessary distinctions.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="Precedence">Precedence</A></B></DT>
+<DD> "Precedence" is an attribute of <A HREF="#BinaryOperator">binary operators</A> and
+<A HREF = "#UnaryOperator">unary
+operators</A> which can be used to resolve <A HREF="#Conflict">conflicts</A>. Operator precedence is also
+referred to as <A HREF = "#Binding">binding</A>. Suppose that # and @ are two binary operators. The
+question is how to interpret x # y @ z. If # has higher precedence than @, it is
+interpreted as (x#y)@z. On the other hand if # has lower precedence than @, it
+would be x#(y@z). If # and @ have the same precedence then the question must be
+resolved by <A HREF="#Associativity">associativity</A>.
+<P>
+Note that all operators at the same precedence level must have the same
+associativity.
+</P><P>
+The situation is somewhat simpler for unary operators. If # and @ were both
+prefix operators, or both suffix operators, there would be no ambiguity in
+interpretation, since neither # @ x and x # @ offer more than one possible
+interpretation. The only difficulty arises when both a prefix and a suffix
+operator are applied to the same operand.
+</P><P>
+Suppose that # is a prefix operator and @ is a suffix operator. The question
+then is how to interpret # x @. If # has higher precedence than @, it would be
+interpreted as (# x) @. On the other hand, if # has lower precedence than @, it
+would be interpreted as # (x @). If # and @ have the same precedence then the
+question must be resolved by associativity.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="Production">Production</A></B></DT>
+<DD> Productions are the mechanism used to describe how complex input
+structures are built up out of simpler ones. Each production has a left side and
+a right side. The right side, or <A HREF="#GrammarRule">grammar rule</A>,
+is a sequence of <A HREF="#RuleElement">rule elements</A>,
+which may represent either <A HREF="#Terminal">terminal tokens</A> or
+<A HREF="#Nonterminal">nonterminal tokens</A>. The left side
+is a list of <A HREF="#ReductionToken">reduction tokens</A>.
+In most cases there would be only a single
+reduction token. Productions with more than one <A HREF="#Token">token</A>
+on the left side are
+called <A HREF = "#SemanticallyDetermined">semantically determined productions</A>.
+<p>The
+"<CODE>-&gt;</CODE>" symbol is used
+to separate the left side from the right side.
+If you have several
+productions with the same left hand side, you can avoid
+rewriting the left hand side either by using '|' or by
+using another "->".
+<p>
+A <A HREF="#NullProduction"> null production</A>, or empty right hand side, cannot
+follow a '|'.
+<P>
+Productions may be written thus:
+<pre>
+name
+-> letter
+-> name, digit
+</pre>
+This could also be written
+<pre>
+name -> letter | name, digit
+</pre>
+In order to accommodate semantic analysis of the data,
+you may attach to any grammar rule a <A HREF="#ReductionProcedure">reduction
+procedure</A> which will be executed when the rule is
+identified. Each token may have a <A HREF="#SemanticValue"> semantic value</A>. By
+using <A HREF="#ParameterAssignment"> parameter assignments</A>, you may provide the
+reduction procedure with access to the semantic values of
+tokens that comprise the grammar rule. When it finishes, the
+reduction procedure may return a value which will be
+saved on the <A HREF="#ParserValueStack"> parser value stack</A> as the semantic value of the
+<A HREF="#ReductionToken">reduction token</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="ReduceAction">Reduce Action</A></B></DT>
+<DD> The reduce action is one of the four <A HREF="#ParserAction">actions</A> of a traditional parsing
+engine. The reduce action is performed when the <A HREF="#Parser">parser</A> has succeeded in matching
+all the elements of a <A HREF="#GrammarRule">grammar rule</A> and the next input <A HREF="#Token">token</A> is not erroneous.
+Reducing the grammar rule amounts to subtracting the length of the rule from the
+parser stack index, identifying the <A HREF="#ReductionToken">reduction token</A>, stacking its
+<A HREF="#SemanticValue">semantic value</A>
+and then doing a <A HREF="#ShiftAction">shift action</A> with the reduction token as though it had been
+input directly.
+<P></P>
+</DD>
+<DT><B><A NAME="ReduceReduce">Reduce-Reduce Conflict</A></B></DT>
+<DD> A <A HREF = "#Grammar">grammar</A> has a "reduce-reduce" conflict at some state if a
+single <A HREF="#Token">token</A> turns out to be a <A HREF="#ReducingToken">reducing token</A> for more than one <A HREF="#CompletedRule">completed rule</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="ReducingToken">Reducing Token</A></B></DT>
+<DD> In a state with more than one <A HREF="#CompletedRule">completed rule</A>, your <A HREF="#Parser">parser</A> must be able to
+determine which one was actually found. AnaGram deals with this problem by
+looking at all the states the parser will branch to once each rule is reduced.
+The acceptable input <A HREF="#Token">tokens</A> for those states are the "reducing tokens"
+for the completed rules in the state under investigation. If there is a single
+token which is a reducing token for more than one rule, then the <A HREF = "#Grammar">grammar</A> is said
+to have a <A HREF="#ReduceReduce">reduce-reduce conflict</A> at that state. If in a particular state there
+is both a <A HREF="#ShiftAction">shift action</A> and a <A HREF="#ReduceAction">reduce action</A> for the same token the grammar is
+said to have a <A HREF="#ShiftReduce">shift-reduce conflict</A> in that state.
+<P>
+A reducing token is not the same as a <A HREF="#ReductionToken">reduction token</A>.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="ReductionProcedure">Reduction Procedure</A></B></DT>
+<DD> A "reduction procedure" is a function you write which your
+<A HREF="#Parser">parser</A> executes when it has identified the <A HREF="#GrammarRule">grammar rule</A> to which the reduction
+procedure is attached in your <A HREF = "#Grammar">grammar</A>. There are two formats for reduction
+procedures, depending on the size and complexity of the procedure.
+<P></P>
+</DD>
+<DT><B><A NAME="ReductionToken">Reduction Token</A></B></DT>
+<DD> A <A HREF="#Token">token</A> which appears on the left side of a <A HREF="#Production">production</A> is called a
+reduction token. It is so called because when the <A HREF="#GrammarRule">grammar rule</A> on the right side
+of the production is matched in the input stream, your <A HREF="#Parser">parser</A> will <A HREF="#ReduceAction">"reduce"</A>
+the sequence of tokens which matches the rule by replacing the sequence of
+tokens with the reduction token. Note that, if more than one reduction token is
+specified, your <A HREF="#ReductionProcedure">reduction procedure</A> should choose the exact one. If it does not,
+your parser will use the leftmost syntactically correct one in the list as the default.
+<P>
+A reduction token is not the same as a <A HREF="#ReducingToken">reducing token</A>.
+</P>
+<P></P>
+</DD>
+<DT><B><A NAME="Resynchronization">Resynchronization</A></B></DT>
+<DD> "Resynchronization" is the process of getting your <A HREF="#Parser">parser</A> back
+in step with its input after encountering a syntax error. As such, it is one
+method of error recovery. Of course, you would resynchronize only if it
+is necessary to continue after the error. There are several options available
+when using AnaGram. You could use the auto resynch option, which causes AnaGram
+to incorporate an automatic resynchronizing procedure into your parser, or you
+could use the error resynch option, which is similar to the technique used by
+YACC programmers.
+</DD>
+<P></P>
+<DT><B><A NAME="RuleElement">Rule Element</A></B></DT>
+<DD> A <A HREF="#GrammarRule">grammar rule</A> is a list of "rule elements", separated by
+commas. Rule elements may be <A HREF="#Token">token</A> names, <A HREF="#CharacterSets">character sets</A>, keywords, immediate
+actions, or <A HREF="#VirtualProduction">virtual productions</A>. When AnaGram encounters a rule element for
+which no token presently exists, it creates one.
+<P></P>
+</DD>
+<DT><B><A NAME="SemanticAction">Semantic Action</A></B></DT>
+<DD>
+A semantic action is a piece of C or C++ code that is executed when
+a <A HREF="#GrammarRule">grammar rule</A> has been identified by a
+<A HREF="#Parser">parser</A>. Semantic actions are also called
+<A HREF="#ReductionProcedure">reduction procedures</A>, since they
+are executed when a grammar rule is <A
+HREF="#ReduceAction">reduced</A> to the token on the left side of the
+<A HREF="#Production">production.</A>
+<P></P>
+</DD>
+<DT><B><A NAME="SemanticValue">Semantic Value</A></B></DT>
+<DD>
+<A HREF="#Token">Tokens</A>, whether <A HREF="#Terminal">terminal</A>
+or <A HREF="#Nonterminal">nonterminal</A>, may have a semantic value.
+In the case of terminal tokens, this may be a value assigned by the
+<A HREF="#LexicalScanner">lexical scanner</A> or, if the parser is
+using direct character input, it will be the ascii value of the
+character itself. The values of nonterminal tokens are created by
+<A HREF="#ReductionProcedure">reduction procedures</A>. As a parse
+progresses, token values are <A HREF="#ShiftAction">shifted</A>
+onto the stack, so that in a reduction procedure, the values of the
+tokens that comprise the <A HREF="#GrammarRule">grammar rule</A> that
+is being <A HREF="#ReduceAction">reduced</A> are available
+for inspection.
+<P></P>
+</DD>
+<DT><B><A NAME="SemanticallyDetermined">Semantically Determined Production</A></B></DT>
+<DD> A "semantically determined production" is one which has more
+than one <A HREF="#ReductionToken">reduction token</A> specified on the left side of the production. You would
+write such a <A HREF="#Production">production</A> when the reduction tokens are syntactically
+indistinguishable. The <A HREF="#ReductionProcedure">reduction procedure</A> may then specify which of the listed
+reduction tokens the <A HREF="#GrammarRule">grammar rule</A> is to reduce to based on semantic
+considerations. If there is no reduction procedure, or the reduction procedure
+does not specify a reduction token, the <A HREF="#Parser">parser</A> will use the leftmost syntactically correct token.
+<P></P>
+</DD>
+<DT><B><A NAME="SetExpression">Set Expression</A></B></DT>
+<DD> A set expression is an algebraic expression used to define a <A HREF="#CharacterSets">character set</A>
+in terms of individual characters, ranges of characters, and/or other sets of
+characters as constructed using <A HREF="#SetComplement">complements</A>, <A HREF="#SetUnion">unions</A>, <A HREF="#SetIntersection">intersections</A>, and
+<A HREF="#SetDifference">differences</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="ShiftAction">Shift Action</A></B></DT>
+<DD> The shift action is one of the four <A HREF="#ParserAction">actions</A> of a traditional parsing
+engine. The shift action is performed when the input <A HREF="#Token">token</A> matches one of the
+acceptable input tokens for the current parser state. The
+<A HREF="#SemanticValue">semantic value</A> of the
+token and the current state number are stacked, the parser stack index is
+incremented and the state number is set to a value determined by the previous
+state and the input token.
+<P></P>
+</DD>
+<DT><B><A NAME="ShiftReduce">Shift-Reduce Conflict</A></B></DT>
+<DD> A "shift-reduce" conflict occurs if in some state there is a
+single <A HREF="#Token">token</A> that should be shifted, because it is legitimate input for one of
+the rules of the state, but should also be used to reduce some other rule
+because it is a <A HREF="#ReducingToken">reducing token</A> for that rule.
+<P></P>
+</DD>
+<DT><B><A NAME="SyntaxDirectedParsing">Syntax Directed Parsing</A></B></DT>
+<DD>
+<P>Syntax directed parsing, or formal parsing, is an
+approach to building <A HREF="#Parser">parsers</A> based on formal language
+theory. Given a suitable description of a language,
+called a <A HREF="#Grammar">grammar</A>, there are algorithms which can be
+used to create parsers for the language automatically.
+In this context, the set of all possible inputs to a
+program may be considered to constitute a language, and
+the rules for formulating the input to the program
+constitute the grammar for the language.
+<P>The parsers built from a grammar have the advantage
+that they can recognize any input that conforms to the
+rules, and can reject as erroneous any input that fails
+to conform.
+<P>Since the program logic necessary to parse input is
+often extremely intricate, programs which use formal
+parsing are usually much more reliable than those built
+by hand. They are also much easier to maintain, since
+it is much easier to modify a grammar specification
+than it is to modify complex program logic.
+<P></P>
+</DD>
+<DT><B><A NAME="Terminal">Terminal Token</A></B></DT>
+<DD> A "terminal token" is a <A HREF="#Token">token</A> which does not appear on the left
+side of a <A HREF="#Production">production</A>. It represents, therefore, a basic unit of input to your
+<A HREF="#Parser">parser</A>. If the input to your parser consists of ascii characters, you may define
+terminal tokens explicitly as ascii characters or as <A HREF="#CharacterSets">sets of ascii characters</A>.
+If you have an input procedure which produces numeric codes, you may define the
+terminal tokens directly in terms of these numeric codes.
+<P></P>
+</DD>
+<DT><B><A NAME="Token">Token</A></B></DT>
+<DD> Tokens are the units with which your <A HREF="#Parser">parser</A> works. There are two kinds of
+tokens: <A HREF="#Terminal">terminal tokens</A> and <A HREF="#Nonterminal">nonterminal tokens</A>. These latter are identified by
+the parser as sequences of tokens. The grouping of tokens into more complex
+tokens is governed by the <A HREF="#GrammarRule">grammar rules</A>, or <A HREF="#Production">productions</A> in your <A HREF = "#Grammar">grammar</A>. In your
+grammar, tokens may be denoted by token names, by <A HREF="#VirtualProduction">virtual productions</A>, by
+immediate actions, by explicit character representations, or by expressions
+which yield <A HREF="#CharacterSets">character sets</A>.
+<P>
+A "token" should be thought of more or less as being something like a
+cloakroom token. Imagine you had taken a kindergarten class to the museum and
+all the kids checked their coats. Each one gets a token. What is the probability
+of losing one? You take all of the tokens from the children, put them in a paper
+bag, tie it up, and check it. You get one token in return. We would say that the
+kids' coats represent the actual input to the system, their individual tokens
+are the <A HREF="#Terminal">terminal tokens</A>, and your one token which enables you to redeem the
+paper bag is a <A HREF="#Nonterminal">nonterminal token</A>. Nonterminal tokens are just single token
+numbers to represent the result of compacting a sequence of more primitive
+tokens.
+</P><P>
+Actually, the word "token" is commonly used to refer to a number of
+apparently different things. The tokens you use in a grammar are abstract. The
+tokens identified by a parser are concrete instances of the abstract tokens in
+the grammar. Furthermore, we have to identify tokens in some manner, so we have
+token names and token numbers with which to refer to particular tokens. As a
+result the word "token", in any particular context, may refer directly
+to either an abstract token, or a concrete instance of an abstract token, or may
+refer only indirectly to the token by means of a token name or token number.
+</P><P>
+As an example, consider a C compiler which has a lexical scanner to identify
+input tokens. According to Kernighan and Ritchie, there are six classes of C
+tokens: identifiers, keywords, constants, string literals, operators, and other
+separators. Consider a particular token, a hexadecimal constant, 0XF00. When the
+lexical scanner hands this over to the parser, it has to provide an
+identification code which says what kind of token it is, a hexadecimal constant,
+as well as the
+<A HREF="#SemanticValue">"semantic value"</A> of the token,
+3840.  The identification code is usually an integer value,
+determined by the designer of the lexical scanner, which by agreement
+with the designer of the parser uniquely represents a hexadecimal
+constant.  This identification code can be thought of as an external
+token number.
+</P><P>
+The grammar for the parser, on the other hand, has a token, "HEXconstant",
+let us say, to represent the abstract usage of hexadecimal constants in the C
+language. When AnaGram, or any other parser generator for that matter, analyzes
+the grammar, it assigns its own reference number to identify "HEXconstant".
+This reference number can be thought of as an internal token number to contrast
+it with the external token number provided by the lexical scanner. The parsers
+AnaGram builds contain a table, ag_ tcv, which provides a translation from the
+external to the internal token number. Both of these token numbers, of course,
+differ from the semantic value of the token, in this case, 3840.
+</P><P>
+Even when an AnaGram parser is using simple character input, the word "token"
+may represent several different things. Suppose a grammar has a token 'A-Z',
+which represents the set of upper case letters. There will be an internal token
+number, assigned by AnaGram, which identifies the (abstract) token in parser
+tables. The actual (concrete) input token to a parser will, however, be a single
+letter, say "Q". In this case, the ascii value of the character "Q",
+81, is used both as the external token number and as the semantic value of the
+token.
+</P><P></P>
+</DD>
+<DT><B><A NAME="UnaryOperator">Unary Operator</A></B></DT>
+<DD>A "unary operator" is a <A HREF="#Token">token</A> which represents
+some sort of operation which operates on a single entity to produce a new
+resultant entity. It is normally represented by a symbol which is written either
+preceding the operand or following the operand. In the first case it is called a
+"prefix operator" and in the second case a "suffix operator".
+</P><P></P>
+</DD>
+<DT><B><A NAME="SetUnion">Union</A></B></DT>
+<DD>
+The union of two sets is the set of all elements that are to be found in one or
+another of the two sets. In an AnaGram syntax file the union of two <A HREF="#CharacterSets">character
+sets</A> A and B is represented using the plus sign, as in A + B. The union operator
+has the same <A HREF="#Precedence">precedence</A> as the <A HREF="#SetDifference">difference</A> operator: lower than that of
+<A HREF="#SetIntersection">intersection</A> and <A HREF="#SetComplement">complement</A>. The union operator is <A HREF="#Associativity">left associative</A>.
+<P></P>
+</DD>
+<DT><B><A NAME="VirtualProduction">Virtual Production</A></B></DT>
+<DD>
+<P>Virtual productions are a special short hand
+representation of <A HREF="#GrammarRule">grammar rules</A> which can be used to
+indicate a choice of inputs. They are an important
+convenience, especially useful when you are first
+building a <A HREF="#Grammar">grammar</A>.
+<P>AnaGram rewrites virtual productions, so that when you
+look at the syntax tables in AnaGram, there will be
+actual <A HREF="#Production">productions</A> replacing the virtual productions.
+<P>A virtual production appears as one of the <A HREF="#RuleElement">rule
+elements</A> in a <A HREF="#GrammarRule">grammar rule</A>, i.e. as one of the members
+of the list on the right side of a production.
+<P>The simplest virtual production is the "optional"
+<A HREF="#Token">token</A>. If x is an arbitrary token, x? can be used to
+indicate an optional x.
+<P>Related virtual productions are x... and x?...  where
+the three dots indicate repetition. x... represents an
+arbitrary number of occurrences of x, but at least one.
+x?... represents zero or more occurrences of x.
+<P>The remaining virtual productions use curly or square
+brackets to enclose a sequence of rules. The brackets
+may be followed variously by nothing, a string of three
+dots, or a slash, to indicate the choices to be made
+from the rules. Note that rules may be used, not merely
+tokens.
+<P>If r1 through rn are a set of grammar rules, then
+<PRE>   {r1 | r2 | ... | rn}  </PRE>
+is a virtual production that allows a choice of exactly
+one of the rules. Similarly,
+<PRE>   {r1 | r2 | ... | rn}...  </PRE>
+is a virtual production that allows a choice of one or
+more of the rules. And, finally,
+<PRE>   {r1 | r2 | ... | rn}/...  </PRE>
+is a virtual production that allows a choice of one or
+more of the rules subject to the side condition that
+rules must alternate, that is, that no rule can follow
+itself immediately without the interposition of some
+other rule. This is a case that is not particularly
+easy to write by hand, but is quite useful in a number
+of contexts.
+<P>If the above virtual productions are written with [ ]
+instead of { }, they all become optional. [ ] is an
+optional choice, [ ]... is zero or more choices, and
+[ ]/... is zero or more alternating choices.
+<P><A HREF="#NullProduction">Null productions</A> are not permitted in virtual
+productions in those cases where they would cause an
+intrinsic ambiguity.
+<P>You may use a definition statement to assign a name to
+a virtual production.
+</DD>
+</DL>
+<P>
+<BR>
+<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
+WIDTH=1010 HEIGHT=2 >
+<P>
+<IMG ALIGN="right" SRC="images/pslrb6d.gif" ALT="Parsifal Software"
+WIDTH=181 HEIGHT=25>
+<BR CLEAR="right">
+Back to <A HREF="index.html">Index</A>
+<P>
+<ADDRESS><FONT SIZE="-1">
+AnaGram parser generator - documentation<BR>
+Glossary<BR>
+Copyright &copy; 1993-1999, Parsifal Software. <BR>
+All Rights Reserved.<BR>
+</FONT></ADDRESS>
+</BODY>
+</HTML>

Mercurial > ~dholland > hg > ag > index.cgi

comparison doc/misc/html/gloss.html @ 0:13d2b8934445