view doc/misc/html/summary.html @ 16:f9e4689b837d

Some minor updates for 15 years later.
author David A. Holland
date Tue, 31 May 2022 01:45:26 -0400
parents 13d2b8934445
children

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2//EN">
<HTML>
<HEAD>
<TITLE>Summary of AnaGram Notation</TITLE>
</HEAD>

<BODY BGCOLOR="#ffffff" BACKGROUND="tilbl6h.gif"
 TEXT="#000000" LINK="#0033CC"
 VLINK="#CC0033" ALINK="#CC0099">

<P>

<IMG ALIGN="right" SRC="images/agrsl6c.gif" ALT="AnaGram"
         WIDTH=124 HEIGHT=30>
<BR CLEAR="all">
Back to <A HREF="index.html">Index</A>
<P>
<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
        WIDTH=1010 HEIGHT=2  >

<BR CLEAR="all">


<H1 ALIGN="LEFT">Summary of AnaGram Notation</H1>
<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
</P>
<BR>

The rules for using AnaGram are given in Chapters 8 and 9 of
the AnaGram User's Guide. This page contains a brief summary.
Section headings and terms that appear in <b> bold face</b> can be found
in the online <b> Help Topics</b>.
<P>
<BR>

<H2>Lexical Conventions</H2>
AnaGram allows the free use of spaces, tabs and <b> comments</b>.
Both C style and C++ style comments are allowed. Blank lines are
allowed, but only <i>between</i> statements.
<p>
AnaGram statements may continue onto following lines as long
as they are clearly incomplete. Normally this rule is satisfied by
dangling punctuation or open parentheses, brackets, or braces. In no
case can a statement continue over a blank line.
<P>
<BR>

<H2>Names</H2>
Symbol names must begin with a letter or underscore, and may
contain letters, digits, or underscores. They may also contain
embedded spaces, tabs, and comments. Any sequence of embedded
white space, however, is replaced by a single blank character.
<p>
The names <b> eof</b>, <b> error</b>, and <b> grammar</b> have special meanings.
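<P>
As an illustration, the sketch below uses names with embedded spaces.
The names <CODE>integer part</CODE> and <CODE>white space</CODE> are chosen
only for this example and have no special meaning.
<PRE>
// "integer part" is a single name containing one embedded blank
integer part
  -&gt; '0-9'
  -&gt; integer part, '0-9'

// A run of embedded spaces collapses to a single blank, so the
// name below is the same name as "white space"
white    space
  -&gt; ' ' + '\t'
</PRE>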
<P>
<BR>

<H2>Character Representations</H2>
You may represent a character using the same rules as for
<b> character constants</b> in C. You may also use signed integers in
decimal, octal, or hexadecimal format, again following the
rules for C. You may specify control characters using ^, e.g., ^C.
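<P>
A few single-character representations, written here as definition
statements (see Definitions below). The names on the left are made up
for this example.
<PRE>
letter a   = 'a'       // C character constant
newline    = '\n'      // C escape sequence
tab        = 011       // octal integer
bell       = 7         // decimal integer
escape     = 0x1b      // hexadecimal integer
interrupt  = ^C        // control character
</PRE>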
<P>
<BR>

<H2>Character Ranges</H2>
Character ranges may be specified either in the form 'a-z' or
with two simple characters separated by "..", e.g., 32..255.
<P>
<BR>

<H2>Character Sets</H2>
Use the following operators for more complex character sets:
<DL>
  <DT>Set <b> union</b></DT><DD>A + B</DD>
  <DT>Set <b> difference</b></DT><DD>A - B</DD>
  <DT>Set <b> intersection</b></DT><DD>A &amp; B</DD>
  <DT>Set <b> complement</b></DT><DD>~A</DD>
</DL>

AnaGram interprets a single character to mean the set containing
only the character itself.
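<P>
The range forms and set operators might be combined as in the following
sketch. The names are illustrative only, and each set is given a name with
a definition statement (see Definitions below).
<PRE>
digit        = '0-9'                 // range in quoted form
printable    = 32..126               // range written with ..
letter       = 'a-z' + 'A-Z'         // union
body char    = printable - '*'       // difference; '*' is the set containing only *
ascii letter = letter &amp; printable    // intersection
not newline  = ~'\n'                 // complement
</PRE>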
<P>
<BR>

<H2>Keywords</H2>
A character string enclosed in double quotes is a <B>keyword</B>. The
rules for writing keyword strings are the same as for literal strings
in C. AnaGram parsers use special lookahead logic to recognize
keywords, so a keyword is <i>not</i> equivalent to the corresponding
sequence of single characters.
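<P>
The distinction can be sketched as follows. The token names are invented
for this example, and <CODE>condition</CODE> and <CODE>statement</CODE>
are assumed to be defined elsewhere in the grammar.
<PRE>
// One keyword, recognized as a unit
while statement
  -&gt; "while", condition, statement

// Five separate single-character tokens; not equivalent to "while"
spelled out while
  -&gt; 'w', 'h', 'i', 'l', 'e'
</PRE>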
<P>
<BR>

<H2>Tokens</H2>
The units of a grammar are called <A
HREF="gloss.html#Token">tokens</A>.  <A HREF="gloss.html#Terminal">
Terminal tokens</A> may be <b> character sets</b>, <b> keywords</b>,
<b> immediate actions</b>, or <b> virtual productions</b>. <A
HREF="gloss.html#Nonterminal"> Nonterminal tokens</A> are defined
in terms of other tokens by means of productions.
<P>
<BR>

<H2>Productions</H2>
A <A HREF="gloss.html#Production">production</A> consists of one or more
token names on the left, an arrow ( <CODE>-&gt;</CODE> ), and a
<A HREF="gloss.html#GrammarRule">
grammar rule</A> on the right. A production with more than one name on
the left is called a <A HREF="gloss.html#SemanticallyDetermined">
semantically determined production</A>.  Additional productions with
the same left side may be joined by using | or another arrow. The
arrow, if used, must start a new line.

<P> If the token on the left
side of a production is called <i>grammar</i> or is tagged with a
following dollar sign, it is taken to be the <A
HREF="gloss.html#GrammarToken"> grammar token</A>, or goal token for
the grammar.

<P> The names on the left side of a
production may be preceded by a type cast indicating the data type of
the <A HREF="gloss.html#SemanticValue"> semantic value</A> of the
named tokens.
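
<P> A minimal sketch of these forms, using made-up token names;
<CODE>expression</CODE> is assumed to be defined elsewhere.
<PRE>
// The trailing $ marks "calculator" as the grammar token
calculator $
  -&gt; expression, '\n'

// The type cast gives the data type of the semantic value of "digit value"
(int) digit value
  -&gt; '0-9'

// Two productions with the same left side, joined by |
sign
  -&gt; '+' | '-'

// The same idea, with each alternative on its own arrow
operator
  -&gt; '*'
  -&gt; '/'
</PRE>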

<P> A grammar rule is a sequence of <A HREF="gloss.html#RuleElement">
rule elements</A> joined by commas.  The rule elements may be <b>
character sets</b>, <b> keywords</b>, <b> token names</b>, <b> virtual
productions</b>, or <b> immediate actions</b>.

<P> A <A HREF="gloss.html#VirtualProduction">virtual
production</A> is either a token name or character set expression
followed by ? or ?..., or a sequence of one or more rules, joined by
vertical bars ( | ), inside brackets or braces and optionally followed
by an ellipsis (...). The ? indicates an optional token. Braces indicate
a choice among the listed rules.  Brackets indicate an optional choice.
The ellipsis represents unlimited repetition.
<P>
<BR>
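<P>
The virtual production forms described above might look like this in a
grammar. All token names here are hypothetical, and <CODE>number</CODE>
is assumed to be defined elsewhere.
<PRE>
// ? : the minus sign may or may not be present
signed number
  -&gt; '-'?, number

// ?... : zero or more blanks
blanks
  -&gt; ' '?...

// Braces: exactly one of the listed rules
sign
  -&gt; {'+' | '-'}

// Brackets: at most one of the listed rules
optional sign
  -&gt; ['+' | '-']

// Braces followed by an ellipsis: the choice repeated one or more times
digits
  -&gt; {'0-9'}...
</PRE>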

<H2>Reduction Procedures</H2>
A <A HREF="gloss.html#ReductionProcedure">reduction procedure</A> is a
piece of C or C++ code following a grammar rule that is to be executed
when the rule is recognized in the parser's input stream. Reduction
procedures come in two forms. The short form is a single expression
followed by a semicolon; the long form is a block of code enclosed in
braces. In either case the procedure is preceded by an equal sign.
Short form procedures may not continue onto another line.

<P> Reduction procedures may access the
<A HREF="gloss.html#SemanticValue"> semantic values</A> of tokens in
the grammar rule to which they are attached. To each token whose value
is needed, append a colon and the variable name to be used for the token
value in the reduction procedure. In a short form reduction procedure, the
value of the expression is assigned to the <A
HREF="gloss.html#ReductionToken"> reduction token</A>, the token on
the left side of the production. In a long form procedure, use the
return statement to assign a value to the token on the left side of the
production.

<P> An <b> immediate action</b> differs from a reduction
procedure in that it may occur in the middle of a grammar rule. To
distinguish it from a reduction procedure, it begins with an
exclamation point rather than an equal sign.
<P>
<BR>
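<P>
The sketch below shows a short form procedure, a long form procedure, and
an immediate action. The token names are invented, the variable
<CODE>depth</CODE> is assumed to be declared in embedded C elsewhere in the
syntax file, and the brace form of the immediate action is assumed to
follow the same rules as a long form reduction procedure.
<PRE>
(int) integer
  -&gt; '0-9':d                        = d - '0';
  -&gt; integer:n, '0-9':d             = 10*n + d - '0';

(int) sum
  -&gt; integer
  -&gt; sum:x, '+', integer:y          = {
       int total = x + y;           /* long form: a block of C code */
       return total;                /* becomes the value of "sum" */
     }

(int) group
  -&gt; '(', !{ depth++; }, sum:s, ')' = s;
</PRE>
In the short form procedures, the value of the expression becomes the
semantic value of the token on the left side.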

<H2>Definitions</H2>
You may assign names to frequently used character sets, virtual
productions, keywords, or immediate actions by using a definition
statement consisting of a name, an equal sign and the entity to be
named.
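<P>
For instance, with names chosen only for illustration:
<PRE>
digit       = '0-9'              // a character set
letter      = 'a-z' + 'A-Z'      // a character set expression
begin word  = "begin"            // a keyword
</PRE>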
<P>
<BR>

<H2>Configuration Section</H2>
A configuration section is a block of special statements enclosed
in brackets. Each statement is either an <b> attribute statement</b> or
an assignment of a value to a <b> configuration parameter</b> or switch,
all of which are described in the online help windows.
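<P>
A configuration section might look like the sketch below. The statement
and parameter names shown are recalled from AnaGram example grammars and
should be checked against the help topics before use; the token names
<CODE>white space</CODE> and <CODE>integer</CODE> are assumed to be defined
in the grammar.
<PRE>
[
  disregard white space          // attribute statement
  lexeme {integer}               // attribute statement
  default token type = double    // configuration parameter
]
</PRE>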
<P>
<BR>

<H2>Embedded C</H2>
You may include C or C++ code to support your reduction
procedures at any point in your grammar by enclosing it in braces.
The beginning brace must be on a fresh line, and no other statement
may follow on the same line as the terminating brace. A
block of embedded C at the very beginning of a <b> syntax file</b> is
called the <b> C prologue</b>.
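
<P>
A rough sketch of the layout, assuming a helper function named
<CODE>report</CODE> that exists only for this example:
<PRE>
{
  /* C prologue: embedded C at the very beginning of the syntax file */
  #include &lt;stdio.h&gt;
}

greeting $
  -&gt; "hello", '\n'               = report("hello seen");

{
  /* A later block of embedded C supporting the reduction procedure.
     The opening brace starts a fresh line, and nothing follows the
     closing brace on its line. */
  int report(const char *msg) {
    return printf("%s\n", msg);
  }
}
</PRE>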

<P>
<BR>

<IMG ALIGN="bottom" SRC="images/rbline6j.gif" ALT="----------------------"
      WIDTH=1010 HEIGHT=2 >
<P>
<IMG ALIGN="right" SRC="images/pslrb6d.gif" ALT="Parsifal Software"
                WIDTH=181 HEIGHT=25>
<BR CLEAR="right">
<P>
Back to  <A HREF="index.html">Index</A>
<P>
<ADDRESS><FONT SIZE="-1">
                  AnaGram parser generator - documentation<BR>
                  Summary of AnaGram Notation<BR>
                  Copyright &copy; 1993-1999, Parsifal Software. <BR>
                  All Rights Reserved.<BR>
</FONT></ADDRESS>

</BODY>
</HTML>